AbstractsLanguage, Literature & Linguistics

Adjusting Sense Representations for Word Sense Disambiguation and Automatic Pun Interpretation

by Tristan Miller




Institution: Technische Universität Darmstadt
Department:
Year: 2016
Posted: 02/05/2017
Record ID: 2066349
Full text PDF: http://tuprints.ulb.tu-darmstadt.de/5400/


Abstract

Word sense disambiguation (WSD)—the task of determining which meaning a word carries in a particular context—is a core research problem in computational linguistics. Though it has long been recognized that supervised (machine learning–based) approaches to WSD can yield impressive results, they require an amount of manually annotated training data that is often too expensive or impractical to obtain. This is a particular problem for under-resourced languages and domains, and is also a hurdle in well-resourced languages when processing the sort of lexical-semantic anomalies employed for deliberate effect in humour and wordplay. In contrast to supervised systems are knowledge-based techniques, which rely only on pre-existing lexical-semantic resources (LSRs). These techniques are of more general applicability but tend to suffer from lower performance due to the informational gap between the target word's context and the sense descriptions provided by the LSR. This dissertation is concerned with extending the efficacy and applicability of knowledge-based word sense disambiguation. First, we investigate two approaches for bridging the information gap and thereby improving the performance of knowledge-based WSD. In the first approach we supplement the word's context and the LSR's sense descriptions with entries from a distributional thesaurus. The second approach enriches an LSR's sense information by aligning it to other, complementary LSRs. Our next main contribution is to adapt techniques from word sense disambiguation to a novel task: the interpretation of puns. Traditional NLP applications, including WSD, usually treat the source text as carrying a single meaning, and therefore cannot cope with the intentionally ambiguous constructions found in humour and wordplay. We describe how algorithms and evaluation methodologies from traditional word sense disambiguation can be adapted for the 'disambiguation' of puns, or rather for the identification of their double meanings. Finally, we cover the design and construction of technological and linguistic resources aimed at supporting the research and application of word sense disambiguation. Development and comparison of WSD systems has long been hampered by a lack of standardized data formats, language resources, software components, and workflows. To address this issue, we designed and implemented a modular, extensible framework for WSD. It implements, encapsulates, and aggregates reusable, interoperable components using UIMA, an industry-standard information processing architecture. We have also produced two large sense-annotated data sets for under-resourced languages or domains: one of these targets German-language text, and the other English-language puns. Advisors/Committee Members: Gurevych, Iryna (advisor), Mihalcea, Rada (advisor), Balke, Wolf-Tilo (advisor).