Leveraging Lexical-Semantic Knowledge for Text Classification Tasks
by Lucie Flekova
Institution: | Technische Universität Darmstadt |
---|---|
Year: | 2017 |
Posted: | 02/01/2018 |
Record ID: | 2153715 |
Full text PDF: | http://tuprints.ulb.tu-darmstadt.de/6765/ |
This dissertation is concerned with the applicability of knowledge contained in lexical-semantic resources to text classification tasks. Lexical-semantic resources aim at systematically encoding various types of information about the meaning of words and their relations. Text classification is the task of sorting a set of documents into categories from a predefined set, for example, spam and not spam. With the increasing amount of digitized text and the increased availability of computing power, techniques for automating text classification have attracted rapidly growing interest. The early techniques classified documents using a set of rules manually defined by experts, e.g. computational linguists. The rise of big data led to the increased popularity of the distributional hypothesis, i.e., "the meaning of a word comes from its context", and to the criticism of lexical-semantic resources as too academic for real-world NLP applications. It was long assumed that lexical-semantic knowledge would not lead to better classification results, as the meaning of every word can be learned directly from the document itself. In this thesis, we show that this assumption is not valid as a general statement and present several approaches by which lexicon-based knowledge leads to better results. Moreover, we show why these improved results can be expected.

One of the first problems in natural language processing is lexical-semantic ambiguity. In text classification tasks, the ambiguity problem has often been neglected. For example, to classify the topic of a document containing the word 'bank', we don't need to disambiguate it explicitly if we also find the word 'river' or 'finance'. However, such an additional word may not always be present. Conveniently, lexical-semantic resources typically enumerate all senses of a word, letting us choose which word sense is the most plausible in our context.
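The idea of choosing the most plausible sense from an enumerated sense list can be illustrated with a minimal sketch. It is not the method evaluated in the thesis: it uses a simplified gloss-overlap heuristic (in the spirit of the Lesk algorithm), and the sense labels, glosses, stopword list, and function names are all hypothetical stand-ins for what a real lexical-semantic resource would provide.

```python
import re

# Hypothetical two-sense inventory for "bank"; a real lexical-semantic
# resource would supply the senses and their glosses.
SENSES = {
    "bank/finance": "an institution that accepts deposits and lends money",
    "bank/river": "sloping land beside a body of water such as a river",
}

# Small illustrative stopword list (an assumption, not a standard list).
STOPWORDS = {"a", "an", "and", "as", "of", "that", "the", "such", "to", "in", "on"}


def tokens(text):
    """Lowercase the text and return its set of non-stopword word tokens."""
    return {t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS}


def disambiguate(context, senses=SENSES):
    """Return the sense whose gloss shares the most words with the context."""
    ctx = tokens(context)
    return max(senses, key=lambda s: len(ctx & tokens(senses[s])))


print(disambiguate("We walked along the bank of the river"))
print(disambiguate("The bank lends money to small firms"))
```

When no gloss shares a word with the context, this sketch simply falls back to the first listed sense, which is exactly the situation described above where no helpful context word is present.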
What if we use knowledge-based sense disambiguation methods in addition to the information provided implicitly by the word context in the document? In this thesis, we evaluate the performance of selected resource-based word sense disambiguation algorithms on a range of document classification tasks (Chapter 3). We note that the lexicographic sense distinctions provided by lexical-semantic resources are not always optimal for every text classification task, and we propose an alternative technique for disambiguating word meaning in context for sentiment analysis applications.

The second problem in text classification, and in natural language processing in general, is synonymy. The words used in training documents represent only a tiny fraction of the total possible vocabulary. If we learn individual words, or senses, as features in the classification model, our system will not be able to interpret paraphrases, where the same meaning is conveyed using different expressions. How much would the classification performance improve if the system could determine that two very

Advisors/Committee Members: Gurevych, Iryna (advisor), Stein, Benno (advisor), Daelemans, Walter (advisor).