A framework for event classification in Tweets based on hybrid semantic enrichment

by Simone Aparecida Romero

Institution: Universidade do Rio Grande do Sul
Year: 2017
Keywords: Web semântica; Semantic web; DBPedia; Redes sociais; Recuperação : Informação; LOD; Twitter; Event classification
Posted: 02/01/2018
Record ID: 2154801
Full text PDF: http://hdl.handle.net/10183/156642


Social media platforms have become a key means of spreading information, opinions, and awareness about real-world events. Twitter stands out due to the huge volume of messages about all sorts of topics posted every day. Such messages are an important source of information about events and support many useful applications (e.g. the detection of breaking news, real-time awareness, updates about events). However, text classification on Twitter is by no means a trivial task that can be handled by conventional Natural Language Processing techniques. In addition, there is no consensus on which tasks make up Event Identification and Classification in tweets, since existing approaches often focus on specific types of events and rely on specific assumptions, which makes it difficult to reproduce and compare these approaches on events of distinct natures. In this work, we aim at building a unifying framework that is suitable for the classification of events of distinct natures. The framework has as key elements: a) external enrichment using related web pages to extend the conceptual features contained within the tweets; b) semantic enrichment using the Linked Open Data cloud to add related semantic features; and c) a pruning technique that selects the semantic features with discriminative potential. We evaluated our proposed framework using a broad experimental setting that includes: a) seven target events of different natures; b) different combinations of the conceptual features proposed (i.e. entities, vocabulary and their combination); c) distinct feature extraction strategies (i.e. from tweet text and related web documents); d) different methods for selecting the discriminative semantic features (i.e. pruning, feature selection, and their combination); and e) two classification algorithms. We also compared the proposed framework against another kind of contextual enrichment based on word embeddings.
The results showed the advantages of the proposed framework and that our solution is a feasible and generalizable method to support the classification of distinct event types.

Social media platforms have become an essential means of making information available. Among them, Twitter has stood out due to the large volume of messages shared every day, mainly mentioning events around the world. Such messages are an important source of information and can be used in several applications. However, text classification in tweets is a non-trivial task. In addition, there is no consensus on which tasks should be performed for Event Identification and Classification in tweets, since existing approaches work with specific types of events and particular assumptions, which hinders the reproduction and comparison of these approaches on events of distinct natures. In this work, we developed a framework for the classification of events of distinct natures. The framework has the following...

Advisors/Committee Members: Becker, Karin.