AbstractsComputer Science

Learning in Relational Networks

by Tanwistha Saha




Institution: George Mason University
Department:
Year: 2014
Keywords: Computer science; Active Learning; Collective Classification; Multi-label Learning; Relational Networks; Sampling; Tag-based Recommender Systems
Record ID: 2037524
Full text PDF: http://hdl.handle.net/1920/9192


Abstract

Classification of nodes in relational networks is an important task because it involves applications in multiple areas that can impact people's lives on a daily basis. The inability to use traditional classification algorithms for classifying nodes in relational networks has encouraged researchers to develop a special class of methods, known as collective classification algorithms. During the training phase, collective classification exploits the structural information embedded in the network for jointly classifying the labels of all test nodes. Any relational model needs good samples for training in order to do better predictions on unseen test data. Hence, to do a fair evaluation of a model we should always make sure that the samples on which the model is trained, are good representation of the original dataset. However, unlike traditional machine learning on non-relational data where randomly selected samples are considered good enough for training a model, relational learning relies heavily on the method of sample selection. This is because, in relational learning information propagates from the training samples to the test samples through the link structure. Hence, a sampling method that is specifically tailored for evaluating collective classification algorithms is required. A remotely related concept to sampling is the process of acquiring informative labeled data for training. Labeled data comes with a cost because it involves human interaction. In order to minimize this cost, numerous active learning algorithms have been proposed by researchers. Although active learning methods have evolved over the years, not much had been done to deal with relational networks which are very common representation of many real-world datasets.