#Autism Versus 299.0: Topic Model Exploration of Multimodal Autism Data

by Joy Carol Ming

Institution: Harvard University
Degree: AB
Year: 2015
Keywords: Computer Science; Health Sciences, General
Record ID: 2057925
Full text PDF: http://nrs.harvard.edu/urn-3:HUL.InstRepos:14398542


Though the prevalence and awareness of Autism Spectrum Disorder (ASD) have steadily increased, a true understanding is hard to reach because of the behavior-based nature of the diagnosis and the heterogeneity of its manifestations. Parents and caregivers often informally discuss the symptoms and behaviors they observe in their children with autism on online medical forums, in contrast to the more traditional and structured text of electronic medical records collected by doctors. We modify an anchor-word-driven topic model algorithm originally proposed by Arora et al. (2012a) to elicit and compare the medical concept topics, or “themes,” from both modes of data: the novel data set of posts from autism-specific online medical forums and electronic medical records. We present methods to extract relevant medical concepts from colloquially written forum posts through the use of choice sections of the consumer health vocabulary and other filtering techniques. To account for the sparsity of concept data, we propose and evaluate a more robust approach to selecting anchor words that takes into account variance and inclusivity. This approach, which combines concept selection and anchor word selection, seeds the discussion about how unstructured text can influence and expand understanding of the enigmatic disorder, autism, and how these methods can be applied to similar sources of text to solve other problems.