Emulating Language Acquisition with Stochastic Gradient Descent: A New Approach to Modeling Phonotactics

by Frederic Jason Freyer

Institution:	Brandeis University
Year:	2017
Keywords:	linguistics; phonotactics; gradient descent; machine learning; phonology; computational linguistics
Posted:	02/01/2018
Record ID:	2197547
Full text PDF:	http://hdl.handle.net/10192/33888

Abstract

We present a phonotactic learning system that achieves strong performance in modeling gradience in phonotactic judgments by combining a natural class-based approach (following Albright 2009) with a learning algorithm that focuses more strongly than past models on emulating human acquisition.

It has long been recognized (e.g. Scholes 1956) that phonotactic restrictions in languages are not binary, but rather span a full spectrum between complete acceptability and complete unacceptability. Several experiments have verified that English speakers prefer, for example, /p??/ to /????/ in syllable onsets, though both are legal; and /m??/ to /vm/, though both are illegal.

Previous approaches to the computational modeling of phonotactics have been notably successful at learning hard constraints, but less so at learning gradient judgments (Coleman and Pierrehumbert 1997, Hayes and Wilson 2008, Albright 2009).

Learning in the present model is done by stochastic gradient descent (Bottou 2010), in which every word to which the model is exposed (representing a word that a learner hears) very slightly nudges upward the acceptability values for features extracted from that word.

This kind of model represents a much more restricted learning environment than past models have used: the model has access to only one word at a time, and performs only basic arithmetic calculations. We show that it is possible to substantially replicate phonotactic acceptability judgments, crucially including gradience, despite these restrictions and using only about 1 million words of training data. One million words represent only a month or two of infant speech exposure (Hart and Risley 2003), further suggesting that it is conceivable for babies to effectively learn phonotactics at a younger age than has been established in existing literature (Mehler et al. 2009, Jusczyk et al. 1994, among others).

Finally, we illustrate how the feature values learned by such a model can be used to compute phonotactic similarity between languages, a useful typological measure.
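
The one-word-at-a-time learning regime described in the abstract can be illustrated with a minimal sketch. It is not the dissertation's implementation: the feature extractor (adjacent segment pairs with `#` boundaries, rather than natural classes), the squared-error objective, the learning rate, and all function names are assumptions chosen only to show how a single exposure can nudge feature values slightly upward using basic arithmetic.

```python
from collections import defaultdict


def extract_features(word):
    """Hypothetical feature extractor: adjacent segment pairs, with '#'
    marking word boundaries. The dissertation uses natural-class-based
    features; plain segment bigrams are a stand-in for illustration."""
    padded = ["#"] + list(word) + ["#"]
    return list(zip(padded, padded[1:]))


def sgd_step(weights, word, learning_rate=0.01):
    """One stochastic gradient descent step on a single word.

    Assumed per-feature loss: 0.5 * (1 - w)^2, i.e. each feature seen in
    the word is treated as a positive example with target value 1.
    Its gradient with respect to w is -(1 - w), so the update moves w
    slightly upward, by less and less as w approaches 1.
    """
    for feature in extract_features(word):
        gradient = -(1.0 - weights[feature])
        weights[feature] -= learning_rate * gradient


def train(words, learning_rate=0.01):
    """Expose the model to words one at a time; no word is stored."""
    weights = defaultdict(float)  # unseen features start at 0.0
    for word in words:
        sgd_step(weights, word, learning_rate)
    return weights


def acceptability(weights, word):
    """Gradient acceptability score: mean learned value of the word's
    features, where features never encountered count as 0.0."""
    features = extract_features(word)
    return sum(weights.get(f, 0.0) for f in features) / len(features)
```

Under these assumptions, calling `train` on a stream of words and then `acceptability` on a novel form yields a score between 0 and 1 that rises with how often the form's features were heard, so forms built from frequent sequences score higher than forms containing unattested ones.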