
One-vector representations of stochastic signals for pattern recognition

by Hao Tang




Institution: University of Illinois at Urbana-Champaign
Year: 2011
Keywords: Pattern Recognition
Record ID: 1904725
Full text PDF: http://hdl.handle.net/2142/18595


Abstract

When building a pattern recognition system, we primarily deal with stochastic signals such as speech, images, and video. Ideally, a stochastic signal takes a one-vector form, so that it appears as a single data point in a possibly high-dimensional representational space: the majority of pattern recognition algorithms are designed to handle stochastic signals having a one-vector representation. More importantly, a one-vector representation naturally allows an optimal distance metric to be learned from the data, which generally accounts for significant performance gains in many pattern recognition tasks. This is motivated and demonstrated by our work on semi-supervised speaker clustering, where a speech utterance is represented by a Gaussian mixture model (GMM) mean supervector, formed by stacking the component means of a GMM that is adapted from a universal background model (UBM) encoding our prior knowledge of speakers in general. Combined with a novel distance metric learning technique that we propose, linear spherical discriminant analysis, which performs discriminant analysis in the cosine space, the GMM mean supervector representation of utterances leads to state-of-the-art speaker clustering performance. The main criticism of the GMM mean supervector representation is that it assumes independent and identically distributed feature vectors, which is far from true in practice. We therefore propose two novel one-vector representations of stochastic signals, one based on adapted ergodic hidden Markov models (HMMs) and one based on adapted left-to-right HMMs. In these one-vector representations, a single vector is constructed by transforming the parameters of an HMM that is adapted from a UBM to various controllable degrees, where the transformation is mathematically derived from an upper-bound approximation of the Kullback-Leibler divergence rate between two adapted HMMs.
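The supervector construction described above can be sketched as follows. This is a minimal illustration, not the thesis implementation: the UBM, relevance factor, and all data are invented, only the component means are MAP-adapted, and scikit-learn's `GaussianMixture` stands in for a full UBM trainer.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# All data and hyperparameters here are invented for illustration.
rng = np.random.default_rng(0)
background = rng.normal(size=(2000, 13))          # pooled background features
utterance = rng.normal(loc=0.5, size=(200, 13))   # features of one utterance

# Universal background model (UBM): a GMM trained on the background data.
ubm = GaussianMixture(n_components=8, covariance_type="diag",
                      random_state=0).fit(background)

def map_adapt_means(gmm, feats, relevance=16.0):
    """MAP-adapt only the UBM component means to the utterance features."""
    post = gmm.predict_proba(feats)                  # (T, K) responsibilities
    n_k = post.sum(axis=0)                           # soft count per component
    ex_k = (post.T @ feats) / np.maximum(n_k, 1e-10)[:, None]  # 1st-order stats
    alpha = (n_k / (n_k + relevance))[:, None]       # adaptation coefficients
    return alpha * ex_k + (1 - alpha) * gmm.means_

# Stack the adapted means into a single supervector: one point per utterance.
supervector = map_adapt_means(ubm, utterance).ravel()

# Cosine-space techniques (such as the proposed linear spherical discriminant
# analysis) effectively operate on length-normalized supervectors.
unit_supervector = supervector / np.linalg.norm(supervector)

print(supervector.shape)   # (104,) = 8 components x 13 feature dimensions
```

The supervector lives in a fixed-dimensional space regardless of utterance length, which is precisely what makes downstream metric learning applicable.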
These one-vector representations possess a set of very attractive properties and are generic in nature, so they can be used with various types of stochastic signals (e.g., speech, image, video) and applied to a broad range of pattern recognition tasks (e.g., classification, regression). In addition, we propose a general framework for one-vector representations of stochastic signals for pattern recognition, of which the proposed representations based on adapted ergodic HMMs and adapted left-to-right HMMs are two special cases. The general framework can serve as a unified and principled guide for constructing "the best" one-vector representations of stochastic signals of various types and for various pattern recognition tasks. Based on different types of underlying statistical models, carefully chosen to best fit the nature of the stochastic signals, "the best" one-vector representations of the stochastic signals may be constructed by a possibly nonlinear transformation of the parameters of the underlying…
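The divergence-based transformations in these representations rest on closed-form divergences between the models' Gaussian components; the thesis derives an upper bound on the KL divergence rate between adapted HMMs, and per-component terms like the one below are a standard ingredient of such bounds. A minimal sketch of the closed-form KL divergence between two diagonal-covariance Gaussians (all numbers invented, not the thesis derivation itself):

```python
import numpy as np

def kl_diag_gaussians(m0, v0, m1, v1):
    """KL( N(m0, diag(v0)) || N(m1, diag(v1)) ); inputs are 1-D arrays
    of means (m) and variances (v)."""
    m0, v0, m1, v1 = map(np.asarray, (m0, v0, m1, v1))
    return 0.5 * np.sum(np.log(v1 / v0) + (v0 + (m0 - m1) ** 2) / v1 - 1.0)

# Identical Gaussians have zero divergence; shifting a mean makes it positive.
d_same = kl_diag_gaussians([0.0, 0.0], [1.0, 1.0], [0.0, 0.0], [1.0, 1.0])
d_shift = kl_diag_gaussians([0.0, 0.0], [1.0, 1.0], [1.0, 0.0], [1.0, 1.0])
print(d_same, d_shift)  # 0.0 0.5
```

Note the asymmetry of KL divergence, which is one reason bounds and approximations are needed when extending it to a divergence rate between full HMMs.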