Kernel Regression and Estimation: Learning Theory/Application and Errors-in-Variables Model Analysis

by Peiyuan Wu

Institution: Princeton University
Department: Electrical Engineering
Degree: PhD
Year: 2015
Keywords: cost effectiveness; errors-in-variables; Gauss Markov model; kernel method; minimum mean square error; ridge regression; Electrical engineering
Record ID: 2059147
Full text PDF: http://arks.princeton.edu/ark:/88435/dsp010g354h462


This dissertation covers both applied and theoretical topics in kernel regression and estimation. The first part discusses kernel-based learning applications in (1) a large-scale active authentication prototype and (2) incomplete data analysis. Traditional kernel-based learning algorithms encounter scalability issues on large-scale datasets. For instance, with N samples, the learning complexity is O(N^2) for the support vector machine (SVM) and O(N^3) for kernel ridge regression (KRR) with the default Gaussian RBF kernel. By approximating the RBF kernel with a truncated-RBF (TRBF) kernel, a fast KRR learning algorithm is adopted with O(N) training cost and constant prediction cost. It is applied in a large-scale active authentication prototype based on free-text keystroke analysis, where it shows both performance and computational advantages over SVM with the RBF kernel. This dissertation also explores the kernel approach to incomplete data analysis (KAIDA), where the data to be analyzed is highly incomplete due to controlled or unanticipated causes, such as privacy and security concerns or the cost, failure, or inaccessibility of data sensors. Two partial cosine (PC) kernels, denoted SM-PC and DM-PC, are proposed. Simulations show the potential of KAIDA to deliver strong resilience against high data sparsity.

The second part of this dissertation discusses theoretical properties of the nonlinear regression problem in the errors-in-variables (EiV) model. The dissertation examines the impact of input noise on nonlinear regression functions through a spectral decomposition analysis. It turns out that the minimum mean square error (MMSE) due to input noise can be decomposed into contributions from individual spectral components. Both numerical and analytical methodologies are proposed to construct the orthogonal basis of interest. Closed-form expressions exist for the Gaussian and uniform input models.
This dissertation also extends the Gauss-Markov theorem to the EiV model with stochastic regression coefficients. A weighted regularized least squares estimator is proposed to minimize the mean squared error (MSE) in the estimation of both the regression coefficients and the output. Analytical closed-form expressions are derived for polynomial regression problems with Gaussian-distributed inputs. A notion of least mean squares kernel (LMSK) is also proposed to minimize the MSE in the KRR learning model.
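
For readers unfamiliar with the baseline the abstract compares against, the following is a minimal sketch of standard kernel ridge regression with a Gaussian RBF kernel, written with NumPy. It is not the dissertation's TRBF algorithm, its partial cosine kernels, or the LMSK; it only illustrates where the O(N^3) training cost comes from, namely solving an N-by-N linear system. The function names (rbf_kernel, krr_fit, krr_predict), the parameters rho and sigma, and the synthetic sine data are all assumptions made for illustration.

```python
import numpy as np


def rbf_kernel(A, B, sigma=1.0):
    """Gaussian RBF kernel matrix between the rows of A and the rows of B."""
    sq_dist = (A**2).sum(axis=1)[:, None] + (B**2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * sigma**2))


def krr_fit(X, y, rho=1e-3, sigma=1.0):
    """Solve (K + rho * I) alpha = y; the dense N x N solve costs O(N^3)."""
    K = rbf_kernel(X, X, sigma)
    return np.linalg.solve(K + rho * np.eye(len(X)), y)


def krr_predict(X_train, alpha, X_new, sigma=1.0):
    """Predict as a kernel-weighted sum over all N training points."""
    return rbf_kernel(X_new, X_train, sigma) @ alpha


# Synthetic example (hypothetical data): noisy samples of sin(x).
rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(200)

alpha = krr_fit(X, y, rho=1e-2, sigma=0.5)
X_test = np.linspace(-2.5, 2.5, 50)[:, None]
y_hat = krr_predict(X, alpha, X_test, sigma=0.5)
```

In this baseline form each prediction also touches all N training samples, which is the per-query cost that the abstract's constant-prediction-cost claim for the TRBF approach contrasts with.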