Mixed Effects Modeling and Correlation Structure Selection for High Dimensional Correlated Data

by Peng Wang

Institution: University of Illinois – Urbana-Champaign
Year: 2011
Keywords: Conditional score; Smoothly Clipped Absolute Deviation (SCAD)
Record ID: 1914800
Full text PDF: http://hdl.handle.net/2142/26016


Longitudinal data arise frequently in studies where measurements are obtained from each subject repeatedly over time; consequently, measurements within a subject are correlated. This thesis addresses two important but challenging issues: mixed-effects modeling with unspecified random-effects distributions, and correlation structure selection for high-dimensional correlated data. In longitudinal studies, mixed-effects models are important for capturing subject-specific effects. However, most existing approaches assume normally distributed random effects, and misspecifying this distribution can introduce bias and reduce the efficiency of the fixed-effects estimators. Even when the estimation of the fixed effects is robust to a misspecified random-effects distribution, inference based on the random effects can be invalid. We propose a new approach that estimates fixed and random effects using conditional quadratic inference functions. The new approach does not require any specification of the likelihood function. In addition to mixed-effects modeling, it can accommodate serial correlation between observations within the same cluster. It also does not require estimating the unknown variance components associated with the random effects, or the nuisance parameters associated with the working correlation. Real data examples and simulations are used to compare the new approach with the penalized quasi-likelihood approach and with the SAS GLIMMIX and nonlinear mixed-effects model (NLMIXED) procedures.

Model selection of the correlation structure for non-normal correlated data is very challenging when the cluster size increases with the sample size, because of the high-dimensional correlation parameters involved and the lack of a likelihood function for non-normal correlated data.
However, identifying the correct correlation structure can improve estimation efficiency and the power of tests for correlated data. We propose to approximate the inverse of the empirical correlation matrix by a linear combination of candidate basis matrices, and to select the correlation structure by identifying the nonzero coefficients of the basis matrices. This is carried out by minimizing penalized estimating functions, which balance the complexity and informativeness of the model for the correlation matrix. The new approach requires neither estimation of each entry of the correlation matrix nor specification of the likelihood function, and it can effectively handle non-normal correlated data. Asymptotic results on model selection consistency and the oracle property are established for correlated data with diverging cluster size, a setting in which the derivations are challenging. Our numerical studies indicate that the correlation structure can be identified effectively for both normal and binary responses, even when the cluster size is very large.
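The basis-matrix selection idea can be illustrated with a small Python sketch. It is not the thesis's estimator. As a simplifying assumption, it replaces the penalized estimating functions with a penalized least-squares fit of the empirical inverse correlation matrix onto the candidate basis matrices. It uses the SCAD penalty (listed among the keywords) through the local quadratic approximation of Fan and Li (2001). The basis set (`I`, `lag1`, `corners`, `J`, `lag2`), the function names, and the tuning value `lam=0.1` are hypothetical choices made for illustration. The data are simulated from an AR(1) correlation, whose inverse is tridiagonal, so only the `I`, `lag1`, and `corners` coefficients should survive.

```python
import numpy as np


def scad_derivative(theta, lam, a=3.7):
    """SCAD penalty derivative p'_lam(|theta|) (Fan and Li, 2001)."""
    t = np.abs(theta)
    return lam * (t <= lam) + np.maximum(a * lam - t, 0.0) / (a - 1) * (t > lam)


def candidate_bases(m):
    """Hypothetical candidate basis matrices for an m x m inverse correlation."""
    corners = np.zeros((m, m))
    corners[0, 0] = corners[-1, -1] = 1.0
    return {
        "I": np.eye(m),                            # main diagonal
        "lag1": np.eye(m, k=1) + np.eye(m, k=-1),  # first off-diagonals
        "corners": corners,                        # first and last diagonal entries
        "J": np.ones((m, m)),                      # exchangeable-type term
        "lag2": np.eye(m, k=2) + np.eye(m, k=-2),  # second off-diagonals
    }


def select_structure(target, bases, lam, n_iter=200, tol=1e-10, zero_tol=1e-8):
    """Minimise 0.5*||vec(target) - X beta||^2 + sum_k SCAD_lam(|beta_k|) by
    local quadratic approximation; X holds unit-norm vectorised bases."""
    names = list(bases)
    norms = np.array([np.linalg.norm(bases[k]) for k in names])
    X = np.column_stack([bases[k].ravel() / s for k, s in zip(names, norms)])
    y = target.ravel()
    beta = np.linalg.lstsq(X, y, rcond=None)[0]  # unpenalised starting value
    active = np.abs(beta) > zero_tol
    for _ in range(n_iter):
        if not active.any():
            break
        Xa, ba = X[:, active], beta[active]
        w = scad_derivative(ba, lam) / np.abs(ba)  # LQA ridge weights
        new = np.zeros_like(beta)
        new[active] = np.linalg.solve(Xa.T @ Xa + np.diag(w), Xa.T @ y)
        active = np.abs(new) > zero_tol            # drop coefficients shrunk to ~0
        new[~active] = 0.0
        converged = np.max(np.abs(new - beta)) < tol
        beta = new
        if converged:
            break
    # Report coefficients on the original (unnormalised) basis scale
    return {k: float(b / s) for k, b, s in zip(names, beta, norms)}


rng = np.random.default_rng(0)
m, n, rho = 6, 20000, 0.5
lags = np.abs(np.subtract.outer(np.arange(m), np.arange(m)))
R = rho ** lags                                    # true AR(1) correlation
Y = rng.standard_normal((n, m)) @ np.linalg.cholesky(R).T
R_inv_hat = np.linalg.inv(np.corrcoef(Y, rowvar=False))
coef = select_structure(R_inv_hat, candidate_bases(m), lam=0.1)
# AR(1) theory: I = (1+rho^2)/(1-rho^2), lag1 = -rho/(1-rho^2),
# corners = -rho^2/(1-rho^2), and J = lag2 = 0
print({k: round(v, 3) for k, v in coef.items()})
```

The columns are scaled to unit Frobenius norm before penalizing. This lets a single `lam` act comparably on sparse bases such as `corners` and on dense ones such as `J`. The coefficients are converted back to the original basis scale when they are returned.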
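The first contribution in the abstract builds on quadratic inference functions (QIF). In QIF, the inverse working correlation is likewise approximated by a combination of basis matrices. The conditional version proposed in the thesis is not reproduced here. The sketch below shows only the standard marginal QIF for a linear model with identity link and unit variance. It assumes AR(1)-type bases (the identity and the first off-diagonal matrix). The function names, the simulated design, and the use of SciPy's BFGS optimizer are all illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize


def qif_objective(beta, X, Y, bases):
    """Marginal QIF Q(beta) = n * gbar^T C^{-1} gbar for a linear model
    with identity link and unit variance (so A_i = I)."""
    n = Y.shape[0]
    resid = Y - X @ beta                  # X: (n, m, p), Y: (n, m) -> (n, m)
    # Extended score: for each basis M_k, g_ik = X_i^T M_k (y_i - X_i beta)
    g = np.concatenate(
        [np.einsum("imp,mj,ij->ip", X, M, resid) for M in bases], axis=1
    )                                     # shape (n, K * p)
    gbar = g.mean(axis=0)
    C = g.T @ g / n                       # empirical covariance of the scores
    return n * gbar @ np.linalg.solve(C, gbar)


def fit_qif(X, Y, bases):
    """Start from pooled OLS, then minimise Q(beta) with BFGS."""
    p = X.shape[2]
    beta0 = np.linalg.lstsq(X.reshape(-1, p), Y.ravel(), rcond=None)[0]
    res = minimize(qif_objective, beta0, args=(X, Y, bases), method="BFGS")
    return res.x


rng = np.random.default_rng(1)
n, m, rho = 500, 5, 0.6
beta_true = np.array([1.0, -0.5])
t = np.arange(m)
# Design: intercept plus one time-varying covariate, shape (n, m, 2)
X = np.stack([np.ones((n, m)), rng.standard_normal((n, m))], axis=2)
R = rho ** np.abs(np.subtract.outer(t, t))  # AR(1) within-cluster correlation
eps = rng.standard_normal((n, m)) @ np.linalg.cholesky(R).T
Y = X @ beta_true + eps
bases = [np.eye(m), np.eye(m, k=1) + np.eye(m, k=-1)]
beta_hat = fit_qif(X, Y, bases)
print(beta_hat)  # should be close to beta_true
```

Q(β) combines the K extended scores with their empirical covariance, so no correlation parameter such as ρ is ever estimated. This parallels the abstract's point about avoiding the nuisance parameters of the working correlation.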