Untargeted metabolomics and novel data analysis strategies to identify biomarkers of diet and type 2 diabetes

by Lin Shi

Institution: Swedish University of Agricultural Sciences
Year: 2017
Keywords: bioinformatics; data analysis; multivariate analysis; metabolism; metabolic disorders; diabetes; diet; risk assessment; biomarkers; healthy Nordic dietary index; nested case-control study; risk prediction; type 2 diabetes; untargeted LC-MS metabolomics
Posted: 02/01/2018
Record ID: 2170763
Full text PDF: https://pub.epsilon.slu.se/14740/


Type 2 diabetes (T2D) is a major global health problem, and prevention could be improved by identifying individuals at risk at an early stage, followed by preventive strategies, e.g., dietary modifications. Untargeted LC-MS metabolomics offers the possibility to identify predictive biomarkers that may improve risk prediction and dietary biomarkers that may facilitate investigation of diet-T2D relationships. However, untargeted metabolomics generates large-scale data, resulting in demanding data processing and statistical analyses preceding meaningful biological interpretation.
The work presented in this thesis sought to develop bioinformatics tools for dealing with large-scale data generated from untargeted LC-MS metabolomics, to apply such tools to identify predictive metabolites of T2D and metabolites related to predefined healthy Nordic dietary indices, and to investigate whether such metabolites are associated with T2D risk in a Swedish population.
Two novel R packages were developed: batchCorr, a data-processing strategy to correct for within- and between-batch variability in LC-MS experiments, and MUVR, a statistical framework for multivariate analysis with unbiased variable selection. These tools were applied to untargeted LC-MS metabolomics data obtained from plasma samples from a nested case-control study. Overall, 46 predictive metabolites of T2D were identified. Several metabolites showed good long-term reproducibility among healthy participants, reinforcing their potential as predictive biomarkers, while some changed in the disease-associated direction among cases, reflecting disease progression. In total, 38 metabolites were found to be associated with two predefined healthy Nordic dietary indices. No evidence was found to support an association between the indices and T2D risk. Instead, metabolites related to unhealthy foods not captured in the indices were associated with increased risk.
In conclusion, the novel bioinformatics tools developed here can overcome vital data-analytical challenges inherent in large-scale untargeted metabolomics studies. Predictive metabolites have great potential to provide information related to T2D pathophysiology and monitoring of disease progression, though only a limited improvement in disease prediction was achieved when adding them to models based on optimally selected traditional risk factors. Moreover, no evidence was found of an association between healthy Nordic dietary indices and T2D risk. Future studies should integrate multi-omics techniques with traditional methods to investigate how diet and lifestyle risk factors affect the pathological pathways of T2D and how disease development can be prevented.