Information theoretic and machine learning techniques for emerging genomic data analysis

by Minji Kim

Institution: University of Illinois Urbana-Champaign
Year: 2017
Keywords: Genomic compression; DNA folding
Posted: 02/01/2018
Record ID: 2154734
Full text PDF: http://hdl.handle.net/2142/97339


The completion of the Human Genome Project in 2003 opened a new era for scientists. Through advanced high-throughput sequencing technologies, we now have access to a large amount of genomic data and we can use it to answer key biological questions, such as the factors contributing to the development of cancer. Large data sets and rapidly advancing sequencing technology pose challenges for processing and storing large volumes of genomic data. Moreover, the analysis of datasets may be both computationally and theoretically challenging because statistical methods have not been developed for new emerging data. In this work, I address some of these problems using tools from information theory and machine learning.

First, I focus on the data processing and storage aspect of metagenomics, the study of microbial communities in environmental samples and human organs. In particular, I introduce MetaCRAM, the first software suite specialized for metagenomic sequencing data processing and compression, and demonstrate that MetaCRAM compresses data to 2-13 percent of the original file size.

Second, I analyze a biological dataset assaying the propensity of a DNA sequence to form a four-stranded structure called "G-quadruplex" (GQ). GQ structures have been proposed to regulate diverse key biological processes including transcription, replication, and translation. I present main factors that lead to GQ formation, and propose highly accurate linear regression and Gaussian process regression models to predict the ability of a DNA sequence to fold into GQ.

Third, I study data structures to analyze and store three-dimensional chromatin conformation data generated from high-throughput sequencing technologies. In particular, I examine statistical properties of Hi-C contact maps and propose a few suitable formats to encode pairwise interactions between genome locations.

Advisors/Committee Members: Milenkovic, Olgica (Committee Chair), Song, Jun S (Committee Chair), Veeravalli, Venugopal V (committee member), Sinha, Saurabh (committee member), Peng, Jian (committee member).