
Aligning of short sequences to whole genomes and a program to verify PCR primers

by Arne Olaf Godtland




Institution: University of Oslo
Year: 1000
Keywords: VDP::420
Record ID: 1290742
Full text PDF: https://www.duo.uio.no/handle/10852/9657


Abstract

Today, several algorithms and tools are available for local alignment of nucleotide sequences. In this master's thesis, MegaBlast in particular, but also other tools in the BLAST family, are used to construct two applications. What the applications have in common is that they aim to find all perfect matches when short nucleotide sequences are aligned to a large nucleotide database. Both accuracy and time efficiency are emphasized. UniquePrimers, one of the applications, is designed to verify candidate primers for a polymerase chain reaction (PCR). The application takes the primers as input and returns all database sequences that contain both primers within a specified distance of each other. In this way, it verifies whether a primer pair is unique to a single database sequence. A search takes from approximately 15 seconds up to 5 minutes. With MegaBlast and word length (W) 12, all perfect matches are guaranteed to be found when the primers are 15 bp or longer. Multiple Oligo Search is written in the context of the Nesvold method for detecting unknown genetically modified organisms. When a genetically modified organism (GMO) is designed, a DNA sequence is introduced into the cells of, for example, a plant. If the whole genome sequence of the organism under examination is known, it is theoretically possible to construct a microarray containing all sequences of a certain length that do not occur in the organism. The plant Arabidopsis thaliana is examined in the context of the Nesvold method, with a chosen nucleotide sequence length of 15 bp. Since there are more than one billion different nucleotide sequences of length 15 bp (4^15 = 1,073,741,824), three reduction steps are used to bring the number down to approximately six million, which is few enough to design a microarray.
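The primer check described for UniquePrimers can be illustrated with a minimal sketch. It is not the thesis implementation: it replaces MegaBlast with plain exact string search, scans only the given strand of each database sequence, and all names (`primer_pair_hits`, `is_unique`, the toy sequences) are hypothetical.

```python
from typing import Dict, List, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_all(text: str, pattern: str) -> List[int]:
    """Start positions of every exact (possibly overlapping) occurrence."""
    positions = []
    start = text.find(pattern)
    while start != -1:
        positions.append(start)
        start = text.find(pattern, start + 1)
    return positions


def primer_pair_hits(
    database: Dict[str, str], forward: str, reverse: str, max_distance: int
) -> List[Tuple[str, int, int]]:
    """Return (sequence id, product start, product end) for each place where
    the forward primer is followed, within max_distance bases, by the binding
    site of the reverse primer (its reverse complement on this strand)."""
    reverse_site = reverse_complement(reverse)
    hits = []
    for seq_id, seq in database.items():
        reverse_starts = find_all(seq, reverse_site)
        for f_start in find_all(seq, forward):
            f_end = f_start + len(forward)
            for r_start in reverse_starts:
                gap = r_start - f_end  # bases between the two primer sites
                if 0 <= gap <= max_distance:
                    hits.append((seq_id, f_start, r_start + len(reverse_site)))
    return hits


def is_unique(hits: List[Tuple[str, int, int]]) -> bool:
    """True when all hits fall in exactly one database sequence."""
    return len({seq_id for seq_id, _, _ in hits}) == 1


# Toy data (assumed, far shorter than real primers and genomes).
database = {
    "seqA": "TTACGTACGGGGGCATTGATT",
    "seqB": "ACGTACTTTT",
}
hits = primer_pair_hits(database, forward="ACGTAC", reverse="TCAATG", max_distance=10)
print(hits, is_unique(hits))
```

A real tool would also search the opposite strand and rely on an indexed search such as MegaBlast rather than a linear scan, which is what makes run times of seconds to minutes feasible on large databases.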
Multiple Oligo Search is designed to analyze the probes on this microarray that give a signal after hybridization with the unknown GMO. Assuming that two percent of the probes give false positive signals, approximately 120,000 of the six million probes will be false positives. However, with the criterion that both the forward and the reverse strand must give a signal, approximately 1200 nucleotide sequences remain, comprising both true and false positive signals. Multiple Oligo Search takes these 1200 nucleotide sequences as input and returns a list of the database sequences that contain the most of them. Tests of the application with simulated data showed that it returns the true insert in approximately 50 percent of cases. To make the application more reliable, the false positive rate must therefore be below two percent.
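The figures and the ranking step in the abstract can be sketched as follows. This is an illustration under stated assumptions, not the thesis code: the 1200 figure is reproduced here by assuming that the six million probes form three million forward/reverse pairs with independent errors, and the ranking uses exact substring matching on one strand instead of MegaBlast. The function names and toy data are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def expected_false_positives(n_probes: int, rate: float) -> Tuple[float, float]:
    """Expected false positive probes, and expected forward/reverse pairs in
    which both probes are false positives.

    Assumption: the probes form n_probes / 2 strand pairs and the two probes
    of a pair fail independently of each other."""
    single = n_probes * rate
    paired = (n_probes / 2) * rate ** 2
    return single, paired


def rank_by_oligo_content(
    database: Dict[str, str], oligos: Iterable[str]
) -> List[Tuple[str, int]]:
    """Rank database sequences by how many distinct oligos they contain."""
    unique_oligos = set(oligos)
    counts = Counter()
    for seq_id, seq in database.items():
        counts[seq_id] = sum(1 for oligo in unique_oligos if oligo in seq)
    return counts.most_common()  # highest count first


single, paired = expected_false_positives(6_000_000, 0.02)
print(round(single), round(paired))

# Toy data (assumed); real input would be about 1200 oligos of 15 bp.
database = {"insert": "AAACCCGGGTTT", "other": "AAATTTAAATTT"}
oligos = ["AAACC", "CCGGG", "GGTTT", "AAATT"]
ranking = rank_by_oligo_content(database, oligos)
print(ranking)
```

In this toy case three of the four oligos come from the sequence labeled `insert` and one is a stand-in for a false positive. As the share of false positives grows, sequences that match noise by chance can outrank the true insert, which is consistent with the abstract's conclusion that a lower false positive rate is needed.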