Abstract
Emerging sequencing approaches have revolutionized the way we collect DNA sequence data for applications in bioforensics and biosurveillance. In this research, we present an approach to construct a database of known biological agents and use this database to develop a statistical framework to analyze raw reads from next-generation sequence data for species identification and strain attribution. Our method capitalizes on a Bayesian statistical framework that accommodates information on sequence quality and mapping quality and provides posterior probabilities of matches to a known database of target genomes. Importantly, our approach also incorporates the possibility that multiple species are present in the sample or that the target strain is not contained within the reference database at all. Furthermore, our approach can accurately discriminate between very closely related strains of the same species with very little coverage of the genome and without the need for genome assembly, a time-consuming and labor-intensive step. We demonstrate our approach using genomic data from a variety of known bacterial agents of bioterrorism and agents impacting human health.
Degree
MS
College and Department
Physical and Mathematical Sciences; Statistics
Rights
http://lib.byu.edu/about/copyright/
BYU ScholarsArchive Citation
Francis, Owen Eric, "Species Identification and Strain Attribution with Unassembled Sequencing Data" (2012). Theses and Dissertations. 3200.
https://scholarsarchive.byu.edu/etd/3200
Date Submitted
2012-04-18
Document Type
Selected Project
Handle
http://hdl.lib.byu.edu/1877/etd5201
Keywords
Next-generation sequencing, bioforensics, biosurveillance, Bayesian mixture model
Language
English