Abstract

Emerging sequencing approaches have revolutionized the way we can collect DNA sequence data for applications in bioforensics and biosurveillance. In this research, we present an approach to construct a database of known biological agents and use this database to develop a statistical framework to analyze raw reads from next-generation sequence data for species identification and strain attribution. Our method capitalizes on a Bayesian statistical framework that accommodates information on sequence quality, mapping quality and provides posterior probabilities of matches to a known database of target genomes. Importantly, our approach also incorporates the possibility that multiple species can be present in the sample or that the target strain is not even contained within the reference database. Furthermore, our approach can accurately discriminate between very closely related strains of the same species with very little coverage of the genome and without the need for genome assembly - a time consuming and labor intensive step. We demonstrate our approach using genomic data from a variety of known bacterial agents of bioterrorism and agents impacting human health.

Degree

MS

College and Department

Physical and Mathematical Sciences; Statistics

Rights

http://lib.byu.edu/about/copyright/

Date Submitted

2012-04-18

Document Type

Selected Project

Handle

http://hdl.lib.byu.edu/1877/etd5201

Keywords

Next-generation sequencing, bioforensics, biosurveillance, Bayesian mixture model

Share

COinS