Understanding the composition, evolution, and function of the cotton (Gossypium) genome is complicated by the joint presence of two genomes in its nucleus (AT and DT genomes). Specifically, read-mapping (a fundamental part of next-generation sequence analysis) cannot adequately differentiate reads as belonging to one genome or the other. These two genomes were derived from progenitor A-genome and D-genome diploids involved in ancestral allopolyploidization. To better understand the allopolyploid genome, we developed PolyCat to categorize reads according to their genome of origin based on homoeo-SNPs that differentiate the two genomes. We re-sequenced the genomes of extant diploid relatives of tetraploid cotton that contain the A1 (Gossypium herbaceum), A2 (Gossypium arboreum), or D5 (Gossypium raimondii) genomes. We identified 24 million SNPs between the A-diploid and D-diploid genomes. These analyses facilitated the construction of a robust index of conserved SNPs between the A-genomes and D-genomes at all detected polymorphic loci. This index can be used by PolyCat to assign reads from an allotetraploid to their genome of origin. Continued characterization of the Gossypium genomes will further enhance our ability to manipulate fiber and agronomic production of cotton. We re-sequenced 34 allotetraploid cotton lines, representing all 7 tetraploid cotton species. The analysis of these genomes, using PolyCat and PolyDog, provides us with the beginnings of a HapMap-like resource for cotton species, including indices of both homoeo-SNPs and allele-SNPs. With this information, we explore the phylogenetic relationships among cotton species, including the newly characterized species G. ekmanianum and G. stephensii.
We examine both recent and ancient gene conversion, discovering that recent gene conversion is extremely rare and that ancient gene conversion is far less extensive than previously believed: many previously identified conversion events are more likely attributable to autapomorphic SNPs in the descent of diploid relatives. In order to carry out these experiments, many tools for next-generation sequence analysis were developed. These tools, along with PolyCat and PolyDog, comprise the tool suite BamBam.
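The idea of categorizing reads by homoeo-SNPs can be illustrated with a minimal Python sketch. This is not PolyCat's implementation. The index layout, the function name `categorize_read`, the category labels, and the assumption of an ungapped alignment are all hypothetical and chosen only to show the principle: a read is assigned to the A or D genome when every known homoeo-SNP it overlaps agrees with that genome.

```python
from typing import Dict, Tuple

# Hypothetical index layout (an assumption for illustration):
# (chromosome, 0-based position) -> (A-genome base, D-genome base)
HomoeoSnpIndex = Dict[Tuple[str, int], Tuple[str, str]]


def categorize_read(chrom: str, start: int, seq: str, index: HomoeoSnpIndex) -> str:
    """Return 'A', 'D', 'conflict', or 'unassigned' for an ungapped alignment."""
    a_votes = 0
    d_votes = 0
    for offset, base in enumerate(seq):
        alleles = index.get((chrom, start + offset))
        if alleles is None:
            continue  # this position is not a known homoeo-SNP
        a_base, d_base = alleles
        if base == a_base:
            a_votes += 1
        elif base == d_base:
            d_votes += 1
        # a base matching neither allele contributes no vote
    if a_votes and d_votes:
        return "conflict"  # the read carries alleles from both genomes
    if a_votes:
        return "A"
    if d_votes:
        return "D"
    return "unassigned"  # the read overlaps no informative homoeo-SNP


# Toy index with two made-up homoeo-SNPs on a made-up chromosome name.
index = {("Chr01", 104): ("G", "T"), ("Chr01", 110): ("C", "A")}
print(categorize_read("Chr01", 100, "ACGTGACCTACGG", index))
```

A real tool would also need to handle gapped alignments, base qualities, and paired reads, none of which this sketch attempts.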
College and Department
Life Sciences; Biology
BYU ScholarsArchive Citation
Page, Justin Thomas, "Bioinformatics for the Comparative Genomic Analysis of the Cotton (Gossypium) Polyploid Complex" (2015). All Theses and Dissertations. 5557.
Keywords
next-generation sequencing, allopolyploidy, bioinformatics, comparative genomics, cotton, Gossypium, HapMap