Cotton is a crop of major global economic importance with a large, complex genome. Most industrial cotton production comes from two tetraploid species (Gossypium hirsutum L. and Gossypium barbadense L.), each of which contains two subgenomes, AT and DT. The DT subgenome is nearly half the size of the AT subgenome in tetraploid cotton and is closely related to an extant D-genome Gossypium species, G. raimondii Ulbr. Characterization of the structural variants present in the diploid D genome should provide greater insight into the evolution of the DT subgenome in tetraploid cotton. Bionano (BNG) optical mapping uses patterns of fluorescent labels inserted at specific endonuclease sites to create physical maps of genomes, which can then be examined for structural variation. To develop optical maps of G. raimondii, we first produced a de novo PacBio long-read sequence assembly of this species. This sequence assembly consisted of 2,379 contigs, with an average contig length of 413 Kb and a contig N50 of 4.9 Mb. Using BNG technology, we developed two optical maps of the diploid D genome of G. raimondii: one created with the Nt.BssSI endonuclease and one with the Nt.BspQI endonuclease. Using the BNG optical maps, the PacBio assembly was hybrid scaffolded into 100 scaffolds (plus 5 unscaffolded contigs) with an average scaffold length of 7.5 Mb and a scaffold N50 of 13.1 Mb. A comparison between the Nt.BssSI BNG optical map and the two sequence assemblies identified 3,195 structural variants. These variants were used to validate the accuracy of the G. raimondii reference sequence and to create a new phylogeny of nine major cotton species.
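The contig and scaffold N50 values quoted above follow the standard definition of N50: the length L such that sequences of length L or longer together cover at least half of the total assembly length. The sketch below is a minimal illustration of that definition and is not the pipeline used in this work. The function name `n50` and the toy contig lengths are hypothetical and chosen only for the example.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length of the sequence at which the running total,
    taken from longest to shortest, first reaches half of the
    combined length of all sequences.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    # Only reached when the input is empty.
    raise ValueError("lengths must be non-empty")


# Hypothetical toy contig lengths (arbitrary units), not data from the study.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
toy_n50 = n50(toy_contigs)
toy_mean = sum(toy_contigs) / len(toy_contigs)
print(f"N50 = {toy_n50}, mean length = {toy_mean:.1f}")
```

In the toy data the N50 (70) is well above the mean length (about 42.9). The assembly statistics reported above show the same pattern, because a small number of long contigs or scaffolds contributes most of the total length.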



College and Department

Life Sciences; Plant and Wildlife Sciences



Date Submitted


Document Type





Keywords

cotton, Gossypium, Bionano Genomics, physical mapping, structural variants