Gossypium herbaceum is a species of cotton native to Africa and Asia. As part of a larger effort to investigate structural variation in assorted diploid and polyploid cotton genomes, we have sequenced and assembled the genome of G. herbaceum. Cultivated G. herbaceum is an A1-genome diploid from the Old World (Africa) with a genome size of approximately 1.7 Gb. Long-range information is essential for constructing a high-quality assembly, especially when the genome is expected to be highly repetitive. Here we present a quality draft genome of G. herbaceum (cv. Wagad) produced with a multi-platform sequencing strategy (PacBio RS II, Dovetail Genomics, Phase Genomics, Bionano Genomics). PacBio RS II long reads (60X coverage) were assembled de novo with the Canu assembler. Illumina sequence reads generated with the Proximo library method from Phase Genomics and Bionano high-fidelity whole-genome maps were then used for further scaffolding. Finally, the assembly was polished with Pilon. This multi-platform, long-range sequencing strategy will help greatly in attaining high-quality de novo reconstructions of genomes. The assembly will be used for comparative analysis with G. arboreum, the A2-genome diploid, which is also domesticated. Not only will this assembly provide a quality reference genome for G. herbaceum, it also offers an opportunity to assess recent technologies such as those from Dovetail Genomics, Phase Genomics, and Bionano Genomics. The G. herbaceum genome sequence serves as an example for members of the plant genomics community who are interested in using multi-platform sequencing technologies for de novo genome sequencing.
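
As a rough illustration of the scale implied by the figures in the abstract, the sketch below estimates the total long-read yield that 60X coverage of an approximately 1.7 Gb genome corresponds to. The function name `required_bases` is hypothetical and introduced only for this example; the calculation is the standard coverage relation (coverage = total sequenced bases / genome size), not a description of the authors' own workflow.

```python
def required_bases(genome_size_bp: float, coverage: float) -> float:
    """Total sequenced bases needed to reach a given mean coverage.

    Uses the standard relation: coverage = total bases / genome size.
    """
    return genome_size_bp * coverage


# Figures taken from the abstract: ~1.7 Gb genome, 60X PacBio coverage.
GENOME_SIZE_BP = 1.7e9
COVERAGE = 60

total_bp = required_bases(GENOME_SIZE_BP, COVERAGE)
print(f"Approximate long-read yield: {total_bp / 1e9:.0f} Gb")
```

Under these assumptions the estimate comes to roughly 102 Gb of long-read sequence.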



College and Department

Life Sciences; Plant and Wildlife Sciences

Date Submitted


Document Type





Keywords

Gossypium, G. herbaceum, cotton, Pacific Biosciences, draft sequence assembly, proximity guided assembly