Understanding the mechanisms that regulate gene transcription is a primary challenge in molecular biology. Identifying transcription factor binding sites (TFBSs), also called DNA motifs, is an important step toward understanding these mechanisms. Furthermore, many human diseases are attributed to mutations in TFBSs, which makes identifying these DNA motifs significant for disease treatment. Uncertainty and variation in specific nucleotides of TFBSs make DNA motif searching difficult. In this project, we present an algorithm, XPRIME-EM (Eliciting EXpert PRior Information for Motif Exploration using the Expectation-Maximization Algorithm), which discovers known and de novo (unknown) DNA motifs simultaneously from a collection of DNA sequences using a modified EM algorithm and describes the variable nature of DNA motifs using a position-specific weight matrix (PWM). XPRIME improves the efficiency of locating and describing motifs by preventing the overlap of multiple motifs, a phenomenon termed a phase shift, and generates stronger motifs by considering the correlations between nucleotides at different positions within each motif. Moreover, a Bayesian formulation of the XPRIME algorithm allows prior information about motifs of interest, elicited from literature and experiments, to be incorporated into motif searching. We are the first research team to incorporate human genome-wide nucleosome occupancy information into PWM-based DNA motif searching.
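For readers unfamiliar with PWMs, the following minimal Python sketch illustrates the general idea of describing a motif by per-position nucleotide probabilities and scanning a sequence for the best-matching site. It is not the XPRIME-EM algorithm: it performs no EM iterations, uses no prior information, and ignores correlations between positions. The function names, the pseudocount, the uniform background, and the toy sequences are all assumptions made for illustration.

```python
import math

ALPHABET = "ACGT"


def build_pwm(sites, pseudocount=1.0):
    """Estimate per-position nucleotide probabilities from aligned sites.

    `sites` is a list of equal-length strings over A, C, G, T.
    The pseudocount (an assumed value) avoids zero probabilities.
    """
    width = len(sites[0])
    pwm = []
    for pos in range(width):
        counts = {base: pseudocount for base in ALPHABET}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        pwm.append({base: counts[base] / total for base in ALPHABET})
    return pwm


def log_odds_score(window, pwm, background=0.25):
    """Sum of log2(P(base | position) / background) over the window."""
    return sum(math.log2(pwm[i][base] / background)
               for i, base in enumerate(window))


def best_site(sequence, pwm):
    """Return (start, score) of the highest-scoring window in `sequence`."""
    width = len(pwm)
    starts = range(len(sequence) - width + 1)
    best = max(starts,
               key=lambda s: log_odds_score(sequence[s:s + width], pwm))
    return best, log_odds_score(sequence[best:best + width], pwm)


# Toy data (hypothetical): four aligned sites and one sequence to scan.
example_sites = ["TATAAT", "TATAAT", "TACAAT", "TATGAT"]
example_pwm = build_pwm(example_sites)
example_start, example_score = best_site("GGCGTATAATGCC", example_pwm)
```

In this toy example the best-scoring window starts at index 4, where the sequence contains the consensus TATAAT. An EM-based motif finder would go further by treating the site positions as unobserved and alternating between estimating site locations and re-estimating the PWM.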



College and Department

Physical and Mathematical Sciences; Statistics



Date Submitted


Document Type

Selected Project




DNA motif, modified EM algorithm, human nucleosome occupancy information