One of the primary goals of active research in molecular biology is to better understand the process of transcription regulation. An important objective in understanding transcription is identifying transcription factors that directly regulate target genes. Identifying these transcription factors is a key step toward eliminating genetic diseases or disease susceptibilities that are encoded inside deoxyribonucleic acid (DNA). There is much uncertainty and variation associated with transcription factor binding sites, requiring these sites to be represented stochastically. Although typically each transcription factor prefers to bind to a specific DNA word, it can bind to different variations of that DNA word. In order to model these uncertainties, we use a Bayesian approach that allows the binding probabilities associated with the motif to vary. This project presents a new method for motif searching that uses expert prior information to scan DNA sequences for multiple known motif binding sites as well as new motifs. The method uses a mixture model to model the motifs of interest where each motif is represented by a Multinomial distribution, and Dirichlet prior distributions are placed on each motif of interest. Expert prior information is given to search for known motifs and diffuse priors are used to search for new motifs. The posterior distribution of each motif is then sampled using Markov Chain Monte Carlo (MCMC) techniques and Gibbs sampling.



College and Department

Physical and Mathematical Sciences; Statistics



Date Submitted


Document Type

Selected Project




motif, transcription factor, Gibbs sampling, DNA, XPRIME, ETS, RUNX