Abstract

Mathematical probability has a rich theory and powerful applications. Of particular note is the Markov chain Monte Carlo (MCMC) method for sampling from high-dimensional distributions that may not admit a naive analysis. We develop the theory of the MCMC method from first principles and prove its relevance. We also define a Bayesian hierarchical model for generating data. By understanding how data are generated, we may infer hidden structure about these models. We use a specific MCMC method called a Gibbs sampler to discover topic distributions in a hierarchical Bayesian model called Topics Over Time. We propose an innovative use of this model to discover disease and treatment topics in a corpus of health insurance claims data. By representing individuals as mixtures of topics, we are able to consider their future costs on an individual level rather than as part of a large collective.

Degree

MS

College and Department

Physical and Mathematical Sciences; Mathematics

Rights

http://lib.byu.edu/about/copyright/

Date Submitted

2013-10-18

Document Type

Thesis

Handle

http://hdl.lib.byu.edu/1877/etd6532

Keywords

Probability, Bayesian Data Analysis, Machine Learning, Markov Chains, Markov Chain Monte Carlo, Bayesian Network, Latent Dirichlet Allocation, Topics Over Time

Language

English

Included in

Mathematics Commons
