Abstract

Mathematical probability has a rich theory and powerful applications. Of particular note is the Markov chain Monte Carlo (MCMC) method for sampling from high-dimensional distributions that may not admit a naive analysis. We develop the theory of the MCMC method from first principles and prove its relevance. We also define a Bayesian hierarchical model for generating data. By understanding how data are generated, we may infer hidden structure about these models. We use a specific MCMC method called a Gibbs sampler to discover topic distributions in a hierarchical Bayesian model called Topics Over Time. We propose an innovative use of this model to discover disease and treatment topics in a corpus of health insurance claims data. By representing individuals as mixtures of topics, we are able to consider their future costs on an individual level rather than as part of a large collective.

Degree

MS

College and Department

Physical and Mathematical Sciences; Mathematics

Rights

http://lib.byu.edu/about/copyright/

Date Submitted

2013-10-18

Document Type

Thesis

Handle

http://hdl.lib.byu.edu/1877/etd6532

Keywords

Probability, Bayesian Data Analysis, Machine Learning, Markov Chains, Markov Chain Monte Carlo, Bayesian Network, Latent Dirichlet Allocation, Topics Over Time

Included in

Mathematics Commons
