Abstract
Topic modeling, an unsupervised technique for gaining a high-level understanding of a large collection of documents, typically involves two major goals: the discovery of the topics used in a corpus (topic discovery) and the assignment of topics to individual words (token-level topic assignment). While Latent Dirichlet Allocation (LDA) normally performs both steps simultaneously, some situations require only the token-level topic assignments, using fixed topics. We evaluate three topic assignment strategies with fixed topics -- Gibbs sampling, iterated conditional modes, and mean field variational inference -- to determine which should be used when only token-level topic assignment is needed. Among these methods, we find that iterated conditional modes performs best with respect to significance, consistency, and runtime, while variational inference achieves the best downstream classification accuracy.
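To make the setting concrete, the following is a minimal sketch (not the thesis's implementation) of one of the three strategies named above: iterated conditional modes for token-level topic assignment with a fixed topic-word matrix. The function name, the initialization scheme, and the smoothing parameter `alpha` are illustrative assumptions; the update rule is the standard collapsed LDA token conditional, with sampling replaced by an argmax.

```python
import numpy as np

def icm_token_topics(doc, phi, alpha=0.1, max_iters=50):
    """Assign a topic to each token of `doc` (a list of word ids) given a
    fixed K x V topic-word probability matrix `phi`, using iterated
    conditional modes: repeatedly set each assignment z_i to the argmax of
    its conditional p(z_i = k) ∝ phi[k, w_i] * (n_k^{-i} + alpha),
    where n_k^{-i} counts the doc's other tokens currently assigned to k.
    Stops when a full pass changes nothing (a local mode).
    NOTE: illustrative sketch, not the thesis's code."""
    doc = np.asarray(doc)
    K = phi.shape[0]
    # Assumed initialization: each token starts at its best topic by phi alone.
    z = phi[:, doc].argmax(axis=0)
    counts = np.bincount(z, minlength=K).astype(float)
    for _ in range(max_iters):
        changed = False
        for i, w in enumerate(doc):
            counts[z[i]] -= 1.0                  # exclude token i
            k = int(np.argmax(phi[:, w] * (counts + alpha)))
            if k != z[i]:
                changed = True
            z[i] = k
            counts[k] += 1.0
        if not changed:
            break
    return z

# Tiny usage example: two topics over a 4-word vocabulary.
phi = np.array([[0.45, 0.45, 0.05, 0.05],   # topic 0 favors words 0, 1
                [0.05, 0.05, 0.45, 0.45]])  # topic 1 favors words 2, 3
print(icm_token_topics([0, 1, 0, 2, 3], phi))  # → [0 0 0 1 1]
```

The Gibbs-sampling variant would draw `z[i]` from the normalized scores instead of taking the argmax; ICM's deterministic update is what makes it fast and consistent across runs.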
Degree
MS
College and Department
Computational, Mathematical, and Physical Sciences; Computer Science
Rights
https://lib.byu.edu/about/copyright/
BYU ScholarsArchive Citation
Cowley, Stephen, "Inference Methods for Token-Level Topic Assignments with Fixed Topics" (2023). Theses and Dissertations. 10629.
https://scholarsarchive.byu.edu/etd/10629
Date Submitted
2023-12-23
Document Type
Thesis
Handle
http://hdl.lib.byu.edu/1877/etd13466
Keywords
Topic Modeling, token-level topics, topic assignment
Language
English