Abstract
Topic modeling, an unsupervised technique for gaining a high-level understanding of a large collection of documents, typically involves two major goals: the discovery of the topics used in a corpus (topic discovery) and the assignment of topics to individual words (token-level topic assignment). While Latent Dirichlet Allocation (LDA) normally performs both steps simultaneously, some situations require only the token-level topic assignments, using fixed topics. We evaluate three topic assignment strategies with fixed topics -- Gibbs sampling, iterated conditional modes, and mean field variational inference -- to determine which should be used when only token-level topic assignment is needed. Among these methods, we find that iterated conditional modes performs best with respect to significance, consistency, and runtime, while variational inference achieves the best downstream classification accuracy.
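To make the setting concrete, the following is a minimal sketch (not the thesis's implementation) of one of the three strategies named above: iterated conditional modes for token-level topic assignment with a fixed topic-word matrix. The function name, the initialization scheme, and the smoothing parameter `alpha` are illustrative assumptions; the update rule is the standard collapsed LDA token conditional, with sampling replaced by an argmax.

```python
import numpy as np

def icm_token_topics(doc, phi, alpha=0.1, max_iters=50):
    """Assign a topic to each token of `doc` (a list of word ids) given a
    fixed K x V topic-word probability matrix `phi`, using iterated
    conditional modes: repeatedly set each assignment z_i to the argmax of
    its conditional p(z_i = k) ∝ phi[k, w_i] * (n_k^{-i} + alpha),
    where n_k^{-i} counts the doc's other tokens currently assigned to k.
    Stops when a full pass changes nothing (a local mode).
    NOTE: illustrative sketch, not the thesis's code."""
    doc = np.asarray(doc)
    K = phi.shape[0]
    # Assumed initialization: each token starts at its best topic by phi alone.
    z = phi[:, doc].argmax(axis=0)
    counts = np.bincount(z, minlength=K).astype(float)
    for _ in range(max_iters):
        changed = False
        for i, w in enumerate(doc):
            counts[z[i]] -= 1.0                  # exclude token i
            k = int(np.argmax(phi[:, w] * (counts + alpha)))
            if k != z[i]:
                changed = True
            z[i] = k
            counts[k] += 1.0
        if not changed:
            break
    return z

# Tiny usage example: two topics over a 4-word vocabulary.
phi = np.array([[0.45, 0.45, 0.05, 0.05],   # topic 0 favors words 0, 1
                [0.05, 0.05, 0.45, 0.45]])  # topic 1 favors words 2, 3
print(icm_token_topics([0, 1, 0, 2, 3], phi))  # → [0 0 0 1 1]
```

The Gibbs-sampling variant would draw `z[i]` from the normalized scores instead of taking the argmax; ICM's deterministic update is what makes it fast and consistent across runs.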
Degree
MS
College and Department
Computational, Mathematical, and Physical Sciences; Computer Science
Rights
https://lib.byu.edu/about/copyright/
BYU ScholarsArchive Citation
Cowley, Stephen, "Inference Methods for Token-Level Topic Assignments with Fixed Topics" (2023). Theses and Dissertations. 10629.
https://scholarsarchive.byu.edu/etd/10629
Date Submitted
2023-12-23
Document Type
Thesis
Handle
http://hdl.lib.byu.edu/1877/etd13466
Keywords
Topic Modeling, token-level topics, topic assignment
Language
English