Abstract
Large language models (LLMs) show remarkable abilities in both factual and creative tasks, yet the problem of hallucination persists despite advances in training and post-training methods. Recent progress in model interpretability suggests that model behavior can be predicted and influenced by analyzing and manipulating internal activations. In this thesis, we address three questions: (1) Can we identify linear representations of hallucination and creativity in the model’s latent space? (2) Do these representations play a causal role in model behavior when manipulated? (3) Are creativity and hallucination causally intertwined? To answer these, we construct a novel dataset of factual, hallucinated, and creative responses to Python package queries. We analyze the data using principal component analysis (PCA) and train logistic regression probes to test whether creative and hallucinatory activations are linearly separable. We further use mass-mean probes to extract semantic directions and apply additive and ablative interventions to activations to test causal effects. Our findings show that representations for creativity and hallucination are weakly correlated and largely independent. We also find that post-hoc analysis of generated tokens is more effective than predictive analysis for identifying hallucination representations. Finally, we show that synthetically constructed hallucination activations can serve as suitable proxies for genuine hallucinations when training classifier probes. These results advance our understanding of how LLMs encode and generate hallucinations, and they suggest new directions for interpretability methods that aim to detect and steer model behavior.
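To make the probing and intervention techniques named in the abstract concrete, the sketch below illustrates a mass-mean probe, a logistic regression probe, and additive and ablative interventions on synthetic stand-in activations. This is a minimal illustration of the general methods, not the thesis's actual code: the array names, hidden size, sample counts, and steering coefficient are all hypothetical.

```python
# Hypothetical sketch of mass-mean probing and activation interventions.
# Synthetic Gaussian activations stand in for real residual-stream
# activations; all shapes and constants here are illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
d_model = 64          # hidden size (illustrative)
n_per_class = 200     # examples per class (illustrative)

# Stand-ins for activations collected from factual vs. hallucinated responses.
factual = rng.normal(0.0, 1.0, size=(n_per_class, d_model))
halluc = rng.normal(0.5, 1.0, size=(n_per_class, d_model))

# Mass-mean probe: the semantic direction is the difference of class means.
direction = halluc.mean(axis=0) - factual.mean(axis=0)
direction /= np.linalg.norm(direction)

# Logistic regression probe: test whether the two classes are linearly separable.
X = np.vstack([factual, halluc])
y = np.concatenate([np.zeros(n_per_class), np.ones(n_per_class)])
probe = LogisticRegression(max_iter=1000).fit(X, y)
print(f"probe accuracy: {probe.score(X, y):.3f}")

def steer(activation: np.ndarray, alpha: float = 4.0) -> np.ndarray:
    """Additive intervention: push an activation along the direction."""
    return activation + alpha * direction

def ablate(activation: np.ndarray) -> np.ndarray:
    """Ablative intervention: project the direction out of an activation."""
    return activation - (activation @ direction) * direction
```

In practice, such interventions would be applied to a model's hidden states during generation (e.g., via forward hooks) to test whether the extracted direction plays a causal role in producing hallucinated or creative output.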
Degree
MS
College and Department
Computer Science; Computational, Mathematical, and Physical Sciences
Rights
https://lib.byu.edu/about/copyright/
BYU ScholarsArchive Citation
Bowcut, McKay, "An Analysis of Large Language Model Hallucination and Creativity: Linear Representations and Causal Roles" (2025). Theses and Dissertations. 11056.
https://scholarsarchive.byu.edu/etd/11056
Date Submitted
2025-11-07
Document Type
Thesis
Keywords
computer science, deep learning, large language model, hallucination
Language
English