Abstract

Large language models (LLMs) are increasingly used to simulate the survey responses of human demographic groups, a task known as silicon sampling. This paper investigates whether supervised fine-tuning can improve algorithmic fidelity in such settings. We construct a dataset from the General Social Survey (GSS) by converting each respondent's demographic attributes into a natural-language backstory and randomly selecting one attribute as a held-out target. Fine-tuning a LLaMA-3.1-70B model on this dataset yields significant improvements on two tasks: (1) imputing missing attributes at the individual level, and (2) reproducing response distributions for demographic slices, such as racial and age groups. Notably, these improvements come with minimal degradation on general language benchmarks. While related work has fine-tuned LLMs for survey simulation, we refine and extend these approaches by combining randomized dependent-variable masking, naturalistic backstories, and a probability-matching objective. We evaluate both individual-level and distributional fidelity under an out-of-year robustness check, training on the 2021 GSS wave and evaluating on the 2022 wave.
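
To make the dataset construction concrete, the sketch below (in Python, with hypothetical field names; the thesis's actual GSS variables and backstory templates may differ, and real backstories are more naturalistic than this template) illustrates randomized dependent-variable masking: one attribute is held out at random as the training target while the rest are rendered as a first-person backstory.

    import random

    # Hypothetical GSS-style record; the survey's actual variable names differ.
    respondent = {
        "age": "34",
        "race": "Black",
        "sex": "female",
        "region": "South",
        "party identification": "Independent",
    }

    def make_training_example(record, rng=random):
        """Hold out one attribute at random; render the rest as a backstory."""
        target = rng.choice(list(record))       # randomized dependent variable
        context = {k: v for k, v in record.items() if k != target}
        backstory = " ".join(f"My {k} is {v}." for k, v in context.items())
        prompt = f"{backstory} What is my {target}?"
        return prompt, record[target]           # (input text, held-out label)

    prompt, label = make_training_example(respondent)
    # e.g. prompt: "My age is 34. My race is Black. ... What is my region?"

The probability-matching objective can likewise be sketched as a cross-entropy between the model's answer distribution and the empirical answer distribution for a demographic slice (an assumption about the exact form; the thesis may weight or normalize differently):

    import math

    def probability_matching_loss(pred_probs, empirical_probs):
        """Cross-entropy H(p, q) of the empirical distribution p against
        the model's predicted distribution q over answer options."""
        return -sum(p * math.log(max(q, 1e-12))
                    for p, q in zip(empirical_probs, pred_probs) if p > 0)

    # e.g. a three-option item for one demographic slice
    # (first argument: model probabilities; second: empirical shares):
    print(probability_matching_loss([0.25, 0.50, 0.25], [0.20, 0.55, 0.25]))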

Degree

MS

College and Department

Computer Science; Computational, Mathematical, and Physical Sciences

Rights

https://lib.byu.edu/about/copyright/

Date Submitted

2025-12-17

Document Type

Thesis

Keywords

Silicon Sampling, Large Language Models, Algorithmic Fidelity, Computational Social Science, Supervised Fine-Tuning, General Social Survey, Survey Simulation, Demographic Representation

Language

English
