Abstract

Large language models (LLMs) are increasingly used to simulate the survey responses of human demographic groups, a task known as silicon sampling. This paper investigates whether supervised fine-tuning can improve algorithmic fidelity in such settings. We construct a dataset from the General Social Survey (GSS) by converting each respondent's demographic attributes into a natural-language backstory and randomly selecting one attribute as a held-out target. Fine-tuning a LLaMA-3.1-70B model on this dataset yields significant improvements on two tasks: (1) imputing missing attributes at the individual level, and (2) reproducing response distributions for demographic slices, such as racial and age groups. Notably, these improvements come with minimal degradation on general language benchmarks. While related work has fine-tuned LLMs for survey simulation, we refine and extend these approaches by combining randomized dependent-variable masking, naturalistic backstories, and a probability-matching objective. We evaluate both individual-level and distributional fidelity under an out-of-year robustness check, training on the 2021 GSS wave and evaluating on the 2022 wave.
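
To make the dataset construction concrete, the sketch below (in Python, with hypothetical field names; the thesis's actual GSS variables and backstory templates may differ, and real backstories are more naturalistic than this template) illustrates randomized dependent-variable masking: one attribute is held out at random as the training target while the rest are rendered as a first-person backstory.

    import random

    # Hypothetical GSS-style record; the survey's actual variable names differ.
    respondent = {
        "age": "34",
        "race": "Black",
        "sex": "female",
        "region": "South",
        "party identification": "Independent",
    }

    def make_training_example(record, rng=random):
        """Hold out one attribute at random; render the rest as a backstory."""
        target = rng.choice(list(record))       # randomized dependent variable
        context = {k: v for k, v in record.items() if k != target}
        backstory = " ".join(f"My {k} is {v}." for k, v in context.items())
        prompt = f"{backstory} What is my {target}?"
        return prompt, record[target]           # (input text, held-out label)

    prompt, label = make_training_example(respondent)
    # e.g. prompt: "My age is 34. My race is Black. ... What is my region?"

The probability-matching objective can likewise be sketched as a cross-entropy between the model's answer distribution and the empirical answer distribution for a demographic slice (an assumption about the exact form; the thesis may weight or normalize differently):

    import math

    def probability_matching_loss(pred_probs, empirical_probs):
        """Cross-entropy H(p, q) of the empirical distribution p against
        the model's predicted distribution q over answer options."""
        return -sum(p * math.log(max(q, 1e-12))
                    for p, q in zip(empirical_probs, pred_probs) if p > 0)

    # e.g. a three-option item for one demographic slice
    # (first argument: model probabilities; second: empirical shares):
    print(probability_matching_loss([0.25, 0.50, 0.25], [0.20, 0.55, 0.25]))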

Degree

MS

College and Department

Computer Science; Computational, Mathematical, and Physical Sciences

Rights

https://lib.byu.edu/about/copyright/

Date Submitted

2025-12-17

Document Type

Thesis

Keywords

Silicon Sampling, Large Language Models, Algorithmic Fidelity, Computational Social Science, Supervised Fine-Tuning, General Social Survey, Survey Simulation, Demographic Representation

Language

English
