Post-training makes large language models less human-like
Marcel Binz, Elif Akata, Abdullah Almaatouq, Mohammed Alsobay, Oleksii Ariasov, Franziska Brändle, David Broska, Jason W. Burton, Nuno Busch, Frederick Callaway, Vanessa Cheung, Brian Christian, Julian Coda-Forno, Can Demircan, Vittoria Dentella, Maria K. Eckstein

TL;DR
Post-training reduces large language models' alignment with human behavior, and persona-induction does not improve individual-level predictions, which points to a trade-off between usefulness as an assistant and human-likeness.
Contribution
This study introduces Psych-201, a new dataset for measuring behavioral alignment, and reveals that post-training reduces human-likeness across models, with persona-induction being ineffective at the individual level.
Findings
Post-training reduces alignment with human behavior across model families, sizes, and objectives.
The gap between base and post-trained models widens in newer model generations, even as base models continue to improve.
Persona-induction does not improve predictions at the level of individual participants.
Abstract
Large language models (LLMs) are increasingly used as surrogates for human participants, but it remains unclear which models best capture human behavior and why. To address this, we introduce Psych-201, a novel dataset that enables us to measure behavioral alignment at scale. We find that post-training -- the stage that turns base models into useful assistants -- consistently reduces alignment with human behavior across model families, sizes, and objectives. Moreover, this misalignment widens in newer model generations even as base models continue to improve. Finally, we find that persona-induction -- a popular technique for eliciting human-like behavior by conditioning models on participant-specific information -- does not improve predictions at the level of individuals. Taken together, our results suggest that the very processes that are currently employed to turn LLMs into useful…
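The abstract does not say how behavioral alignment is scored, so the following is only an illustrative sketch under one common assumption: alignment is measured as the average negative log-likelihood (NLL) a model assigns to the choices humans actually made, with lower values meaning better alignment. The function name `mean_nll`, the data layout, and all numbers are hypothetical toy values, not results or code from the paper.

```python
import math


def mean_nll(choice_probs, human_choices):
    """Average negative log-likelihood of observed human choices.

    choice_probs: list of dicts, one per trial, mapping option -> model probability.
    human_choices: list of the options the participant actually chose.
    Lower values mean the model predicts the human choices better.
    (Assumed metric for illustration; the source does not specify one.)
    """
    if len(choice_probs) != len(human_choices):
        raise ValueError("need one probability dict per human choice")
    if not human_choices:
        raise ValueError("need at least one trial")
    total = 0.0
    for probs, choice in zip(choice_probs, human_choices):
        # Clip to avoid log(0) when the model assigns zero mass to the chosen option.
        p = max(probs.get(choice, 0.0), 1e-12)
        total -= math.log(p)
    return total / len(human_choices)


# Toy data (made up): four trials with two options each.
human_choices = ["A", "B", "A", "A"]

# Hypothetical base model: moderately uncertain predictions.
base_probs = [
    {"A": 0.6, "B": 0.4},
    {"A": 0.5, "B": 0.5},
    {"A": 0.7, "B": 0.3},
    {"A": 0.6, "B": 0.4},
]

# Hypothetical post-trained model: near-deterministic predictions.
post_probs = [{"A": 0.99, "B": 0.01} for _ in human_choices]

base_nll = mean_nll(base_probs, human_choices)
post_nll = mean_nll(post_probs, human_choices)

# Positive gap = the post-trained model fits the human choices worse.
gap = post_nll - base_nll
print(f"base NLL: {base_nll:.3f}, post-trained NLL: {post_nll:.3f}, gap: {gap:.3f}")
```

In this toy setup the near-deterministic model is penalized heavily on the one trial where the participant chose the option it considered unlikely. This is one way a confident assistant-style model could score worse than a more uncertain base model on a likelihood-based measure.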
