TL;DR
When used as stand-ins for human panels, large language models artificially homogenize philosophical opinions: they overestimate consensus and lose individual heterogeneity, which affects their use in alignment and evaluation tasks.
Contribution
The paper demonstrates that language models collapse heterogeneity in philosophical judgments, revealing limitations in their ability to replicate individual differences.
Findings
Language models over-correlate philosophical judgments, creating artificial consensus.
Models implicitly assume domain specialists share similar views.
Fine-tuning impacts the degree of heterogeneity collapse.
Abstract
Silicon samples are increasingly used as a low-cost substitute for human panels and have been shown to reproduce aggregate human opinion with high fidelity. We show that, in the alignment-relevant domain of philosophy, silicon samples systematically collapse heterogeneity. Using data from professional philosophers drawn from PhilPeople profiles, we evaluate seven proprietary and open-source large language models on their ability to replicate individual philosophical positions and to preserve cross-question correlation structures across philosophical domains. We find that language models substantially over-correlate philosophical judgments, producing artificial consensus across domains. This collapse is associated in part with specialist effects, whereby models implicitly assume that domain specialists hold highly similar philosophical views. We assess the robustness of these…
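To make "over-correlation" concrete, here is a minimal sketch of one way to compare cross-question correlation structure between two panels. It is not the paper's method: the metric (mean absolute off-diagonal correlation), the function name `mean_abs_offdiag_corr`, and the data are all illustrative assumptions. Both panels are synthetic. The "human" panel answers each question independently, while the "silicon" panel's answers are driven by a single shared latent factor, which mimics artificial consensus across questions.

```python
import numpy as np


def mean_abs_offdiag_corr(responses: np.ndarray) -> float:
    """Mean absolute pairwise correlation between question columns.

    `responses` has shape (n_respondents, n_questions). This summary
    statistic is an assumption made for illustration only.
    """
    corr = np.corrcoef(responses, rowvar=False)
    upper = np.triu_indices_from(corr, k=1)  # each question pair once
    return float(np.mean(np.abs(corr[upper])))


rng = np.random.default_rng(0)
n_respondents, n_questions = 500, 6

# Synthetic "human" panel: answers to each question are independent.
human = rng.normal(size=(n_respondents, n_questions))

# Synthetic "silicon" panel: one shared latent factor drives every answer,
# so all question pairs end up strongly correlated.
latent = rng.normal(size=(n_respondents, 1))
noise = rng.normal(size=(n_respondents, n_questions))
silicon = 0.8 * latent + 0.6 * noise

human_score = mean_abs_offdiag_corr(human)
silicon_score = mean_abs_offdiag_corr(silicon)

print(f"human panel:   {human_score:.3f}")
print(f"silicon panel: {silicon_score:.3f}")
```

A silicon score well above the human score is the signature of collapsed heterogeneity in this toy setup. With real survey data, the same comparison would need a correlation measure suited to categorical responses.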
