Community-Aligned Behavior Under Uncertainty: Evidence of Epistemic Stance Transfer in LLMs
Patrick Gerard, Aiden Chang, Svitlana Volkova

TL;DR
This paper investigates whether large language models (LLMs) aligned to an online community reproduce that community's behavior under uncertainty. It shows that these behaviors persist even after explicit event knowledge is removed, which suggests that alignment produces structured, generalizable effects rather than recall alone.
Contribution
The authors introduce a framework for testing epistemic stance transfer in LLMs, revealing that community-aligned behaviors are encoded beyond mere pattern recall.
Findings
Aligned LLMs maintain community-specific responses after fact removal.
Behavioral patterns are structured and not solely based on training data recall.
Framework enables detection of persistent behavioral biases under ignorance.
Abstract
When large language models (LLMs) are aligned to a specific online community, do they exhibit generalizable behavioral patterns that mirror that community's attitudes and responses to new uncertainty, or are they simply recalling patterns from training data? We introduce a framework to test epistemic stance transfer: targeted deletion of event knowledge, validated with multiple probes, followed by evaluation of whether models still reproduce the community's organic response patterns under ignorance. Using Russian-Ukrainian military discourse and U.S. partisan Twitter data, we find that even after aggressive fact removal, aligned LLMs maintain stable, community-specific behavioral patterns for handling uncertainty. These results provide evidence that alignment encodes structured, generalizable behaviors beyond surface mimicry. Our framework offers a systematic way to detect behavioral…
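The final evaluation step described in the abstract, checking whether a model under ignorance still reproduces the community's response pattern, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the authors' method: the response categories, the sample labels, the helper names, the rule that knowledge counts as removed only when every probe fails, and the choice of Jensen-Shannon divergence as the comparison measure.

```python
import math
from collections import Counter

# Hypothetical response categories; the paper's actual coding scheme is not given here.
CATEGORIES = ["speculate", "deflect", "express_doubt"]

def knowledge_removed(probe_recalls):
    """Assumed rule: deletion counts as validated only if no probe recalls the event."""
    return not any(probe_recalls)

def label_distribution(labels, categories):
    """Turn a list of category labels into a probability vector over `categories`."""
    counts = Counter(labels)
    total = len(labels)
    return [counts[c] / total for c in categories]

def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: 0 for identical, 1 for disjoint distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with zero probability in x contribute nothing to the sum.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Made-up labels standing in for coded community posts and model outputs.
community = ["speculate"] * 5 + ["deflect"] * 3 + ["express_doubt"] * 2
model_after_deletion = ["speculate"] * 4 + ["deflect"] * 4 + ["express_doubt"] * 2

if knowledge_removed([False, False, False]):
    p = label_distribution(community, CATEGORIES)
    q = label_distribution(model_after_deletion, CATEGORIES)
    print(f"JS divergence (bits): {js_divergence(p, q):.4f}")
```

Under these assumptions, a divergence near 0 would mean the model's handling of uncertainty closely matches the community's, while a value near 1 would mean the two patterns barely overlap.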
Taxonomy
Topics: Misinformation and Its Impacts · Topic Modeling · Opinion Dynamics and Social Influence
