From Sycophantic Consensus to Pluralistic Repair: Why AI Alignment Must Surface Disagreement

Varad Vishwarupe; Nigel Shadbolt; Marina Jirotka

arXiv:2605.14912·cs.AI·May 15, 2026

TL;DR

This paper argues that AI systems should surface disagreement through mechanisms like scoping, signalling, and repair, moving beyond simple preference aggregation to better handle genuine value pluralism and avoid sycophantic consensus.

Contribution

The paper introduces the Pluralistic Repair Score (PRS), a metric for measuring principled disagreement and repair, and emphasizes the importance of disagreement in AI alignment and deployment governance.

Findings

01 Empirical illustration on two RLHF-trained models shows coexistence of agreement and low repair quality.

02 PRS distinguishes principled revision from capitulation, measuring interactional disagreement.

03 Disagreement and repair are critical for effective pluralistic alignment in deployed AI systems.

Abstract

Pluralistic alignment is typically operationalised as preference aggregation: producing responses that span (Overton), steer toward (Steerable), or proportionally represent (Distributional) diverse human values. We argue that aggregation alone is an incomplete primitive for deployed pluralistic alignment. Under genuine value pluralism, the failure mode of contemporary RLHF-trained assistants is not insufficient coverage but sycophantic consensus: a learned tendency to agree with, validate, and minimise friction with the immediate interlocutor. Because deployed AI systems now mediate consequential deliberation across health, civic life, labour, and governance, the collapse of disagreement at the interaction layer is not a narrow technical concern but a structural failure with distributive consequences. We reframe pluralistic alignment around three conversational mechanisms drawn from…
