Objective Decoupling in Social Reinforcement Learning: Recovering Ground Truth from Sycophantic Majorities

Majid Ghasemi; Mark Crowley

arXiv:2602.08092 · cs.AI · February 10, 2026


TL;DR

This paper shows that social reinforcement learning can suffer from Objective Decoupling, in which agents learn misaligned goals from biased human feedback, and proposes a method that recovers the true objective by evaluating the sources of that feedback.

Contribution

The paper introduces Objective Decoupling as a failure mode in social RL and proposes Epistemic Source Alignment (ESA) to reliably recover true objectives despite biased evaluators.

Findings

1. Standard RL fails under sycophantic evaluators.
2. ESA guarantees convergence to the true objective.
3. Empirical results show that ESA outperforms consensus-based methods.

Abstract

Contemporary AI alignment strategies rely on a fragile premise: that human feedback, while noisy, remains a fundamentally truthful signal. In this paper, we identify this assumption as Dogma 4 of Reinforcement Learning (RL). We demonstrate that while this dogma holds in static environments, it fails in social settings where evaluators may be sycophantic, lazy, or adversarial. We prove that under Dogma 4, standard RL agents suffer from what we call Objective Decoupling, a structural failure mode where the agent's learned objective permanently separates from the latent ground truth, guaranteeing convergence to misalignment. To resolve this, we propose Epistemic Source Alignment (ESA). Unlike standard robust methods that rely on statistical consensus (trusting the majority), ESA utilizes sparse safety axioms to judge the source of the feedback rather than the signal itself. We prove that…
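To make the contrast between "trusting the majority" and "judging the source" concrete, here is a toy Python sketch. It is not the paper's ESA algorithm. It assumes, purely for illustration, that "sparse safety axioms" can be modeled as a few probe items whose correct label is known in advance, and that each evaluator is trusted only if it agrees with all of them. The names `AXIOMS`, `source_trust`, `source_filtered_vote`, the evaluator IDs, and the hard threshold rule are all hypothetical.

```python
from collections import Counter

# Hypothetical "axioms": a few probe items with known correct labels (1 = approve, 0 = reject).
AXIOMS = {"a1": 0, "a2": 0, "a3": 1}

# Hypothetical evaluators: three sycophants approve everything, two honest ones do not.
evaluators = {
    "syc1": {"a1": 1, "a2": 1, "a3": 1, "x": 1},
    "syc2": {"a1": 1, "a2": 1, "a3": 1, "x": 1},
    "syc3": {"a1": 1, "a2": 1, "a3": 1, "x": 1},
    "hon1": {"a1": 0, "a2": 0, "a3": 1, "x": 0},
    "hon2": {"a1": 0, "a2": 0, "a3": 1, "x": 0},
}

def majority_vote(labels):
    """Consensus baseline: return the most common label."""
    return Counter(labels).most_common(1)[0][0]

def source_trust(evaluator_labels, axioms):
    """Fraction of axiom items the evaluator labels correctly."""
    hits = sum(evaluator_labels[item] == truth for item, truth in axioms.items())
    return hits / len(axioms)

def source_filtered_vote(all_labels, item, axioms, threshold=1.0):
    """Vote only among evaluators whose axiom agreement meets the (assumed) threshold."""
    trusted = [labels for labels in all_labels.values()
               if source_trust(labels, axioms) >= threshold]
    if not trusted:
        return None  # no trustworthy source: abstain rather than follow the majority
    return majority_vote([labels[item] for labels in trusted])

print(majority_vote([labels["x"] for labels in evaluators.values()]))  # 1: sycophantic majority wins
print(source_filtered_vote(evaluators, "x", AXIOMS))                   # 0: only axiom-consistent sources count
```

On the target item `x`, the consensus baseline follows the three sycophants, while the source-filtered vote discards them because they violate two of the three axioms. A real method would likely use soft trust weights rather than a hard cutoff; the threshold here is only a simplification.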


Taxonomy

Topics: Reinforcement Learning in Robotics · Ethics and Social Impacts of AI · Adversarial Robustness in Machine Learning