Via Negativa for AI Alignment: Why Negative Constraints Are Structurally Superior to Positive Preferences
Quan Cheng

TL;DR
This paper argues that negative constraints are structurally superior to positive preferences for AI alignment because they encode clear, verifiable prohibitions, leading to more stable and effective alignment strategies.
Contribution
It provides a unified theoretical account explaining why negative signals outperform positive preferences in AI alignment, based on their structural asymmetry and epistemological foundations.
Findings
Negative-only feedback methods can match or exceed standard RLHF performance.
Negative constraints lead to stable boundaries for model behavior.
Preference-based methods are prone to learning surface correlates of human values, such as agreement with the user (sycophancy).
Abstract
Recent empirical results have demonstrated that training large language models (LLMs) with negative-only feedback can match or exceed standard reinforcement learning from human feedback (RLHF). Negative Sample Reinforcement achieves parity with PPO on mathematical reasoning; Distributional Dispreference Optimization trains effectively using only dispreferred samples; and Constitutional AI outperforms pure RLHF on harmlessness benchmarks. Yet no unified theoretical account explains why negative signals are so effective. This paper proposes such an account: positive preferences and negative constraints are structurally asymmetric. Positive preferences ("which is better") encode continuously coupled, context-dependent human values that cannot be exhaustively specified -- leading models to learn surface correlates such as agreement with the user (sycophancy). Negative constraints ("what is…
Taxonomy
Topics: Language and cultural evolution · Explainable Artificial Intelligence (XAI) · Embodied and Extended Cognition
