AI Alignment Breaks at the Edge
Han Bao, Yue Huang, Xiaoda Wang, Zheyuan Zhang, Yujun Zhou, Carl Yang, Xiangliang Zhang, Yanfang Ye

TL;DR
This paper argues that current AI alignment practice overlooks critical edge cases and proposes Edge alignment, an agenda for detecting, evaluating, and governing model failures under value conflict, stakeholder disagreement, and epistemic ambiguity.
Contribution
It introduces the concept of Edge alignment to address evaluation blind spots and outlines a diagnostic set and governance approach for handling edge cases in AI safety.
Findings
Ordinary helpfulness and safety metrics miss process failures.
Edge-aware evaluation exposes failures not visible in average-case metrics.
A pilot set of 91 edge cases reveals gaps in current model safety assessments.
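To illustrate why an average-case metric can hide edge failures, the following is a minimal sketch on made-up pass/fail records. The slice names, counts, and the `pass_rate` helper are hypothetical and are not taken from the paper or its pilot set.

```python
# Hypothetical per-case results: (slice, score), where 1 = acceptable, 0 = failure.
# Counts are illustrative only, not data from the paper.
results = (
    [("ordinary", 1)] * 95
    + [("ordinary", 0)] * 1
    + [("edge", 1)] * 1
    + [("edge", 0)] * 3
)


def pass_rate(records, slice_name=None):
    """Fraction of passing cases, optionally restricted to one slice."""
    scores = [s for name, s in records if slice_name is None or name == slice_name]
    return sum(scores) / len(scores)


overall = pass_rate(results)               # aggregate, average-case view
ordinary = pass_rate(results, "ordinary")  # ordinary cases only
edge = pass_rate(results, "edge")          # edge cases only

print(f"overall={overall:.2f} ordinary={ordinary:.2f} edge={edge:.2f}")
```

In this toy setup the aggregate pass rate stays high because edge cases are a small share of the data, while the edge slice alone fails most of the time. Reporting the slice separately is one simple way to make such failures visible.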
Abstract
General Alignment has improved average-case helpfulness and safety, but current alignment practice still rewards confident, single-turn responses. The problem is not only that models fail on edge cases; it is that current evaluation makes many of these failures hard to see. We take the position that alignment must move beyond average-case evaluation by making failures under value conflict, plural stakeholder disagreement, and epistemic ambiguity visible and actionable. Scalar rewards compress diverse values into a single number; data and evaluation regimes collapse, filter, or fail to elicit the cases where alignment is hardest; and governance often lacks mechanisms for adjudicating contested cases. These blind spots produce value flattening, representation loss, and uncertainty blindness. We use Edge alignment to name a detection, evaluation, and governance agenda for surfacing these…
Taxonomy
Topics: Ethics and Social Impacts of AI · Scientific Computing and Data Management · Computational and Text Analysis Methods
