AI Alignment Breaks at the Edge

Han Bao; Yue Huang; Xiaoda Wang; Zheyuan Zhang; Yujun Zhou; Carl Yang; Xiangliang Zhang; Yanfang Ye

arXiv:2602.20042 · cs.CL · May 19, 2026

TL;DR

This paper argues that current AI alignment practices overlook critical edge cases and proposes an Edge alignment framework to better detect, evaluate, and govern model failures in complex, value-diverse scenarios.

Contribution

It introduces the concept of Edge alignment to address evaluation blind spots and outlines a diagnostic set and governance approach for handling edge cases in AI safety.

Findings

01. Ordinary helpfulness and safety metrics miss process failures.

02. Edge-aware evaluation exposes failures not visible in average-case metrics.

03. A pilot set of 91 edge cases reveals gaps in current model safety assessments.
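
The second finding, that average-case metrics can hide edge-case failures, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the slice tags `ordinary` and `edge`, the pass/fail scores, and the helper names `overall_score` and `score_by_slice` are hypothetical and do not come from the paper or its diagnostic set.

```python
from statistics import mean

# Hypothetical per-example safety scores (1.0 = pass, 0.0 = fail), tagged by slice.
# These numbers are invented for illustration; they are not results from the paper.
results = (
    [("ordinary", 1.0)] * 95   # ordinary prompts: all pass
    + [("edge", 1.0)] * 1      # edge prompts: one pass
    + [("edge", 0.0)] * 4      # edge prompts: four failures
)


def overall_score(rows):
    """Average-case metric: one number over every example."""
    return mean(score for _, score in rows)


def score_by_slice(rows):
    """Edge-aware view: the same scores, averaged separately per slice."""
    buckets = {}
    for tag, score in rows:
        buckets.setdefault(tag, []).append(score)
    return {tag: mean(scores) for tag, scores in buckets.items()}


overall = overall_score(results)
per_slice = score_by_slice(results)

print(f"overall: {overall:.2f}")
for tag, slice_score in per_slice.items():
    print(f"{tag}: {slice_score:.2f}")
```

With these made-up numbers the aggregate score is 0.96 while the edge slice scores 0.20, so a report that shows only the aggregate would give no hint of the failures concentrated in the edge slice.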

Abstract

General Alignment has improved average-case helpfulness and safety, but current alignment practice still rewards confident, single-turn responses. The problem is not only that models fail on edge cases; it is that current evaluation makes many of these failures hard to see. We take the position that alignment must move beyond average-case evaluation by making failures under value conflict, plural stakeholder disagreement, and epistemic ambiguity visible and actionable. Scalar rewards compress diverse values into a single number; data and evaluation regimes collapse, filter, or fail to elicit the cases where alignment is hardest; and governance often lacks mechanisms for adjudicating contested cases. These blind spots produce value flattening, representation loss, and uncertainty blindness. We use Edge alignment to name a detection, evaluation, and governance agenda for surfacing these…

Taxonomy

Topics: Ethics and Social Impacts of AI · Scientific Computing and Data Management · Computational and Text Analysis Methods