Revisiting Robustness for LLM Safety Alignment via Selective Geometry Control

Yonghui Yang, Wenjian Tao, Jilong Liu, Xingyu Zhu, Junfeng Fang, Weibiao Huang, Le Wu, Richang Hong, Tat-Seng Chua

arXiv:2602.07340 · cs.LG · May 22, 2026

TL;DR

This paper introduces ShaPO, a geometry-aware preference optimization framework that improves the robustness of large language model (LLM) safety alignment by controlling the optimization geometry, particularly under distribution shift and noisy preference data.

Contribution

ShaPO enforces worst-case alignment objectives through selective geometry control over an alignment-critical parameter subspace, improving robustness over existing preference optimization methods.
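
One way to read "worst-case alignment objectives through selective geometry control" is as a sharpness-aware min-max problem whose perturbation is confined to a chosen parameter subspace. The formulation below is an illustrative assumption, not the objective stated in the paper: $\mathcal{L}_{\text{pref}}$ stands for a generic preference loss, $m$ is a hypothetical 0/1 mask marking the alignment-critical coordinates, and $\rho$ is a hypothetical perturbation radius.

```latex
\min_{\theta}\; \max_{\lVert \epsilon \rVert_2 \le \rho}\; \mathcal{L}_{\text{pref}}(\theta + M\epsilon),
\qquad M = \operatorname{diag}(m),\quad m \in \{0,1\}^{d}

% First-order (SAM-style) approximation of the inner maximization:
\mathcal{L}_{\text{pref}}(\theta + M\epsilon)
  \approx \mathcal{L}_{\text{pref}}(\theta) + \epsilon^{\top} M \nabla_{\theta}\mathcal{L}_{\text{pref}}(\theta)
\;\Longrightarrow\;
\epsilon^{\star} = \rho\,\frac{M \nabla_{\theta}\mathcal{L}_{\text{pref}}(\theta)}
                              {\lVert M \nabla_{\theta}\mathcal{L}_{\text{pref}}(\theta) \rVert_2}
```

Coordinates with $m_i = 0$ receive no perturbation, so the flatness pressure acts only where the mask is active instead of uniformly across all parameters; setting every $m_i = 1$ recovers standard sharpness-aware minimization.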

Findings

1. ShaPO consistently outperforms popular preference optimization methods across safety benchmarks.

2. ShaPO stabilizes likelihood-based optimization and enforces reward consistency under noisy supervision.

3. Combining ShaPO with data-robust objectives yields further robustness improvements.

Abstract

Safety alignment of large language models remains brittle under domain shift and noisy preference supervision. Most existing robust alignment methods focus on uncertainty in the alignment data while overlooking optimization-induced fragility in preference-based objectives. In this work, we revisit robustness for LLM safety alignment from an optimization geometry perspective and argue that robustness failures cannot be addressed by data-centric methods alone. We propose ShaPO, a geometry-aware preference optimization framework that enforces worst-case alignment objectives via selective geometry control over an alignment-critical parameter subspace. By avoiding uniform geometry constraints, ShaPO mitigates the over-regularization that can harm robustness under distribution shift. We instantiate ShaPO at two levels: token-level ShaPO stabilizes likelihood-based surrogate optimization,…
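
The abstract does not spell out the update rule, so the following is only an illustrative sketch under stated assumptions. It assumes that "selective geometry control" behaves like a sharpness-aware (SAM-style) step whose adversarial perturbation is restricted to a masked subset of parameters, and it applies that step to a DPO-style preference loss on a toy linear model. The names `dpo_loss_and_grad` and `selective_sam_step`, the mask, and the hyperparameters `beta`, `rho`, and `lr` are hypothetical; the authors' implementation in the linked repository may differ.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Pair = Tuple[Vector, Vector]  # (features of chosen response, features of rejected response)


def softplus(x: float) -> float:
    # numerically stable log(1 + exp(x))
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def sigmoid(z: float) -> float:
    # numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dpo_loss_and_grad(theta: Vector, theta_ref: Vector, pairs: Sequence[Pair],
                      beta: float) -> Tuple[float, Vector]:
    """Toy DPO loss where log pi(y) is the linear score theta . phi(y) (an assumption)."""
    n = len(pairs)
    loss, grad = 0.0, [0.0] * len(theta)
    for x_w, x_l in pairs:
        diff = [a - b for a, b in zip(x_w, x_l)]
        # implicit reward margin: beta * (theta - theta_ref) . (phi_w - phi_l)
        z = beta * sum((t - r) * d for t, r, d in zip(theta, theta_ref, diff))
        loss += softplus(-z)  # equals -log(sigmoid(z))
        coeff = -(1.0 - sigmoid(z)) * beta / n
        for i, d in enumerate(diff):
            grad[i] += coeff * d
    return loss / n, grad


def selective_sam_step(theta: Vector, theta_ref: Vector, pairs: Sequence[Pair],
                       mask: Sequence[float], beta: float = 0.1, rho: float = 0.05,
                       lr: float = 0.5) -> Tuple[Vector, Vector]:
    """One hypothetical masked SAM step; mask entries are 1 (perturb) or 0 (leave alone)."""
    _, g = dpo_loss_and_grad(theta, theta_ref, pairs, beta)
    # restrict the ascent direction to the masked ("alignment-critical") coordinates
    g_sel = [gi * mi for gi, mi in zip(g, mask)]
    norm = math.sqrt(sum(v * v for v in g_sel))
    eps = [0.0] * len(theta) if norm == 0.0 else [rho * v / norm for v in g_sel]
    # first-order worst case inside the rho-ball of the selected subspace
    perturbed = [t + e for t, e in zip(theta, eps)]
    _, g_adv = dpo_loss_and_grad(perturbed, theta_ref, pairs, beta)
    # descend from the unperturbed parameters using the adversarial gradient
    new_theta = [t - lr * ga for t, ga in zip(theta, g_adv)]
    return new_theta, eps


if __name__ == "__main__":
    theta_ref = [0.0, 0.0, 0.0]  # frozen reference policy
    pairs = [([1.0, 0.0, 0.5], [0.0, 1.0, 0.5]),
             ([0.8, 0.2, 0.0], [0.1, 0.9, 0.0])]
    mask = [1.0, 1.0, 0.0]  # hypothetical choice of "alignment-critical" coordinates
    theta = list(theta_ref)
    for _ in range(50):
        theta, eps = selective_sam_step(theta, theta_ref, pairs, mask)
    print("final loss:", dpo_loss_and_grad(theta, theta_ref, pairs, beta=0.1)[0])
```

Setting every mask entry to 1 turns the step into ordinary SAM over all parameters, and setting `rho` to 0 reduces it to plain gradient descent on the toy loss. For a real LLM the mask would have to be chosen per parameter tensor, which this sketch does not attempt.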

Code & Models

Repositories

liujilong0116/ShaPO (GitHub)

Taxonomy

Topics: Constraint Satisfaction and Optimization · Adversarial Robustness in Machine Learning · Bayesian Modeling and Causal Inference