Reasoning over Precedents Alongside Statutes: Case-Augmented Deliberative Alignment for LLM Safety

Can Jin; Rui Wu; Tong Che; Qixin Zhang; Hongwu Peng; Jiahui Zhao; Zhenting Wang; Wenqi Wei; Ligong Han; Zhao Zhang; Yuan Cao; Ruixiang Tang; Dimitris N. Metaxas

arXiv:2601.08000·cs.AI·January 14, 2026

Open Access

TL;DR

This paper introduces CADA, a case-augmented deliberative alignment method that improves the safety and robustness of LLMs by training them on safety reasoning cases, which avoids rigid rule adherence.

Contribution

The paper demonstrates that case-augmented reasoning enhances both safety and helpfulness in open-source LLMs, providing a practical alternative to rule-based safety approaches.

Findings

01 CADA improves harmlessness and robustness against attacks.

02 Training on safety reasoning cases enhances generalization.

03 CADA reduces over-refusal while maintaining utility.

Abstract

Ensuring that Large Language Models (LLMs) adhere to safety principles without refusing benign requests remains a significant challenge. While OpenAI introduces deliberative alignment (DA) to enhance the safety of its o-series models through reasoning over detailed "code-like" safety rules, the effectiveness of this approach in open-source LLMs, which typically lack advanced reasoning capabilities, is understudied. In this work, we systematically evaluate the impact of explicitly specifying extensive safety codes versus demonstrating them through illustrative cases. We find that referencing explicit codes inconsistently improves harmlessness and systematically degrades helpfulness, whereas training on case-augmented simple codes yields more robust and generalized safety behaviors. By guiding LLMs with case-augmented reasoning instead of extensive code-like safety rules, we avoid rigid…

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Topic Modeling · Ethics and Social Impacts of AI