EASE: Practical and Efficient Safety Alignment for Small Language Models

Haonan Shi; Guoli Wang; Tu Ouyang; An Wang

arXiv:2511.06512 · cs.CR · November 11, 2025

TL;DR

EASE is a framework that enhances safety in small language models by selectively applying safety reasoning, significantly reducing jailbreak success and computational overhead for edge deployment.

Contribution

The paper introduces EASE, a novel method for efficient safety alignment in small language models through selective safety reasoning and effective knowledge distillation.

Findings

01 Reduces jailbreak success rates by up to 17%.

02 Decreases inference overhead by up to 90%.

03 Balances safety and efficiency in resource-constrained environments.

Abstract

Small language models (SLMs) are increasingly deployed on edge devices, making their safety alignment crucial yet challenging. Current shallow alignment methods that rely on direct refusal of malicious queries fail to provide robust protection, particularly against adversarial jailbreaks. While deliberative safety reasoning alignment offers deeper alignment for defending against sophisticated attacks, effectively implanting such reasoning capability in SLMs with limited capabilities remains an open challenge. Moreover, safety reasoning incurs significant computational overhead as models apply reasoning to nearly all queries, making it impractical for resource-constrained edge deployment scenarios that demand rapid responses. We propose EASE, a novel framework that enables practical and Efficient safety Alignment for Small languagE models. Our approach first identifies the optimal safety…
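To make the overhead argument concrete, here is a minimal Python sketch of selective safety reasoning in general. The abstract is cut off before it explains how EASE decides which queries receive safety reasoning, so this is not the paper's method: the external `looks_risky` predicate, the two answer callables, and every name below are hypothetical and serve only to show why reasoning on a subset of queries costs less than reasoning on all of them.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class RoutedResponse:
    mode: str  # "direct" or "safety_reasoning"
    text: str


def route_query(
    query: str,
    looks_risky: Callable[[str], bool],            # hypothetical risk check (assumption)
    answer_directly: Callable[[str], str],         # hypothetical fast path, no reasoning
    answer_with_reasoning: Callable[[str], str],   # hypothetical slow path with safety reasoning
) -> RoutedResponse:
    """Spend safety-reasoning compute only on queries flagged as risky."""
    if looks_risky(query):
        return RoutedResponse("safety_reasoning", answer_with_reasoning(query))
    return RoutedResponse("direct", answer_directly(query))


def expected_extra_tokens(risky_fraction: float, reasoning_tokens: float) -> float:
    """Average extra tokens per query when only a fraction of queries triggers reasoning."""
    return risky_fraction * reasoning_tokens
```

Under these assumptions, a model that reasons on every query pays the full `reasoning_tokens` cost each time, while a selective one pays it only on the flagged fraction. The inputs to `expected_extra_tokens` are illustrative parameters, not figures from the paper.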

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Advanced Graph Neural Networks · Big Data and Digital Economy