MOSAIC: Composable Safety Alignment with Modular Control Tokens

Jingyu Peng; Hongyu Chen; Jiancheng Dong; Maolin Wang; Wenxi Li; Yuchen Li; Kai Zhang; Xiangyu Zhao

arXiv:2603.16210·cs.AI·March 18, 2026

Open Access

TL;DR

MOSAIC is a modular safety alignment framework for large language models. It enforces context-dependent safety rules through learnable control tokens that can be composed at inference time, aiming for strong safety defense with less over-refusal while preserving model utility.

Contribution

The paper proposes modular control tokens for safety alignment. The tokens are optimized over a frozen backbone model, so safety constraints can be enforced flexibly and per context without retraining the model itself.

Findings

01 Achieves strong safety defense performance.

02 Reduces over-refusal compared to existing methods.

03 Preserves model utility effectively.

Abstract

Safety alignment in large language models (LLMs) is commonly implemented as a single static policy embedded in model parameters. However, real-world deployments often require context-dependent safety rules that vary across users, regions, and applications. Existing approaches struggle to provide such conditional control: parameter-level alignment entangles safety behaviors with general capabilities, while prompt-based methods rely on natural language instructions that provide weak enforcement. We propose MOSAIC, a modular framework that enables compositional safety alignment through learnable control tokens optimized over a frozen backbone model. Each token represents a safety constraint and can be flexibly activated and composed at inference time. To train compositional tokens efficiently, we introduce order-based task sampling and a distribution-level alignment objective that…
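
The composition idea in the abstract can be illustrated with a toy sketch. It assumes, without confirmation from the text above, that each control token is a learned embedding vector prepended to the prompt embeddings in soft-prompt style. The constraint names, `EMBED_DIM`, `make_control_tokens`, and `compose` are all hypothetical, and training (including the order-based task sampling and alignment objective) is omitted.

```python
import random

EMBED_DIM = 4  # toy embedding width (assumption, not from the paper)


def make_control_tokens(constraints, dim=EMBED_DIM, seed=0):
    """Create one small random vector per safety constraint.

    In a real system these vectors would be the only trainable parameters,
    with the backbone model kept frozen. Names here are hypothetical.
    """
    rng = random.Random(seed)
    return {name: [rng.uniform(-0.1, 0.1) for _ in range(dim)] for name in constraints}


def compose(control_tokens, active, prompt_embeddings):
    """Prepend the vectors of the active constraints to the prompt embeddings."""
    unknown = [name for name in active if name not in control_tokens]
    if unknown:
        raise KeyError(f"unknown control tokens: {unknown}")
    prefix = [control_tokens[name] for name in active]
    return prefix + list(prompt_embeddings)


tokens = make_control_tokens(["no_medical_advice", "no_violence", "no_gambling"])
prompt = [[0.0] * EMBED_DIM for _ in range(3)]  # stand-in for an embedded prompt

# Two deployments activate different subsets of the same token set.
region_a = compose(tokens, ["no_gambling"], prompt)
region_b = compose(tokens, ["no_medical_advice", "no_violence"], prompt)
print(len(region_a), len(region_b))  # 4 5
```

The point of the sketch is only that the active set is chosen per request, so different users, regions, or applications can share one frozen backbone and one pool of tokens.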

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Topic Modeling · Explainable Artificial Intelligence (XAI)