MASCing: Configurable Mixture-of-Experts Behavior via Activation Steering Masks

Jona te Lintelo; Lichao Wu; Marina Krček; Sengim Karayalçin; Stjepan Picek

arXiv:2604.27818 · cs.CR · May 1, 2026


TL;DR

MASCing is a novel framework that enables flexible, scenario-specific safety reconfiguration of Mixture-of-Experts models without retraining, using activation steering masks to control expert behavior.

Contribution

The paper introduces MASCing, which it describes as the first framework to reconfigure MoE model behavior across safety scenarios via steering masks without retraining, using an LSTM-based surrogate model to capture cross-layer routing dependencies.

Findings

1. Improves jailbreak defense success rate from 52.5% to 83.9%.
2. Increases adult-content generation success rate from 52.6% to 82.0%.
3. Demonstrates negligible overhead across seven open-source MoE models.

Abstract

Mixture-of-Experts (MoE) architectures in Large Language Models (LLMs) have significantly reduced inference costs through sparse activation. However, this sparse activation paradigm also introduces new safety challenges. Since only a subset of experts is engaged for each input, model behavior becomes coupled to routing decisions, yielding a difficult-to-control mechanism that can vary across safety-relevant scenarios. At the same time, adapting model behavior through full fine-tuning or retraining is costly, especially when developers need to rapidly configure the same model for different safety objectives. We present MASCing (MoE Activation Steering Configuration), the first framework that enables flexible reconfiguration of MoE behavior across diverse safety scenarios without retraining. MASCing uses an LSTM-based surrogate model to capture cross-layer routing dependencies and map…
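
The coupling between routing decisions and model behavior described in the abstract can be illustrated with a toy top-k gate. The sketch below is not taken from the paper or its repository: the function `top_k_route`, the additive form of the mask, and all numeric values are assumptions chosen only to show how shifting router logits changes which experts are engaged.

```python
import math

def top_k_route(logits, k, mask=None):
    """Pick k experts from router logits, optionally shifted by a steering mask.

    `mask` is a hypothetical per-expert additive offset. It is an
    assumption of this sketch, not the paper's exact formulation.
    """
    if mask is not None:
        logits = [l + m for l, m in zip(logits, mask)]
    # Indices of the k largest (possibly steered) logits.
    chosen = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax over the selected experts only, as in common top-k gating.
    peak = max(logits[i] for i in chosen)
    exps = {i: math.exp(logits[i] - peak) for i in chosen}
    total = sum(exps.values())
    return {i: exps[i] / total for i in chosen}

router_logits = [2.0, 1.0, 0.5, -1.0]   # made-up logits for four experts
steering_mask = [-5.0, 0.0, 3.0, 0.0]   # suppress expert 0, promote expert 2

baseline = top_k_route(router_logits, k=2)
steered = top_k_route(router_logits, k=2, mask=steering_mask)

print("baseline experts:", sorted(baseline))
print("steered experts: ", sorted(steered))
```

In this toy setting the mask swaps expert 0 for expert 2 while the model weights stay untouched, which is the general sense in which a mask can reconfigure behavior without retraining. How MASCing actually derives and applies its masks is specified in the paper, not here.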

Code & Models

Repositories

jonatelintelo/MASCing (GitHub)
