Routing Sensitivity Without Controllability: A Diagnostic Study of Fairness in MoE Language Models
Junhyeok Lee, Kyu Sung Choi

TL;DR
This paper introduces FARE, a diagnostic framework to evaluate the limits of routing-level fairness interventions in MoE language models, revealing structural and utility trade-offs.
Contribution
It provides a systematic analysis showing that routing sensitivity alone cannot reliably control stereotypes in MoE models, highlighting architectural constraints.
Findings
Routing-level preference shifts are often unachievable or non-robust.
Bias and knowledge are entangled within expert groups.
Routing sensitivity does not translate into improved generation fairness.
Abstract
Mixture-of-Experts (MoE) language models are universally sensitive to demographic content at the routing level, yet exploiting this sensitivity for fairness control is structurally limited. We introduce Fairness-Aware Routing Equilibrium (FARE), a diagnostic framework designed to probe the limits of routing-level stereotype intervention across diverse MoE architectures. FARE reveals that routing-level preference shifts are either unachievable (Mixtral, Qwen1.5, Qwen3), statistically non-robust (DeepSeekMoE), or accompanied by substantial utility cost (OLMoE, -4.4%p CrowS-Pairs at -6.3%p TQA). Critically, even where log-likelihood preference shifts are robust, they do not transfer to decoded generation: expanded evaluations on both non-null models yield null results across all generation metrics. Group-level expert masking reveals why: bias and core knowledge are deeply entangled within…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
