Route-Induced Density and Stability (RIDE): Controlled Intervention and Mechanism Analysis of Routing-Style Meta Prompts on LLM Internal States
Dianxing Zhang, Gang Li, Sheng Li

TL;DR
This paper investigates how routing-style meta prompts influence internal representations and stability in instruction-tuned large language models, challenging common beliefs about sparsity and certainty.
Contribution
It introduces RIDE, a diagnostic probe that analyzes the effects of routing prompts on internal density, attention, and output stability across multiple models.
Findings
Meta prompts densify early- and middle-layer representations rather than making them sparser.
Natural-language expert instructions often have a stronger effect than structured tags.
The link between densification and output stability is weak and varies across models.
Abstract
Routing is widely used to scale large language models, from Mixture-of-Experts gating to multi-model/tool selection. A common belief is that routing to a task "expert" activates sparser internal computation and thus yields more certain and stable outputs (the Sparsity-Certainty Hypothesis). We test this belief by injecting routing-style meta prompts as a textual proxy for routing signals in front of frozen instruction-tuned LLMs. We quantify (C1) internal density via activation sparsity, (C2) domain-keyword attention, and (C3) output stability via predictive entropy and semantic variation. On a RouterEval subset with three instruction-tuned models (Qwen3-8B, Llama-3.1-8B-Instruct, and Mistral-7B-Instruct-v0.2), meta prompts consistently densify early/middle-layer representations rather than increasing sparsity; natural-language expert instructions are often stronger than structured…
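The three quantities named in the abstract (C1 to C3) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the magnitude threshold `eps`, and the exact definitions below are assumptions chosen for illustration, and the paper may define density, keyword attention, and entropy differently. Semantic variation is omitted because it depends on an embedding model the abstract does not specify.

```python
import numpy as np

def activation_density(hidden, eps=1e-3):
    """Fraction of activations with magnitude above eps (i.e. 1 - sparsity).

    `eps` is an assumed threshold, not a value taken from the paper.
    """
    hidden = np.asarray(hidden, dtype=np.float64)
    return float(np.mean(np.abs(hidden) > eps))

def keyword_attention_mass(attn, keyword_positions):
    """Average attention mass that query tokens place on keyword key positions.

    `attn` has shape (num_queries, num_keys) with rows summing to 1.
    """
    attn = np.asarray(attn, dtype=np.float64)
    return float(attn[:, list(keyword_positions)].sum(axis=-1).mean())

def predictive_entropy(logits):
    """Mean Shannon entropy (in nats) of softmax(logits) over the last axis."""
    logits = np.asarray(logits, dtype=np.float64)
    # Subtract the row maximum for a numerically stable log-softmax.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    probs = np.exp(log_probs)
    return float(-(probs * log_probs).sum(axis=-1).mean())

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    hidden = rng.normal(size=(16, 64))       # stand-in for one layer's hidden states
    logits = rng.normal(size=(16, 100))      # stand-in for next-token logits
    attn = np.full((16, 16), 1.0 / 16)       # uniform attention as a placeholder
    print("density:", activation_density(hidden))
    print("keyword attention:", keyword_attention_mass(attn, [0, 1]))
    print("entropy (nats):", predictive_entropy(logits))
```

Under these assumed definitions, testing the Sparsity-Certainty Hypothesis would mean computing each quantity with and without the meta prompt on the same inputs and comparing the per-layer density change against the change in entropy.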
