When to Think Fast and Slow? AMOR: Adaptive Entropy Gate for Hybrid Models
Haoran Zheng, Chen Shani

TL;DR
AMOR is an adaptive hybrid model that selectively applies attention based on predictive uncertainty, improving efficiency and robustness in language tasks.
Contribution
It introduces a simple, gradient-free, uncertainty-driven routing mechanism for hybrid models, outperforming fixed-schedule approaches across multiple benchmarks.
Findings
AMOR invokes attention on only ~22% of tokens.
It matches or outperforms both pure recurrent models and fixed-schedule hybrids.
It maintains stable performance under distribution shift.
Abstract
Recurrent-attention hybrids aim to combine the efficiency of recurrence with the expressivity of attention, but existing approaches typically apply attention uniformly across all positions, even when the recurrent state alone is sufficient for accurate prediction. We introduce AMOR (Adaptive Metacognitive Output Router), a post-hoc hybrid architecture that selectively invokes attention based on predictive uncertainty. A recurrent backbone is augmented with entropy-gated attention blocks that activate only when the model's output entropy exceeds a dynamic threshold derived from a running batch median and scaled standard deviation. This yields a simple, gradient-free routing mechanism inspired by uncertainty-driven computation and the System 1 / System 2 distinction. Across Mamba2 and Gated DeltaNet backbones (180M-1.5B), AMOR consistently matches or outperforms both pure recurrent models…
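The gating rule described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The abstract states only that the threshold is derived from a running batch median and a scaled standard deviation, so the exact form used here (threshold = running median + k × running std), the exponential moving average used for the running statistics, and the names `token_entropy`, `EntropyGate`, `k`, and `momentum` are all assumptions made for illustration.

```python
import numpy as np

def token_entropy(logits):
    """Shannon entropy (in nats) of softmax(logits) along the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)   # shift for numerical stability
    log_p = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    return -(np.exp(log_p) * log_p).sum(axis=-1)

class EntropyGate:
    """Hypothetical gate: route a token to attention when its entropy
    exceeds running_median + k * running_std (assumed threshold form)."""

    def __init__(self, k=1.0, momentum=0.9):
        self.k = k                # assumed scale on the standard deviation
        self.momentum = momentum  # assumed EMA factor for the running statistics
        self.median = None
        self.std = None

    def __call__(self, entropy):
        batch_median = float(np.median(entropy))
        batch_std = float(np.std(entropy))
        if self.median is None:
            # First batch: initialise the running statistics directly.
            self.median, self.std = batch_median, batch_std
        else:
            m = self.momentum
            self.median = m * self.median + (1 - m) * batch_median
            self.std = m * self.std + (1 - m) * batch_std
        threshold = self.median + self.k * self.std
        # Boolean mask: True where the attention block would be invoked.
        # No gradient flows through this comparison, so the routing is gradient-free.
        return entropy > threshold

# Usage: random logits of shape (batch, sequence, vocabulary).
rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 16, 100))
gate = EntropyGate(k=1.0)
mask = gate(token_entropy(logits))
print(mask.shape, mask.mean())  # mask shape and fraction of tokens routed to attention
```

In this sketch the fraction of tokens sent to attention depends on `k` and on the entropy distribution of the batch. The ~22% figure reported above comes from the paper's own setup and is not something this toy example reproduces.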
