One Model, Two Roles: Emergent Specialization in a Shared Recurrent Transformer

Jucheng Shen; Barbara Su; Anastasios Kyrillidis

arXiv:2605.17811·cs.LG·May 19, 2026

TL;DR

This paper demonstrates that a shared-weight recurrent Transformer can develop distinct internal roles through input asymmetry and state dynamics, leading to emergent specialization without explicit partitioning.

Contribution

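The paper introduces Asymmetric Input Recurrence (AIR), a minimal two-state architecture in which a single shared Transformer develops specialized internal states even though the only built-in difference between its two updates is whether the encoded input is injected.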

Findings

01 The shared model develops two distinct states: the H state behaves like a committed proposal, while the L state retains local uncertainty.

02 Input asymmetry and state dynamics induce specialization.

03 Attention analysis reveals different localities for the two update types.

Abstract

Can a shared-weight recurrent Transformer develop distinct internal roles without being partitioned into separate modules? We study this in Asymmetric Input Recurrence (AIR), a minimal two-state reasoning architecture in which the same Transformer model is reused for both updates (per literature, L and H) and the only built-in difference in the update rule is that the encoded input is injected during L-updates but not H-updates. Across Sudoku-Extreme and Maze, decoded rollouts reveal a consistent split: $z_H$ behaves like a fully committed proposal state, whereas $z_L$ retains local uncertainty and shifting intermediate structure. Freeze experiments show that this split is, in practice, related to the model's state dynamics: in Sudoku, freezing $z_H$ reduces $z_L$'s content changes whereas freezing $z_L$ increases $z_H$'s, while in Maze, freezing either state increases content changes…
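The update rule described in the abstract can be sketched in a few lines. What follows is a minimal NumPy illustration under stated assumptions, not the authors' implementation. Only two details come from the abstract: one set of weights serves both updates, and the encoded input `x` is injected during L-updates but not H-updates. The toy single-head block, the additive way the two states are combined, the zero initialization, the `l_per_h` schedule, and the `freeze` / `freeze_from` arguments are all hypothetical choices made for illustration.

```python
import numpy as np

def init_params(d, seed=0):
    """Random weights for one toy block (hypothetical stand-in for the shared model)."""
    rng = np.random.default_rng(seed)
    scale = 1.0 / np.sqrt(d)
    names = ("wq", "wk", "wv", "wo", "w1", "w2")
    return {name: rng.normal(0.0, scale, (d, d)) for name in names}

def layer_norm(h, eps=1e-5):
    mu = h.mean(axis=-1, keepdims=True)
    var = h.var(axis=-1, keepdims=True)
    return (h - mu) / np.sqrt(var + eps)

def softmax(a):
    a = a - a.max(axis=-1, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=-1, keepdims=True)

def shared_block(p, h):
    """Single-head self-attention plus a ReLU MLP, each with a residual connection."""
    d = h.shape[-1]
    q, k, v = h @ p["wq"], h @ p["wk"], h @ p["wv"]
    attn = softmax(q @ k.T / np.sqrt(d)) @ v
    h = layer_norm(h + attn @ p["wo"])
    return layer_norm(h + np.maximum(h @ p["w1"], 0.0) @ p["w2"])

def air_rollout(p, x, steps=4, l_per_h=2, freeze=None, freeze_from=0):
    """Alternate L- and H-updates using the SAME parameters p.

    x           : encoded input, shape (tokens, d)
    freeze      : None, "L", or "H"; that state is held fixed (assumed protocol)
    freeze_from : outer step index at which freezing starts
    Returns a list of (z_l, z_h) snapshots, one per outer step.
    """
    z_l = np.zeros_like(x)
    z_h = np.zeros_like(x)
    trace = []
    for t in range(steps):
        frozen = freeze if t >= freeze_from else None
        for _ in range(l_per_h):
            if frozen != "L":
                # L-update: the encoded input x is injected here.
                z_l = shared_block(p, z_l + z_h + x)
        if frozen != "H":
            # H-update: same weights, but x is NOT injected.
            z_h = shared_block(p, z_h + z_l)
        trace.append((z_l.copy(), z_h.copy()))
    return trace

if __name__ == "__main__":
    params = init_params(d=8)
    x = np.random.default_rng(1).normal(size=(5, 8))
    trace = air_rollout(params, x, steps=3)
    print(len(trace), trace[-1][0].shape, trace[-1][1].shape)
```

Running the same rollout with `freeze="H"` or `freeze="L"` and comparing the traces gives a rough analogue of a freeze experiment. The abstract reports content changes in decoded rollouts, which this sketch does not model, since it has no decoder and no trained weights.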

Code & Models

Repositories

juchengshen/air
github
