State Transfer Reveals Reuse in Controlled Routing
Yanzhen Lu, Zhicheng Qian, Muchen Jiang, Xingyu Zhou

TL;DR
This paper asks whether fixed-interface reuse in controlled routing tasks reflects underlying state transfer rather than mere prompt success. It tests this on two GPT-2 routing tasks (triop and add/sub) and uses Qwen as a cross-architecture check.
Contribution
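It introduces a method for distinguishing fixed-interface reuse from prompt relocation, based on interfaces chosen on support data, held-out query evaluation, and matched necessity, sufficiency, and wrong-interface controls, and uses it to provide evidence of state transfer in controlled routing tasks.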
Findings
Fixed-interface transfer provides stronger evidence of state reuse than prompt success alone.
Zero-retrain compiled transfer at the fixed interface recovers most of the donor's routing accuracy.
Generation and reasoning analyses show broader transport and weaker control identifiability as trajectories get longer.
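One way to make "recovers most of the donor's routing accuracy" concrete is a normalized score: the share of the gap between the unpatched recipient and the donor that the transfer closes. This is a minimal sketch assuming that definition; the function name `normalized_recovery` and its arguments are hypothetical, and the paper may score recovery differently.

```python
def normalized_recovery(patched_acc: float, recipient_acc: float, donor_acc: float) -> float:
    """Hypothetical metric: fraction of the donor-vs-recipient accuracy gap
    closed by transferring state at the fixed interface.

    0.0 means no improvement over the unpatched recipient;
    1.0 means the patched recipient matches the donor.
    """
    gap = donor_acc - recipient_acc
    if gap <= 0:
        # The ratio is only meaningful when the donor beats the unpatched recipient.
        raise ValueError("donor accuracy must exceed unpatched recipient accuracy")
    return (patched_acc - recipient_acc) / gap
```

A score near 1.0 would mean the patched recipient performs about as well as the donor. The sketch raises an error instead of returning a misleading ratio when there is no gap to close.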
Abstract
Prompt-based interventions can change model behavior, but trained success alone does not identify where the behaviorally relevant state is represented. We study this question in controlled routing tasks using interfaces chosen on support data, held-out query evaluation, and matched necessity, sufficiency, and wrong-interface controls. On GPT-2 triop, an early interface supports exact transfer under these tests. On GPT-2 add/sub, zero-retrain compiled transfer at the fixed interface recovers most of the donor's routing accuracy, while trainable prompt slots can relearn the same behavior at several other positions only after additional support examples and optimization. These results distinguish fixed-interface reuse from prompt relocation in a setting where the two can be tested directly. Qwen routing provides a cross-architecture consistency check for the same matched-interface pattern at the…
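The matched controls named in the abstract can be illustrated on a toy problem. This is a hypothetical stand-in, not the authors' GPT-2 setup: `run_toy`, its two named sites `route` (which the output reads) and `scratch` (which it ignores), and the small datasets are all invented for illustration. An interface is chosen by transfer rate on support data only; held-out queries are then scored for sufficiency (patching the donor's state makes the recipient produce the donor operation's answer), necessity (ablating the site breaks the task), and a wrong-interface control (patching the other site should transfer nothing).

```python
from typing import Dict, List, Optional, Tuple

Query = Tuple[str, int, int]  # (operation, a, b)


def target(op: str, a: int, b: int) -> int:
    """Ground-truth answer for the toy add/sub routing task."""
    return a + b if op == "add" else a - b


def flip(op: str) -> str:
    return "sub" if op == "add" else "add"


def run_toy(op: str, a: int, b: int,
            patch: Optional[Dict[str, int]] = None) -> Tuple[int, Dict[str, int]]:
    """Hypothetical toy model: returns (output, cache of named intermediate states).

    Any site listed in `patch` is overwritten before the output is computed.
    """
    patch = patch or {}
    cache: Dict[str, int] = {}
    # "route": early routing decision (+1 = add, -1 = sub); the output reads it.
    cache["route"] = patch.get("route", 1 if op == "add" else -1)
    # "scratch": op-dependent state that the output never reads.
    cache["scratch"] = patch.get("scratch", a * b if op == "add" else -a * b)
    return a + cache["route"] * b, cache


def transfer_rate(site: str, data: List[Query]) -> float:
    """Sufficiency: does patching the donor's state at `site` make the
    recipient produce the donor operation's answer?"""
    hits = 0
    for op, a, b in data:
        donor_op = flip(op)
        _, donor_cache = run_toy(donor_op, a, b)
        out, _ = run_toy(op, a, b, patch={site: donor_cache[site]})
        hits += out == target(donor_op, a, b)
    return hits / len(data)


def necessity_acc(site: str, data: List[Query], ablated: int = 0) -> float:
    """Necessity: task accuracy after ablating `site` to a neutral value."""
    hits = sum(run_toy(op, a, b, patch={site: ablated})[0] == target(op, a, b)
               for op, a, b in data)
    return hits / len(data)


SITES = ["route", "scratch"]
support: List[Query] = [("add", 2, 3), ("sub", 6, 1)]
queries: List[Query] = [("add", 3, 4), ("sub", 9, 2), ("add", 7, 1), ("sub", 5, 5)]

# Choose the interface on support data only, then score held-out queries.
chosen = max(SITES, key=lambda s: transfer_rate(s, support))
wrong = next(s for s in SITES if s != chosen)
report = {
    "chosen": chosen,
    "sufficiency": transfer_rate(chosen, queries),
    "necessity_acc": necessity_acc(chosen, queries),
    "wrong_interface": transfer_rate(wrong, queries),
}
print(report)
```

Running it prints `{'chosen': 'route', 'sufficiency': 1.0, 'necessity_acc': 0.0, 'wrong_interface': 0.0}`. In a real model the sites would be hidden states at particular layers and token positions, and zero is only one of several possible ablation values; both choices here are simplifications.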
