Adapter Merging Reactivates Latent Reasoning Traces: A Mechanism Analysis
Junyi Zou

TL;DR
This paper investigates how adapter merging in large language models can reactivate latent reasoning traces, introduces new evaluation methods, and proposes interventions to control trace leakage and improve accuracy.
Contribution
It provides a detailed analysis of trace leakage phenomena, introduces a marker-forbidden evaluation method, and proposes a geometry-aware merge technique to mitigate leakage.
Findings
Adapter merging can reactivate explicit reasoning traces.
Interventions in logit space can modulate decision distributions.
Layer-wise analysis reveals partial misalignment in adapter updates.
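The layer-wise misalignment finding can be illustrated with a minimal sketch. It assumes each adapter is a LoRA-style low-rank update delta_W = B A per layer and measures the cosine similarity between the flattened domain and instruction updates. The function names and the toy matrices are hypothetical and are not taken from the paper; the paper's actual geometric measure may differ.

```python
import math

def matmul(B, A):
    """Multiply B (d x r) by A (r x k), both given as nested lists."""
    return [[sum(B[i][t] * A[t][j] for t in range(len(A)))
             for j in range(len(A[0]))]
            for i in range(len(B))]

def flatten(M):
    """Flatten a nested-list matrix into a single vector."""
    return [x for row in M for x in row]

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)

def layerwise_alignment(domain_adapters, instruct_adapters):
    """Per-layer cosine between the two adapters' weight updates.

    Each argument is a list of (B, A) pairs, one pair per layer
    (an assumed LoRA-style parameterization, not confirmed by the source).
    """
    scores = []
    for (Bd, Ad), (Bi, Ai) in zip(domain_adapters, instruct_adapters):
        delta_domain = flatten(matmul(Bd, Ad))
        delta_instruct = flatten(matmul(Bi, Ai))
        scores.append(cosine(delta_domain, delta_instruct))
    return scores

# Toy example: layer 0 updates coincide, layer 1 updates are orthogonal.
domain = [([[1.0], [0.0]], [[1.0, 0.0]]), ([[1.0], [0.0]], [[1.0, 0.0]])]
instruct = [([[1.0], [0.0]], [[1.0, 0.0]]), ([[0.0], [1.0]], [[0.0, 1.0]])]
scores = layerwise_alignment(domain, instruct)
```

Under this reading, scores near 1 indicate layers where the two adapters push weights in the same direction, and scores near 0 or below indicate layers where merging is more likely to cause interference.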
Abstract
Large language models fine-tuned via a two-stage pipeline (domain adaptation followed by instruction alignment) can exhibit non-trivial interference after adapter merging, including the re-emergence of explicit reasoning traces under strict decoding. We study this phenomenon in medical LLM settings using lightweight, reproducible measurements of trace leakage and instruction-following behavior. Beyond marker-based proxies, we introduce a marker-forbidden, answer-only evaluation and define a correctness-based direction that does not rely on surface markers; a rank-1 logit-space intervention along this direction modulates decision distributions and improves multiple-choice accuracy beyond random-direction controls at sufficiently large intervention strength. We further provide layer-wise geometric evidence that domain and instruction adapters induce partially misaligned update directions,…
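As a rough illustration of the logit-space intervention described in the abstract, the sketch below shifts the answer-option logits along a single unit direction with strength alpha and compares against a random-direction control. The abstract does not specify how the correctness-based direction is computed or the exact form of the rank-1 update, so the direction here is simply an input, the additive shift is one possible reading, and all names (`intervene`, `random_direction`, `alpha`) are hypothetical.

```python
import math
import random

def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    e = [math.exp(x - m) for x in z]
    s = sum(e)
    return [x / s for x in e]

def unit(v):
    """Normalize a vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    if n == 0.0:
        raise ValueError("direction must be non-zero")
    return [x / n for x in v]

def intervene(logits, direction, alpha):
    """Shift logits by alpha along the normalized direction (assumed form)."""
    d = unit(direction)
    return [z + alpha * di for z, di in zip(logits, d)]

def random_direction(dim, seed=0):
    """Random unit direction used as a control (seeded for repeatability)."""
    rng = random.Random(seed)
    return unit([rng.gauss(0.0, 1.0) for _ in range(dim)])

# Toy example: four answer options (A-D) with a uniform starting distribution.
logits = [0.0, 0.0, 0.0, 0.0]
direction = [1.0, 0.0, 0.0, 0.0]  # placeholder for a correctness-based direction
steered = softmax(intervene(logits, direction, alpha=2.0))
control = softmax(intervene(logits, random_direction(4), alpha=2.0))
```

Comparing `steered` against `control` at several values of `alpha` mirrors the comparison the abstract describes, in which the learned direction is expected to beat random directions only at sufficiently large intervention strength.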
Taxonomy
Topics: Topic Modeling · Machine Learning in Healthcare · Artificial Intelligence in Healthcare and Education
