Breaking Contextual Inertia: Reinforcement Learning with Single-Turn Anchors for Stable Multi-Turn Interaction

Xingwu Chen; Zhanqiu Zhang; Yiwen Guo; Difan Zou

arXiv:2603.04783 · cs.AI · May 12, 2026

TL;DR

This paper introduces RLSTA, a training method that uses a model's single-turn capabilities as anchors to stabilize multi-turn interaction, countering its tendency to adhere rigidly to earlier reasoning.

Contribution

RLSTA leverages single-turn capabilities as anchors to enhance multi-turn interaction stability and generalization across domains without external verifiers.

Findings

01 RLSTA outperforms standard fine-tuning and abstention methods.

02 It demonstrates strong cross-domain generalization from math to code.

03 The approach is effective even without external verifiers.

Abstract

While LLMs demonstrate strong reasoning capabilities when provided with full information in a single turn, they exhibit substantial vulnerability in multi-turn interactions. Specifically, when information is revealed incrementally or requires updates, models frequently fail to integrate new constraints, leading to a collapse in performance compared to their single-turn baselines. We term the root cause Contextual Inertia: a phenomenon where models rigidly adhere to previous reasoning traces. Even when users explicitly provide corrections or new data in later turns, the model ignores them, preferring to maintain consistency with its previous (incorrect) reasoning path. To address this, we introduce Reinforcement Learning with Single-Turn Anchors (RLSTA), a generalizable training approach designed to stabilize multi-turn…

Code & Models

Repositories

Tencent/RLSTA (GitHub)
