TAMTRL: Teacher-Aligned Reward Reshaping for Multi-Turn Reinforcement Learning in Long-Context Compression

Li Wang; Yandong Wang; Xin Yu; Kui Zhang; Tianhao Peng; Wenjun Wu

arXiv:2603.21663 · cs.CL · March 24, 2026

Open Access

TL;DR

TAMTRL introduces a reward reshaping method that aligns teacher signals with each turn in multi-turn reinforcement learning, enhancing long-context document processing in large language models.

Contribution

The paper proposes a reward reshaping technique that provides fine-grained, per-turn signals for memory updates, addressing temporal credit assignment in long-context reinforcement learning.

Findings

01 Consistently outperforms baselines across seven benchmarks.

02 Improves memory update quality in multi-turn long-context tasks.

03 Enhances long-context processing efficiency.

Abstract

The rapid progress of large language models (LLMs) has led to remarkable performance gains across a wide range of tasks. However, a long document that exceeds the model's context window cannot be processed in a single pass, so it must be processed chunk by chunk: the model reads one chunk per turn and updates its memory. Supervision is typically provided only by the final outcome, which makes it difficult to evaluate the quality of the memory update at each turn in the multi-turn training setting. This is a temporal credit assignment problem. Existing approaches, such as LLM-as-a-judge or process reward models, incur substantial computational overhead and suffer from estimation noise. To better address the credit assignment problem in multi-turn memory training, we propose Teacher-Aligned Reward Reshaping for Multi-Turn…
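
The problem setting described in the abstract can be illustrated with a minimal Python sketch. It shows only the chunk-wise, multi-turn memory loop with outcome-only supervision, not the TAMTRL method itself, whose details are not given here. All names (`split_into_chunks`, `run_episode`, `toy_update_memory`, `toy_outcome_reward`) are hypothetical, and the toy memory rule (keep digit tokens) stands in for an LLM memory update.

```python
from typing import Callable, List, Tuple


def split_into_chunks(tokens: List[str], chunk_size: int) -> List[List[str]]:
    """Split a token list into consecutive chunks of at most chunk_size tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]


def run_episode(
    tokens: List[str],
    chunk_size: int,
    update_memory: Callable[[str, List[str]], str],
    outcome_reward: Callable[[str], float],
) -> Tuple[str, List[str], List[float]]:
    """Read one chunk per turn, update memory, and score only the final memory.

    Returns the final memory, the memory after each turn, and the per-turn
    rewards obtained by copying the single outcome reward to every turn.
    """
    memory = ""
    memories = []
    for chunk in split_into_chunks(tokens, chunk_size):
        memory = update_memory(memory, chunk)  # one memory update per turn
        memories.append(memory)
    final_reward = outcome_reward(memory)  # supervision from the outcome only
    # Outcome-only credit: every turn gets the same reward, whether or not
    # its memory update actually contributed to the final answer.
    per_turn_rewards = [final_reward] * len(memories)
    return memory, memories, per_turn_rewards


def toy_update_memory(memory: str, chunk: List[str]) -> str:
    """Hypothetical stand-in for an LLM memory update: keep digit tokens."""
    digits = [t for t in chunk if t.isdigit()]
    return " ".join(([memory] if memory else []) + digits)


def toy_outcome_reward(final_memory: str) -> float:
    """Hypothetical outcome check: 1.0 if the memory holds the answer '42'."""
    return 1.0 if final_memory == "42" else 0.0


document = "the code is 42 and the sky is blue today".split()
final_memory, memories, rewards = run_episode(
    document, chunk_size=3,
    update_memory=toy_update_memory,
    outcome_reward=toy_outcome_reward,
)
print(memories)  # memory state after each of the four turns
print(rewards)   # the same outcome reward repeated for every turn
```

In this toy run only the second turn changes the memory, yet all four turns receive the same reward of 1.0. That mismatch between per-turn contribution and per-turn reward is the temporal credit assignment problem the abstract refers to.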

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis