MatchTIR: Fine-Grained Supervision for Tool-Integrated Reasoning via Bipartite Matching

Changle Qu; Sunhao Dai; Hengyi Cai; Jun Xu; Shuaiqiang Wang; Dawei Yin

arXiv:2601.10712·cs.CL·January 16, 2026

Open Access · 4 Models · 1 Dataset

TL;DR

MatchTIR introduces a bipartite matching-based framework for fine-grained, turn-level supervision in tool-integrated reasoning, significantly improving large language models' performance on complex, multi-turn tasks.

Contribution

The paper presents MatchTIR, a novel method that applies bipartite matching for turn-level reward assignment and dual-level advantage estimation, enhancing tool use supervision in LLMs.

Findings

01 Outperforms existing methods on three benchmarks.
02 A 4B model surpasses most 8B models in long-horizon tasks.
03 Effective in multi-turn reasoning scenarios.

Abstract

Tool-Integrated Reasoning (TIR) empowers large language models (LLMs) to tackle complex tasks by interleaving reasoning steps with external tool interactions. However, existing reinforcement learning methods typically rely on outcome- or trajectory-level rewards, assigning uniform advantages to all steps within a trajectory. This coarse-grained credit assignment fails to distinguish effective tool calls from redundant or erroneous ones, particularly in long-horizon multi-turn scenarios. To address this, we propose MatchTIR, a framework that introduces fine-grained supervision via bipartite matching-based turn-level reward assignment and dual-level advantage estimation. Specifically, we formulate credit assignment as a bipartite matching problem between predicted and ground-truth traces, utilizing two assignment strategies to derive dense turn-level rewards. Furthermore, to balance local…
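The matching idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the similarity function, the dictionary layout of a tool call, and the names `call_similarity` and `turn_level_rewards` are all hypothetical, and the sketch uses a brute-force search over one-to-one assignments (practical only for short traces) rather than whichever two assignment strategies the paper actually uses. It only shows how a one-to-one matching between predicted and ground-truth tool calls could yield a separate reward per turn, so that a redundant call receives no credit.

```python
from itertools import permutations


def call_similarity(pred, gold):
    """Hypothetical similarity in [0, 1] between two tool calls.

    Assumption: 0 if the tool names differ; otherwise 0.5 for the name
    plus 0.5 times the fraction of ground-truth arguments reproduced.
    """
    if pred["tool"] != gold["tool"]:
        return 0.0
    gold_args = gold["args"]
    if not gold_args:
        return 1.0
    hits = sum(1 for k, v in gold_args.items() if pred["args"].get(k) == v)
    return 0.5 + 0.5 * hits / len(gold_args)


def turn_level_rewards(pred_calls, gold_calls):
    """Return one reward per predicted call via maximum-weight one-to-one matching.

    Brute force over injective assignments; unmatched predicted calls get 0.
    """
    n_pred, n_gold = len(pred_calls), len(gold_calls)
    scores = [[call_similarity(p, g) for g in gold_calls] for p in pred_calls]

    if n_pred >= n_gold:
        # Pick a distinct predicted call for every ground-truth call.
        candidates = (
            [(p, g) for g, p in enumerate(perm)]
            for perm in permutations(range(n_pred), n_gold)
        )
    else:
        # Pick a distinct ground-truth call for every predicted call.
        candidates = (
            [(p, g) for p, g in enumerate(perm)]
            for perm in permutations(range(n_gold), n_pred)
        )

    best_total, best_pairs = -1.0, []
    for pairs in candidates:
        total = sum(scores[p][g] for p, g in pairs)
        if total > best_total:  # ties keep the first assignment found
            best_total, best_pairs = total, pairs

    rewards = [0.0] * n_pred
    for p, g in best_pairs:
        rewards[p] = scores[p][g]
    return rewards


# Illustrative traces (made up for this sketch).
predicted = [
    {"tool": "search", "args": {"query": "weather Paris"}},
    {"tool": "search", "args": {"query": "weather Paris"}},  # redundant repeat
    {"tool": "calculator", "args": {"expr": "2+2"}},         # wrong argument
]
ground_truth = [
    {"tool": "search", "args": {"query": "weather Paris"}},
    {"tool": "calculator", "args": {"expr": "3+2"}},
]

print(turn_level_rewards(predicted, ground_truth))
```

Under these assumptions the first search call is matched and fully rewarded, the repeated search is left unmatched, and the calculator call earns partial credit for the right tool with a wrong argument. A trajectory-level reward would instead give all three turns the same signal.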

Code & Models

Datasets

ChangleQu/MatchTIR · dataset · 53 downloads

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Natural Language Processing Techniques