MatchTIR: Fine-Grained Supervision for Tool-Integrated Reasoning via Bipartite Matching
Changle Qu, Sunhao Dai, Hengyi Cai, Jun Xu, Shuaiqiang Wang, Dawei Yin

TL;DR
MatchTIR introduces a bipartite matching-based framework for fine-grained, turn-level supervision in tool-integrated reasoning, significantly improving large language models' performance on complex, multi-turn tasks.
Contribution
The paper presents MatchTIR, a method that uses bipartite matching to assign turn-level rewards and pairs this with dual-level advantage estimation, giving LLMs finer-grained supervision of their tool use.
Findings
MatchTIR outperforms existing methods on three benchmarks.
A 4B model surpasses most 8B models on long-horizon tasks.
The method is effective in multi-turn reasoning scenarios.
Abstract
Tool-Integrated Reasoning (TIR) empowers large language models (LLMs) to tackle complex tasks by interleaving reasoning steps with external tool interactions. However, existing reinforcement learning methods typically rely on outcome- or trajectory-level rewards, assigning uniform advantages to all steps within a trajectory. This coarse-grained credit assignment fails to distinguish effective tool calls from redundant or erroneous ones, particularly in long-horizon multi-turn scenarios. To address this, we propose MatchTIR, a framework that introduces fine-grained supervision via bipartite matching-based turn-level reward assignment and dual-level advantage estimation. Specifically, we formulate credit assignment as a bipartite matching problem between predicted and ground-truth traces, utilizing two assignment strategies to derive dense turn-level rewards. Furthermore, to balance local…
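To make the matching idea in the abstract concrete, here is a minimal sketch of how turn-level rewards could be derived by matching predicted tool calls one-to-one against ground-truth calls. It is an illustration under stated assumptions, not the authors' implementation: the `call_similarity` scoring rule, the dictionary layout of a tool call, and the `turn_rewards` helper are all hypothetical, and the abstract does not specify which two assignment strategies the paper uses. For brevity the sketch finds the best matching by brute force over permutations, which is only practical for short traces; an assignment solver such as the Hungarian algorithm would be the usual choice at scale.

```python
from itertools import permutations


def call_similarity(pred, gold):
    """Hypothetical similarity in [0, 1] between two tool calls.

    Assumption: 0 if the tool names differ; otherwise 0.5 for the name
    plus 0.5 times the fraction of ground-truth arguments reproduced.
    """
    if pred["name"] != gold["name"]:
        return 0.0
    gold_args = gold["args"]
    if not gold_args:
        return 1.0
    hits = sum(1 for k, v in gold_args.items() if pred["args"].get(k) == v)
    return 0.5 + 0.5 * hits / len(gold_args)


def turn_rewards(predicted, ground_truth):
    """Return one reward per predicted call via a one-to-one matching
    that maximizes total similarity. Unmatched calls receive 0."""
    n_pred, n_gold = len(predicted), len(ground_truth)
    rewards = [0.0] * n_pred
    if n_pred == 0 or n_gold == 0:
        return rewards

    sim = [[call_similarity(p, g) for g in ground_truth] for p in predicted]

    # Enumerate every injective map from the smaller side into the larger one.
    if n_pred >= n_gold:
        candidates = ([(p, g) for g, p in enumerate(perm)]
                      for perm in permutations(range(n_pred), n_gold))
    else:
        candidates = ([(p, g) for p, g in enumerate(perm)]
                      for perm in permutations(range(n_gold), n_pred))

    best_total, best_pairs = -1.0, []
    for pairs in candidates:
        total = sum(sim[p][g] for p, g in pairs)
        if total > best_total:
            best_total, best_pairs = total, pairs

    # Each matched predicted call is rewarded with its similarity score.
    for p, g in best_pairs:
        rewards[p] = sim[p][g]
    return rewards


# Illustrative trace: the second predicted call repeats the first.
predicted = [
    {"name": "search", "args": {"q": "weather Paris"}},
    {"name": "search", "args": {"q": "weather Paris"}},
    {"name": "calculator", "args": {"expr": "2+2"}},
]
ground_truth = [
    {"name": "search", "args": {"q": "weather Paris"}},
    {"name": "calculator", "args": {"expr": "2+2"}},
]
rewards = turn_rewards(predicted, ground_truth)
print(rewards)  # [1.0, 0.0, 1.0]
```

In this toy trace the redundant second search call receives no reward because the one-to-one constraint lets each ground-truth call be claimed only once, which is the kind of distinction a uniform trajectory-level reward cannot make.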
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Natural Language Processing Techniques
