ToolRLA: Multiplicative Reward Decomposition for Tool-Integrated Agents

Pengbo Liu

arXiv:2603.01620·cs.AI·March 12, 2026

Open Access

TL;DR

ToolRLA introduces a multiplicative reward decomposition for tool-integrated agents. In a deployed financial advisory copilot, it improved task completion rate by 47%, cut tool invocation errors by 63%, and reduced regulatory violations by 93%.

Contribution

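The paper presents a three-stage post-training pipeline (SFT -> GRPO -> DPO) built around a fine-grained, multiplicative reward function that encodes four correctness dimensions for tool agents: format validity, tool selection, parameter accuracy, and regulatory compliance.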

Findings

1. 47% increase in task completion rate

2. 63% reduction in tool invocation errors

3. 93% reduction in regulatory violations

Abstract

Tool-integrated agents that interleave reasoning with API calls are promising for complex tasks, yet aligning them for high-stakes, domain-specific deployment remains challenging: existing reinforcement learning approaches rely on coarse binary rewards that cannot distinguish tool selection errors from malformed parameters. We present ToolRLA, a three-stage post-training pipeline (SFT -> GRPO -> DPO) for domain-specific tool agents. The core contribution is a fine-grained reward function with multiplicative correctness decomposition spanning four dimensions -- format validity, tool selection, parameter accuracy, and regulatory compliance -- that encodes domain priority orderings as inductive biases in the reward landscape. Deployed on a financial advisory copilot (80+ advisors, 1,200+ daily queries), ToolRLA achieves over three months: a 47% improvement in task completion rate…
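The abstract is truncated and does not give the reward formula, so the following is only a minimal sketch of what a multiplicative decomposition over the four named dimensions could look like. It assumes each dimension is scored in [0, 1] and that the reward is the plain product of the scores; the names `StepScores`, `multiplicative_reward`, and `binary_reward` are hypothetical and do not come from the paper. The point is to show how a product separates a wrong tool choice from a partially wrong parameter set, where a coarse binary reward scores both as zero.

```python
from dataclasses import dataclass, astuple


@dataclass(frozen=True)
class StepScores:
    """Hypothetical per-dimension scores, each assumed to lie in [0, 1]."""
    format_validity: float
    tool_selection: float
    parameter_accuracy: float
    regulatory_compliance: float


def multiplicative_reward(scores: StepScores) -> float:
    """Product of the four scores (an assumed form, not the paper's formula).

    A zero in any dimension zeroes the whole reward, while partial credit
    in one dimension scales the reward down instead of discarding it.
    """
    reward = 1.0
    for value in astuple(scores):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"score out of range [0, 1]: {value}")
        reward *= value
    return reward


def binary_reward(scores: StepScores) -> float:
    """Coarse baseline: 1.0 only when every dimension is fully correct."""
    return 1.0 if all(value == 1.0 for value in astuple(scores)) else 0.0


# Wrong tool chosen, everything else fine.
wrong_tool = StepScores(1.0, 0.0, 1.0, 1.0)
# Right tool, but only half of the parameters are correct.
partial_params = StepScores(1.0, 1.0, 0.5, 1.0)

print(multiplicative_reward(wrong_tool), binary_reward(wrong_tool))
print(multiplicative_reward(partial_params), binary_reward(partial_params))
```

Under these assumptions the binary baseline gives both rollouts the same reward, whereas the product gives the partially correct call a nonzero signal. How the paper encodes priority orderings between dimensions is not stated in the visible text and is not modeled here.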

Taxonomy

Topics: Reinforcement Learning in Robotics · Topic Modeling · Mobile Crowdsensing and Crowdsourcing