R2IF: Aligning Reasoning with Decisions via Composite Rewards for Interpretable LLM Function Calling

Aijia Cheng; Kailong Wang; Ling Shi; Yongxin Zhao

arXiv:2604.20316·cs.LG·April 23, 2026

TL;DR

This paper introduces R2IF, a reasoning-aware reinforcement learning framework that improves the alignment between reasoning processes and tool-call decisions in large language models, enhancing interpretability and accuracy.

Contribution

R2IF is a novel composite reward framework that aligns reasoning with decisions, improving interpretability and performance in LLM function calling tasks.

Findings

01  R2IF outperforms baselines by up to 34.62% on BFCL (Llama3.2-3B).

02  R2IF achieves a positive Average CoT Effectiveness (0.05) with Llama3.2-3B.

03  R2IF improves both function-calling accuracy and interpretability.

Abstract

Function calling empowers large language models (LLMs) to interface with external tools, yet existing RL-based approaches suffer from misalignment between reasoning processes and tool-call decisions. We propose R2IF, a reasoning-aware RL framework for interpretable function calling, adopting a composite reward integrating format/correctness constraints, Chain-of-Thought Effectiveness Reward (CER), and Specification-Modification-Value (SMV) reward, optimized via GRPO. Experiments on BFCL/ACEBench show R2IF outperforms baselines by up to 34.62% (Llama3.2-3B on BFCL) with positive Average CoT Effectiveness (0.05 for Llama3.2-3B), enhancing both function-calling accuracy and interpretability for reliable tool-augmented LLM deployment.
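
To make the training signal described in the abstract concrete, here is a minimal sketch of how a composite reward could feed GRPO's group-relative advantage. The abstract does not say how the three reward terms are combined, so the weighted sum, the weights, and the names `RewardParts`, `composite_reward`, and `group_relative_advantages` are all illustrative assumptions, not the authors' implementation. The CER and SMV values are taken as precomputed inputs because their definitions are not given here. Only the advantage step (each reward minus the group mean, divided by the group standard deviation) follows the standard GRPO formulation; using the population standard deviation is also an assumption.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class RewardParts:
    """Per-response reward terms (hypothetical container, not from the paper)."""
    format_correctness: float  # format/correctness constraint term
    cer: float                 # Chain-of-Thought Effectiveness Reward, precomputed
    smv: float                 # Specification-Modification-Value reward, precomputed


def composite_reward(parts, w_fc=1.0, w_cer=1.0, w_smv=1.0):
    # ASSUMPTION: a plain weighted sum; the abstract does not specify the combination.
    return (w_fc * parts.format_correctness
            + w_cer * parts.cer
            + w_smv * parts.smv)


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages for one prompt's group of sampled responses.

    Each reward is centered on the group mean and scaled by the group
    standard deviation; eps avoids division by zero when all rewards match.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # ASSUMPTION: population std over the group
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Three sampled responses to the same prompt, with made-up reward terms.
    group = [
        RewardParts(format_correctness=1.0, cer=0.5, smv=0.25),
        RewardParts(format_correctness=1.0, cer=0.0, smv=0.0),
        RewardParts(format_correctness=0.0, cer=0.0, smv=0.0),
    ]
    rewards = [composite_reward(p) for p in group]
    print(group_relative_advantages(rewards))
```

Because the advantage is computed relative to the other responses in the group, a response is reinforced only when its composite reward exceeds the group average. Under this sketch, a reasoning-dependent term such as CER can separate two responses that produce the same tool call.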
