TTPA: Token-level Tool-use Preference Alignment Training Framework with Fine-grained Evaluation

Chengrui Huang; Shen Gao; Zhengliang Shi; Dongsheng Wang; Shuo Shang

arXiv:2505.20016 · cs.CL · May 27, 2025

TL;DR

TTPA is a training framework that aligns large language models with fine-grained tool-use preferences through token-level preference sampling, reversed dataset construction, and an error-oriented scoring mechanism, improving tool-use performance.

Contribution

The paper introduces TTPA, a training paradigm that combines a token-level preference dataset, reversed dataset construction, and an error-oriented scoring mechanism for better tool-use alignment.

Findings

01. Significantly improves tool-using performance across datasets

02. Demonstrates strong generalization across models and datasets

03. Enhances fine-grained preference alignment in LLMs

Abstract

Existing tool-learning methods usually rely on supervised fine-tuning and often overlook fine-grained optimization of internal tool-call details, which limits preference alignment and error discrimination. To overcome these challenges, we propose the Token-level Tool-use Preference Alignment Training Framework (TTPA), a training paradigm for constructing token-level tool-use preference datasets that align LLMs with fine-grained preferences using a novel error-oriented scoring mechanism. TTPA first introduces reversed dataset construction, a method for creating high-quality, multi-turn tool-use datasets by reversing the generation flow. Additionally, we propose Token-level Preference Sampling (TPS) to capture fine-grained preferences by modeling token-level differences during generation. To address biases in scoring, we introduce the Error-oriented Scoring Mechanism (ESM),…
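
The abstract is truncated here and does not state the scoring rules or the sampling procedure, so the following is only a toy sketch of the general idea of turning error-based scores into preference pairs. Everything in it is an assumption made for illustration: the `ToolCall` type, the `count_errors` rule (one point per deviation from a reference call), and the `build_preference_pairs` helper are hypothetical and are not taken from the paper. It also compares whole tool calls rather than individual tokens, which is a simplification of the token-level sampling the abstract describes.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass
class ToolCall:
    """Hypothetical representation of one tool call (not from the paper)."""
    name: str
    arguments: dict


def count_errors(candidate: ToolCall, reference: ToolCall) -> int:
    """Assumed scoring rule: one point per deviation from the reference call."""
    if candidate.name != reference.name:
        # A wrong tool name counts as one error plus every reference argument.
        return 1 + len(reference.arguments)
    errors = 0
    for key, expected in reference.arguments.items():
        if key not in candidate.arguments:
            errors += 1  # missing parameter
        elif candidate.arguments[key] != expected:
            errors += 1  # wrong parameter value
    # Parameters that the reference call does not have also count as errors.
    errors += sum(1 for key in candidate.arguments if key not in reference.arguments)
    return errors


def build_preference_pairs(candidates, reference):
    """Pair up candidates so that the one with fewer errors is 'chosen'."""
    scored = [(count_errors(c, reference), c) for c in candidates]
    pairs = []
    for (err_a, a), (err_b, b) in combinations(scored, 2):
        if err_a == err_b:
            continue  # equal scores give no preference signal
        chosen, rejected = (a, b) if err_a < err_b else (b, a)
        pairs.append({"chosen": chosen, "rejected": rejected})
    return pairs


# Illustrative data only; the tool names and arguments are made up.
reference = ToolCall("get_weather", {"city": "Paris", "unit": "celsius"})
candidates = [
    ToolCall("get_weather", {"city": "Paris", "unit": "celsius"}),  # no errors
    ToolCall("get_weather", {"city": "Paris"}),                     # missing "unit"
    ToolCall("search_web", {"query": "Paris weather"}),             # wrong tool
]
pairs = build_preference_pairs(candidates, reference)
```

Pairs of this shape (a chosen and a rejected response for the same context) are the usual input to preference-optimization objectives; how TTPA itself samples candidates and weighs error types would need to be checked against the full paper.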

Taxonomy

Topics: Software Engineering Research · Manufacturing Process and Optimization · Semantic Web and Ontologies

Methods: ALIGN