TTPA: Token-level Tool-use Preference Alignment Training Framework with Fine-grained Evaluation
Chengrui Huang, Shen Gao, Zhengliang Shi, Dongsheng Wang, Shuo Shang

TL;DR
TTPA is a novel training framework that aligns large language models with fine-grained tool-use preferences by utilizing token-level data, reversed dataset construction, and an error-oriented scoring mechanism, leading to improved tool-using performance.
Contribution
The paper introduces TTPA, a new training paradigm with a token-level preference dataset, reversed data construction, and an error-based scoring mechanism for better tool-use alignment.
Findings
Significantly improves tool-using performance across datasets
Demonstrates strong generalization across models and datasets
Enhances fine-grained preference alignment in LLMs
Abstract
Existing tool-learning methods usually rely on supervised fine-tuning, they often overlook fine-grained optimization of internal tool call details, leading to limitations in preference alignment and error discrimination. To overcome these challenges, we propose Token-level Tool-use Preference Alignment Training Framework (TTPA), a training paradigm for constructing token-level tool-use preference datasets that align LLMs with fine-grained preferences using a novel error-oriented scoring mechanism. TTPA first introduces reversed dataset construction, a method for creating high-quality, multi-turn tool-use datasets by reversing the generation flow. Additionally, we propose Token-level Preference Sampling (TPS) to capture fine-grained preferences by modeling token-level differences during generation. To address biases in scoring, we introduce the Error-oriented Scoring Mechanism (ESM),…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsSoftware Engineering Research · Manufacturing Process and Optimization · Semantic Web and Ontologies
MethodsALIGN
