Disentangling Reasoning Tokens and Boilerplate Tokens For Language Model Fine-tuning

Ziang Ye; Zhenru Zhang; Yang Zhang; Jianxin Ma; Junyang Lin; Fuli Feng

arXiv:2412.14780 · cs.CL · December 20, 2024

TL;DR

This paper introduces a novel method for disentangling reasoning tokens from boilerplate tokens in language model fine-tuning, leading to improved performance by emphasizing reasoning tokens during training.

Contribution

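The paper proposes SHAD (Shuffle-Aware Discriminator), which separates reasoning tokens from boilerplate tokens, and RFT (Reasoning-highlighted Fine-Tuning), a fine-tuning method that adaptively emphasizes reasoning tokens. Together they address the different roles that tokens play in LLM training.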
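
To make "emphasizing reasoning tokens" concrete, here is a minimal Python sketch of a token-weighted training loss. It assumes per-token cross-entropy losses and a reasoning/boilerplate label for each token are already available. The function name `weighted_token_loss` and the fixed `reasoning_weight` are hypothetical: the fixed weight is a simple stand-in for RFT's adaptive emphasis, which is not reproduced here.

```python
from typing import Sequence


def weighted_token_loss(
    token_losses: Sequence[float],
    is_reasoning: Sequence[bool],
    reasoning_weight: float = 2.0,  # hypothetical fixed emphasis factor, not from the paper
) -> float:
    """Weighted mean of per-token losses, up-weighting tokens labeled as reasoning.

    Illustrative stand-in only: the weight is fixed here rather than adaptive.
    """
    if len(token_losses) != len(is_reasoning):
        raise ValueError("token_losses and is_reasoning must have equal length")
    # Boilerplate tokens keep weight 1.0; reasoning tokens get reasoning_weight.
    weights = [reasoning_weight if r else 1.0 for r in is_reasoning]
    total = sum(w * loss for w, loss in zip(weights, token_losses))
    # Normalize by the total weight so the loss scale stays comparable.
    return total / sum(weights)


if __name__ == "__main__":
    losses = [0.1, 0.1, 2.0, 1.0]          # e.g. two format tokens, two answer tokens
    labels = [False, False, True, True]
    print(weighted_token_loss(losses, labels))                        # emphasized
    print(weighted_token_loss(losses, labels, reasoning_weight=1.0))  # uniform average
```

With `reasoning_weight=1.0` this reduces to the uniform token average used by standard supervised fine-tuning. Dividing by the sum of weights rather than the token count keeps the loss on a comparable scale as the weight changes.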

Findings

01. RFT outperforms standard supervised fine-tuning.
02. SHAD effectively classifies tokens by their predictability after input-output shuffling.
03. RFT enhances the reasoning capabilities of LLMs.
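
One way to read "classifies tokens based on predictability" is as a per-token comparison of losses. The sketch below assumes two aligned loss sequences for the same output tokens: one from a model trained on shuffled input-output pairs and one from a reference model. The function name `classify_tokens`, the loss-difference rule, and the zero `margin` threshold are all hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def classify_tokens(
    shuffled_losses: Sequence[float],
    reference_losses: Sequence[float],
    margin: float = 0.0,  # hypothetical decision threshold
) -> List[str]:
    """Label each output token as 'reasoning' or 'boilerplate'.

    Assumed rule: a token whose loss under the shuffle-trained model exceeds
    its reference loss by more than `margin` has lost predictability and is
    labeled reasoning; every other token is labeled boilerplate.
    """
    if len(shuffled_losses) != len(reference_losses):
        raise ValueError("loss sequences must have equal length")
    return [
        "reasoning" if s - r > margin else "boilerplate"
        for s, r in zip(shuffled_losses, reference_losses)
    ]


if __name__ == "__main__":
    # A format token stays cheap to predict; an answer token becomes expensive.
    print(classify_tokens([0.05, 3.1, 0.2], [0.05, 0.9, 0.3]))
```

Raising `margin` makes the sketch more conservative about calling a token reasoning; how the actual method sets this boundary is not stated in the text above.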

Abstract

When using agent-task datasets to enhance the agent capabilities of Large Language Models (LLMs), current methodologies often treat all tokens within a sample equally. However, we argue that tokens serving different roles, specifically reasoning tokens versus boilerplate tokens (e.g., those governing output format), differ significantly in importance and learning complexity, necessitating their disentanglement and distinct treatment. To address this, we propose a novel Shuffle-Aware Discriminator (SHAD) for adaptive token discrimination. SHAD classifies tokens by exploiting predictability differences observed after shuffling input-output combinations across samples: boilerplate tokens, due to their repetitive nature among samples, maintain predictability, whereas reasoning tokens do not. Using SHAD, we propose the Reasoning-highlighted Fine-Tuning (RFT) method, which adaptively…
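
The shuffling step described in the abstract can be illustrated with a short Python sketch that re-pairs each input with another sample's output. It assumes the dataset is a list of (input, output) string pairs; `shuffle_pairs` and the rotated-permutation scheme are illustrative choices, not the authors' procedure.

```python
import random
from typing import List, Tuple

Pair = Tuple[str, str]  # (input, output)


def shuffle_pairs(samples: List[Pair], seed: int = 0) -> List[Pair]:
    """Re-pair every input with the output of a different sample.

    Illustrative only, not the authors' procedure. Each output is used
    exactly once, and no input keeps its own output.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to shuffle")
    order = list(range(n))
    random.Random(seed).shuffle(order)
    # Rotating the shuffled order by one position maps each sample to a
    # different sample, so the result is a permutation with no fixed points.
    partner = [0] * n
    for k in range(n):
        partner[order[k]] = order[(k + 1) % n]
    return [(samples[i][0], samples[partner[i]][1]) for i in range(n)]


if __name__ == "__main__":
    samples = [
        ("Q: 2+3?", "Thought: add the numbers. Answer: 5"),
        ("Q: 4*2?", "Thought: multiply the numbers. Answer: 8"),
        ("Q: 9-6?", "Thought: subtract the numbers. Answer: 3"),
    ]
    for inp, out in shuffle_pairs(samples, seed=1):
        print(inp, "->", out)
```

Shared scaffolding such as `Thought:` and `Answer:` stays predictable in the mismatched pairs, while the sample-specific answers no longer follow from their inputs, which mirrors the boilerplate/reasoning contrast the abstract describes. Rotating a shuffled order yields a single cycle rather than a uniformly random derangement, which is enough for illustration.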

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling