CARFT: Boosting LLM Reasoning via Contrastive Learning with Annotated Chain-of-Thought-based Reinforced Fine-Tuning

Wenqiao Zhu; Ji Liu; Rongjuncheng Zhang; Haipang Wu; Yulun Zhang

arXiv:2508.15868 · cs.CL · September 9, 2025

TL;DR

This paper introduces CARFT, a contrastive learning method with annotated Chain-of-Thought to improve LLM reasoning, addressing stability and performance issues in reinforcement fine-tuning.

Contribution

The paper proposes a novel contrastive learning approach that exploits annotated CoT and stabilizes fine-tuning, significantly enhancing LLM reasoning performance.

Findings

01 Achieves up to 10.15% performance improvement

02 Improves robustness and training stability

03 Enhances efficiency by up to 30.62%

Abstract

Reasoning capability plays a critical role in the broad applications of Large Language Models (LLMs). To enhance the reasoning performance of LLMs, diverse Reinforcement Learning (RL)-based fine-tuning approaches have been proposed to address the limited generalization capability of LLMs trained solely via Supervised Fine-Tuning (SFT). Despite their effectiveness, two major limitations hinder the advancement of LLMs. First, vanilla RL-based approaches ignore annotated Chain-of-Thought (CoT) and incorporate unstable reasoning path sampling, which typically results in model collapse, an unstable training process, and suboptimal performance. Second, existing SFT approaches generally overemphasize the annotated CoT, potentially leading to performance degradation due to insufficient exploitation of potential CoT. In this paper, we propose a Contrastive learning with annotated…

Taxonomy

Topics: Natural Language Processing Techniques · Semantic Web and Ontologies