VCORE: Variance-Controlled Optimization-based Reweighting for Chain-of-Thought Supervision

Xuan Gong; Senmiao Wang; Hanbo Huang; Ruoyu Sun; Shiyu Liang

arXiv:2510.27462·cs.CL·April 21, 2026

TL;DR

VCORE introduces a variance-controlled reweighting framework for chain-of-thought supervision, improving reasoning performance of large language models by adaptively allocating supervision across tokens.

Contribution

VCORE formulates CoT supervision as a constrained optimization problem, enabling principled and adaptive token-level reweighting of supervision for better reasoning generalization.

Findings

1. VCORE achieves the strongest average performance across benchmarks.

2. VCORE yields significant gains on mathematical and coding tasks across various models.

3. VCORE provides a better initialization for reinforcement learning on reasoning tasks.

Abstract

Supervised fine-tuning (SFT) on long chain-of-thought (CoT) trajectories has emerged as a crucial technique for enhancing the reasoning abilities of large language models (LLMs). However, the standard cross-entropy loss treats all tokens equally, ignoring their heterogeneous contributions across a reasoning trajectory. This uniform treatment leads to misallocated supervision and weak generalization, especially in complex, long-form reasoning tasks. To address this, we introduce Variance-Controlled Optimization-based REweighting (VCORE), a principled framework that reformulates CoT supervision as a constrained optimization problem. By adopting an optimization-theoretic perspective, VCORE enables a principled and adaptive allocation of supervision across tokens, thereby aligning the training objective more closely with the goal of robust reasoning…
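
To make the contrast between uniform and reweighted token supervision concrete, here is a minimal sketch of a generic weighted token-level cross-entropy. It is not the VCORE weighting rule, which the abstract does not specify; the function names `token_nll` and `weighted_cot_loss` and the example weights are hypothetical and chosen only for illustration.

```python
import math


def token_nll(logits, target):
    """Negative log-likelihood of `target` under softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def weighted_cot_loss(logits_seq, targets, weights=None):
    """Weighted average of per-token NLL over one trajectory.

    weights=None reproduces the standard (uniform) cross-entropy.
    Any other non-negative weights are an illustrative assumption,
    not the weights VCORE computes.
    """
    losses = [token_nll(l, t) for l, t in zip(logits_seq, targets)]
    if weights is None:
        weights = [1.0] * len(losses)
    total = sum(weights)
    return sum(w * l for w, l in zip(weights, losses)) / total


# Two-token toy trajectory over a two-word vocabulary.
logits_seq = [[0.0, 0.0], [10.0, 0.0]]
targets = [0, 0]

uniform = weighted_cot_loss(logits_seq, targets)
# Hypothetical weights that put all supervision on the first token.
reweighted = weighted_cot_loss(logits_seq, targets, weights=[1.0, 0.0])
```

With uniform weights the nearly solved second token dilutes the loss, while shifting weight to the uncertain first token concentrates supervision there. A method such as VCORE would choose the weights adaptively rather than by hand.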

Code & Models

Repositories

coder-gx/VCORE
github

Datasets

XanderGong/VCORE-data
dataset · 130 dl
