Value-Based Pre-Training with Downstream Feedback

Shuqi Ke; Giulia Fanti

arXiv:2601.22108 · cs.LG · January 30, 2026

TL;DR

V-Pretraining is a novel value-based method that guides self-supervised pretraining using downstream feedback, significantly improving downstream task performance with minimal labeled data.

Contribution

The paper introduces V-Pretraining, a modality-agnostic approach that reshapes pretraining tasks based on downstream feedback without using downstream labels during model updates.

Findings

01. Improves reasoning accuracy on GSM8K by up to 18% with limited feedback.

02. Enhances vision SSL performance on ADE20K and NYUv2 datasets.

03. Reduces training data requirements for effective pretraining.

Abstract

Can a small amount of verified goal information steer the expensive self-supervised pretraining of foundation models? Standard pretraining optimizes a fixed proxy objective (e.g., next-token prediction), which can misallocate compute away from downstream capabilities of interest. We introduce V-Pretraining: a value-based, modality-agnostic method for controlled continued pretraining in which a lightweight task designer reshapes the pretraining task to maximize the value of each gradient step. For example, consider self-supervised learning (SSL) with sample augmentation. The V-Pretraining task designer selects pretraining tasks (e.g., augmentations) for which the pretraining loss gradient is aligned with a gradient computed over a downstream task (e.g., image segmentation). This helps steer pretraining towards relevant downstream capabilities. Notably, the pretrained model is never…
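The gradient-alignment idea in the abstract can be illustrated with a toy sketch. This is not the paper's implementation: it assumes a linear model with squared loss, uses cosine similarity as the alignment score, and replaces the learned task designer with a simple argmax over two hand-built candidate tasks. All function names (`mse_grad`, `alignment`, `pick_task`) are hypothetical. The downstream batch is used only to score candidates; the model update uses the pretraining gradient alone.

```python
import numpy as np

def mse_grad(w, X, y):
    """Gradient of 0.5 * mean((X @ w - y) ** 2) with respect to w."""
    return X.T @ (X @ w - y) / len(y)

def alignment(g_pre, g_down, eps=1e-12):
    """Cosine similarity between a pretraining and a downstream gradient."""
    denom = np.linalg.norm(g_pre) * np.linalg.norm(g_down) + eps
    return float(g_pre @ g_down / denom)

def pick_task(w, candidates, downstream):
    """Score each candidate pretraining task by gradient alignment
    with the downstream batch and return the best index and all scores."""
    g_down = mse_grad(w, *downstream)
    scores = [alignment(mse_grad(w, X, y), g_down) for X, y in candidates]
    return int(np.argmax(scores)), scores

rng = np.random.default_rng(0)
d = 5
w_true = rng.normal(size=d)

# Small downstream feedback batch (assumed setup, used for scoring only).
X_down = rng.normal(size=(64, d))
downstream = (X_down, X_down @ w_true)

# Two hypothetical pretraining tasks: one shares the downstream target,
# the other points in the opposite direction.
X_a = rng.normal(size=(256, d))
X_b = rng.normal(size=(256, d))
candidates = [(X_a, X_a @ w_true), (X_b, X_b @ (-w_true))]

w = np.zeros(d)
best, scores = pick_task(w, candidates, downstream)

# The update uses only the selected pretraining task's gradient.
w = w - 0.1 * mse_grad(w, *candidates[best])
```

In this toy setting the selected task is the one whose gradient points the same way as the downstream gradient, so a small step on the pretraining loss also lowers the downstream loss to first order.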

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Natural Language Processing Techniques