Value-Based Pre-Training with Downstream Feedback
Shuqi Ke, Giulia Fanti

TL;DR
V-Pretraining is a value-based method that uses downstream feedback to guide self-supervised pretraining, improving downstream task performance with only a small amount of labeled data.
Contribution
The paper introduces V-Pretraining, a modality-agnostic approach that reshapes pretraining tasks based on downstream feedback without using downstream labels during model updates.
Findings
Improves reasoning accuracy on GSM8K by up to 18% with limited feedback.
Enhances vision SSL performance on ADE20K and NYUv2 datasets.
Reduces training data requirements for effective pretraining.
Abstract
Can a small amount of verified goal information steer the expensive self-supervised pretraining of foundation models? Standard pretraining optimizes a fixed proxy objective (e.g., next-token prediction), which can misallocate compute away from downstream capabilities of interest. We introduce V-Pretraining: a value-based, modality-agnostic method for controlled continued pretraining in which a lightweight task designer reshapes the pretraining task to maximize the value of each gradient step. For example, consider self-supervised learning (SSL) with sample augmentation. The V-Pretraining task designer selects pretraining tasks (e.g., augmentations) for which the pretraining loss gradient is aligned with a gradient computed over a downstream task (e.g., image segmentation). This helps steer pretraining towards relevant downstream capabilities. Notably, the pretrained model is never…
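The gradient-alignment idea in the abstract can be illustrated with a toy sketch. It assumes each candidate pretraining task (here, an augmentation) has already produced a pretraining-loss gradient, and that a downstream gradient is available as a plain vector. The function names `cosine` and `select_task`, the augmentation names, and the numbers are all hypothetical. The greedy argmax stands in for the paper's task designer, whose actual form is not described in this summary.

```python
import math

def cosine(u, v):
    """Cosine similarity between two gradient vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def select_task(candidate_grads, downstream_grad):
    """Pick the candidate whose pretraining gradient best aligns with the
    downstream gradient. Hypothetical greedy rule, not the authors' designer."""
    scores = {name: cosine(g, downstream_grad)
              for name, g in candidate_grads.items()}
    best = max(scores, key=scores.get)
    return best, scores

# Made-up 2-D gradients for three hypothetical augmentations.
candidate_grads = {
    "crop": [1.0, 0.0],
    "color_jitter": [0.0, 1.0],
    "blur": [-1.0, 0.0],
}
downstream_grad = [1.0, 0.1]

best, scores = select_task(candidate_grads, downstream_grad)
print(best, scores)
```

In this toy setup the downstream gradient is used only to score candidates. The model update itself would still follow the selected task's pretraining gradient.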
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Natural Language Processing Techniques
