Transferable Post-training via Inverse Value Learning

Xinyu Lu; Xueru Wen; Yaojie Lu; Bowen Yu; Hongyu Lin; Haiyang Yu; Le Sun; Xianpei Han; Yongbin Li

arXiv:2410.21027·cs.LG·June 16, 2025

TL;DR

This paper introduces a post-training method that trains a separate value network to model logit-level changes and attaches it to pre-trained models at inference, in some cases reaching performance comparable to full fine-tuning at lower computational cost.

Contribution

The paper proposes a transferable value network for post-training that can be integrated with various pre-trained models, improving efficiency and transferring across different model sizes and vocabularies.

Findings

01. Value network achieves broad transferability across models.

02. Comparable performance to full fine-tuning in some cases.

03. Enhancement techniques improve transferability and prevent overfitting.

Abstract

As post-training processes utilize increasingly large datasets and base models continue to grow in size, the computational demands and implementation challenges of existing algorithms are escalating significantly. In this paper, we propose modeling the changes at the logits level during post-training using a separate neural network (i.e., the value network). After being trained on a small base model using demonstrations, this network can be seamlessly integrated with other pre-trained models during inference, enabling them to achieve similar capability enhancements. We systematically investigate the best practices for this paradigm in terms of pre-training weights and connection schemes. We demonstrate that the resulting value network has broad transferability across pre-trained models of different parameter sizes within the same family, models undergoing continuous pre-training…
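The inference-time integration described in the abstract can be illustrated with a minimal sketch. It assumes the simplest possible connection scheme: the value network outputs a per-token logit change that is added to the base model's logits over a shared vocabulary. The function names (`combine_logits`, `greedy_next_token`) and the `scale` parameter are hypothetical and are not taken from the paper or its repository; the connection scheme the authors actually recommend may differ.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def combine_logits(base_logits, delta_logits, scale=1.0):
    """Add the value network's logit change to the base model's logits.

    Assumption: both lists index the same vocabulary. `scale` is a
    hypothetical knob for the strength of the adjustment.
    """
    if len(base_logits) != len(delta_logits):
        raise ValueError("base and delta logits must cover the same vocabulary")
    return [b + scale * d for b, d in zip(base_logits, delta_logits)]


def greedy_next_token(base_logits, delta_logits, scale=1.0):
    """Return the index of the highest-scoring token after adjustment."""
    combined = combine_logits(base_logits, delta_logits, scale)
    return max(range(len(combined)), key=combined.__getitem__)


# Toy example with a three-token vocabulary.
base = [2.0, 1.0, 0.0]    # logits from a (frozen) pre-trained model
delta = [-1.5, 1.0, 0.0]  # logit change predicted by the value network
print(greedy_next_token(base, delta))             # adjusted choice
print(greedy_next_token(base, delta, scale=0.0))  # base model's own choice
```

In this sketch the base model's weights are never touched: only the logits are adjusted at each decoding step, which is why the same value network could in principle be attached to a different, larger base model.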

Code & Models

Repositories

luxinyu1/inverse-value-learning (PyTorch, official)

Videos

Transferable Post-training via Inverse Value Learning · Underline

Taxonomy

Topics: Human Resource Development and Performance Evaluation · AI and HR Technologies · Machine Learning and ELM

Methods: Balanced Selection