Transferable Post-training via Inverse Value Learning
Xinyu Lu, Xueru Wen, Yaojie Lu, Bowen Yu, Hongyu Lin, Haiyang Yu, Le Sun, Xianpei Han, Yongbin Li

TL;DR
This paper introduces a novel post-training method using a value network to adapt pre-trained models efficiently, achieving comparable performance to full fine-tuning with reduced computational costs.
Contribution
It proposes a transferable value network for post-training that can be integrated with various models, improving efficiency and transferability across different model sizes and vocabularies.
Findings
Value network achieves broad transferability across models.
Comparable performance to full fine-tuning in some cases.
Enhancement techniques improve transferability and prevent overfitting.
Abstract
As post-training processes utilize increasingly large datasets and base models continue to grow in size, the computational demands and implementation challenges of existing algorithms are escalating significantly. In this paper, we propose modeling the changes at the logits level during post-training using a separate neural network (i.e., the value network). After this network is trained on a small base model using demonstrations, it can be seamlessly integrated with other pre-trained models during inference, enabling them to achieve similar capability enhancements. We systematically investigate the best practices for this paradigm in terms of pre-training weights and connection schemes. We demonstrate that the resulting value network has broad transferability across pre-trained models of different parameter sizes within the same family, models undergoing continuous pre-training…
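The logits-level idea in the abstract can be illustrated with a small sketch. It assumes the simplest possible connection scheme, in which the value network's output is added to the frozen base model's next-token logits at inference. The function names, the `scale` parameter, and the toy numbers are hypothetical and only for illustration; the paper studies several connection schemes and its actual formulation may differ.

```python
import math


def softmax(logits):
    """Convert a list of logits into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def combine_logits(base_logits, delta_logits, scale=1.0):
    """Add a value network's logit adjustment to a base model's logits.

    Assumption: plain addition over a shared vocabulary. `scale` is a
    hypothetical knob, not a parameter taken from the paper.
    """
    if len(base_logits) != len(delta_logits):
        raise ValueError("base and delta logits must cover the same vocabulary")
    return [b + scale * d for b, d in zip(base_logits, delta_logits)]


# Toy next-token logits over a 4-token vocabulary (made-up values).
base_logits = [2.0, 1.0, 0.5, -1.0]    # from a frozen pre-trained model
delta_logits = [-1.5, 2.0, 0.0, 0.0]   # from a separately trained value network

adapted_logits = combine_logits(base_logits, delta_logits)
adapted_probs = softmax(adapted_logits)

# Greedy token choice before and after applying the adjustment.
base_choice = max(range(len(base_logits)), key=base_logits.__getitem__)
adapted_choice = max(range(len(adapted_logits)), key=adapted_logits.__getitem__)
```

Because the adjustment is applied only to output logits, the base model's weights stay untouched in this sketch, which is what would let the same adjustment network be paired with a different base model that shares the vocabulary.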
Taxonomy
Topics: Human Resource Development and Performance Evaluation · AI and HR Technologies · Machine Learning and ELM
Methods: Balanced Selection
