Portable Reward Tuning: Towards Reusable Fine-Tuning across Different Pretrained Models

Daiki Chijiwa; Taku Hasegawa; Kyosuke Nishida; Kuniko Saito; Susumu Takeuchi

arXiv:2502.12776 · cs.LG · February 19, 2025

TL;DR

This paper introduces Portable Reward Tuning (PRT), a fine-tuning approach that trains a reward model instead of the foundation model's parameters, so that the result of fine-tuning can be reused with different foundation models. Compared with inference-time tuning, it reduces inference overhead while reaching comparable accuracy.

Contribution

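PRT reformulates fine-tuning as reward maximization, allowing a single trained reward model to be combined with various foundation models at lower inference overhead than inference-time tuning, which must also run the old foundation model and its fine-tuned counterpart.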
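
The listing does not state the objective itself, so the following is only the standard KL-regularized reward maximization that such a reformulation usually refers to, not necessarily the paper's exact formulation. The symbols are assumptions for illustration: $\pi_0$ is a pretrained model, $r$ a reward function, and $\beta > 0$ a regularization strength.

```latex
\max_{\pi}\; \mathbb{E}_{y \sim \pi(\cdot \mid x)}\bigl[ r(x, y) \bigr]
  - \beta\, \mathrm{KL}\bigl( \pi(\cdot \mid x) \,\|\, \pi_0(\cdot \mid x) \bigr)
\quad\Longrightarrow\quad
\pi^{*}(y \mid x)
  = \frac{\pi_0(y \mid x)\, \exp\bigl( r(x, y) / \beta \bigr)}
         {\sum_{y'} \pi_0(y' \mid x)\, \exp\bigl( r(x, y') / \beta \bigr)}
```

Under this reading, the reward $r$ is a separate function from $\pi_0$, which is what would allow the same $r$ to be paired with a different pretrained model.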

Findings

01. PRT achieves accuracy comparable to inference-time tuning methods.

02. PRT reduces inference cost compared to inference-time tuning.

03. PRT is effective for both vision and language models.

Abstract

While foundation models have been exploited for various expert tasks through fine-tuning, any foundation model will become outdated due to its old knowledge or limited capability. Thus the underlying foundation model should be eventually replaced by new ones, which leads to repeated cost of fine-tuning these new models. Existing work addresses this problem by inference-time tuning, i.e., modifying the output probabilities from the new foundation model with the outputs from the old foundation model and its fine-tuned model, which involves an additional overhead in inference by the latter two models. In this paper, we propose a new fine-tuning principle, Portable Reward Tuning (PRT), that reduces the inference overhead by its nature, based on the reformulation of fine-tuning as the reward maximization. Specifically, instead of fine-tuning parameters of the foundation models, PRT trains…
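
To make the overhead argument concrete, here is a minimal NumPy sketch over a single next-token distribution. It is not the authors' implementation: the function names, the random logits, and the temperature `beta` are hypothetical, and the combination rule for inference-time tuning (new model times the ratio of fine-tuned to old model) is an assumed, commonly used form. The sketch only shows that a reward added to the new model's log-probabilities can reproduce that rule while needing one auxiliary model instead of two.

```python
import numpy as np


def log_softmax(logits):
    """Numerically stable log-softmax over the last axis."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    return shifted - np.log(np.sum(np.exp(shifted), axis=-1, keepdims=True))


def inference_time_tuning(new_logits, old_logits, old_ft_logits):
    """Assumed form: p_new * (p_old_finetuned / p_old), renormalized.

    Needs three forward passes: new model, old model, old fine-tuned model.
    """
    combined = (
        log_softmax(new_logits)
        + log_softmax(old_ft_logits)
        - log_softmax(old_logits)
    )
    return np.exp(log_softmax(combined))


def reward_guided(new_logits, reward, beta=1.0):
    """p_new * exp(reward / beta), renormalized.

    Needs two forward passes: new model and a (hypothetical) reward model.
    """
    return np.exp(log_softmax(log_softmax(new_logits) + reward / beta))


# Hypothetical logits standing in for real model outputs.
rng = np.random.default_rng(0)
vocab_size = 5
new_logits = rng.normal(size=vocab_size)
old_logits = rng.normal(size=vocab_size)
old_ft_logits = rng.normal(size=vocab_size)

# If the reward equals the log-ratio of the fine-tuned and old models,
# the two rules give the same distribution.
implicit_reward = log_softmax(old_ft_logits) - log_softmax(old_logits)

p_itt = inference_time_tuning(new_logits, old_logits, old_ft_logits)
p_reward = reward_guided(new_logits, implicit_reward)
```

In this toy setting the reward is computed from the two old models; the point of training a reward model directly, as the abstract describes, would be to avoid running both of them at inference time.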

Code & Models

Datasets

rosssso/prt14-codebase
dataset · 15 dl

Taxonomy

Topics: Simulation Techniques and Applications · Model Reduction and Neural Networks