On Giant's Shoulders: Effortless Weak to Strong by Dynamic Logits Fusion
Chenghao Fan, Zhenyi Lu, Wei Wei, Jie Tian, Xiaoye Qu, Dangyang Chen, Yu Cheng

TL;DR
This paper introduces a dynamic logit fusion method that enables effective transfer of knowledge from small task-specific models to larger models without additional training, improving performance and efficiency in fine-tuning large language models.
Contribution
It proposes a novel dynamic logit fusion approach that adaptively combines multiple small models for knowledge transfer, surpassing static methods and reducing training overhead.
Findings
Achieves 96.4% of full fine-tuning performance in single-task scenarios.
Closes 86.3% of the performance gap in multi-task settings.
Effectively integrates in-context learning and task arithmetic.
Abstract
Efficient fine-tuning of large language models for task-specific applications is imperative, yet the vast number of parameters in these models makes their training increasingly challenging. Despite numerous proposals for effective methods, a substantial memory overhead remains for gradient computations during updates. Can we fine-tune a series of task-specific small models and transfer their knowledge directly to a much larger model without additional training? In this paper, we explore weak-to-strong specialization using logit arithmetic, facilitating a direct answer to this question. Existing weak-to-strong methods often employ a static knowledge transfer ratio and a single small model for transferring complex knowledge, which leads to suboptimal performance. To surmount these limitations, we propose a dynamic logit fusion approach that works with a series of…
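To make the idea of logit arithmetic concrete, the following is a minimal sketch of the static baseline the abstract contrasts against, extended to several small models. It assumes the common formulation in which the large model's next-token logits are shifted by the difference between a fine-tuned small model and its untuned counterpart. The function names, the toy three-token vocabulary, and the hand-set weights `alphas` are all illustrative assumptions; the paper's dynamic method chooses the weights adaptively, and that procedure is not reproduced here.

```python
import math


def softmax(logits):
    """Convert a list of logits into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_logits(large_base, small_base, small_tuned_list, alphas):
    """Static logit arithmetic (illustrative, not the paper's dynamic method).

    fused = large_base + sum_i alphas[i] * (small_tuned_list[i] - small_base)

    All arguments are per-token logit lists over the same vocabulary;
    `alphas` holds one fixed, hand-chosen transfer ratio per small expert.
    """
    fused = list(large_base)
    for tuned, alpha in zip(small_tuned_list, alphas):
        for j in range(len(fused)):
            fused[j] += alpha * (tuned[j] - small_base[j])
    return fused


# Toy next-token logits over a hypothetical 3-token vocabulary.
large_base = [2.0, 1.0, 0.5]     # large untuned model
small_base = [1.0, 1.0, 1.0]     # small untuned model
small_tuned = [[1.0, 3.0, 1.0]]  # one small task-specific model

fused = fuse_logits(large_base, small_base, small_tuned, alphas=[1.0])
probs = softmax(fused)
best_token = max(range(len(probs)), key=probs.__getitem__)
print(fused, best_token)
```

With `alphas=[0.0]` the fused logits reduce to those of the large model, which shows how a single fixed ratio trades off the large model's prior against the small model's task knowledge. A dynamic scheme would replace the fixed `alphas` with values chosen at each decoding step.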
Taxonomy
Topics: Computational Physics and Python Applications
