Preserving Pre-trained Representation Space: On Effectiveness of Prefix-tuning for Large Multi-modal Models
Donghoon Kim, Gusang Lee, Kyuhong Shim, and Byonghyo Shim

TL;DR
This paper analyzes parameter-efficient fine-tuning methods for large multi-modal models, highlighting how prefix-tuning preserves pre-trained representations better than other methods and proposing a combined two-step strategy that improves downstream task performance.
Contribution
It introduces PT-PEFT, a two-step fine-tuning approach that combines prefix-tuning with other PEFT methods to enhance performance and preserve pre-trained representations in large multi-modal models.
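The two-step idea can be pictured as a schedule over which parameter groups are trainable. The sketch below is an illustration only, not the authors' implementation. It assumes that prefix-tuning runs first and the other PEFT method second, and that the pre-trained backbone stays frozen throughout. The `Param` class, the group labels, the parameter names, and the `keep_prefix_trainable` flag are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Param:
    name: str
    group: str              # hypothetical labels: "backbone", "prefix", or "peft"
    trainable: bool = False


def configure_stage(params, stage, keep_prefix_trainable=False):
    """Mark which parameter groups would receive gradients in each stage.

    Stage 1: only the prefix tokens are trained (assumed order).
    Stage 2: the PEFT modules (e.g. LoRA or Adapter weights) are trained.
             Whether the prefix keeps training here is an assumption,
             so it is exposed as a flag.
    The pre-trained backbone is frozen in both stages.
    """
    if stage == 1:
        active = {"prefix"}
    elif stage == 2:
        active = {"peft"} | ({"prefix"} if keep_prefix_trainable else set())
    else:
        raise ValueError("stage must be 1 or 2")
    for p in params:
        p.trainable = p.group in active
    return [p.name for p in params if p.trainable]


# Hypothetical parameter inventory for a single layer.
params = [
    Param("encoder.layer0.weight", "backbone"),
    Param("prefix.tokens", "prefix"),
    Param("encoder.layer0.lora_A", "peft"),
    Param("encoder.layer0.lora_B", "peft"),
]

stage1 = configure_stage(params, stage=1)
stage2 = configure_stage(params, stage=2)
```

In a real training loop, the returned names would select which tensors are handed to the optimizer for each stage. The backbone weights never appear in either list.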
Findings
Prefix-tuning better preserves pre-trained feature space.
LoRA and Adapters distort learned representations.
PT-PEFT improves downstream task performance.
Abstract
Recently, we have observed that Large Multi-modal Models (LMMs) are revolutionizing the way machines interact with the world, unlocking new possibilities across various multi-modal applications. To adapt LMMs for downstream tasks, parameter-efficient fine-tuning (PEFT), which trains only additional prefix tokens or modules, has gained popularity. Nevertheless, there has been little analysis of how PEFT works in LMMs. In this paper, we delve into the strengths and weaknesses of each tuning strategy, shifting the focus away from the efficiency typically associated with these approaches. We first discover that model parameter tuning methods such as LoRA and Adapters distort the feature representation space learned during pre-training and limit the full utilization of pre-trained knowledge. We also demonstrate that prefix-tuning excels at preserving the representation space, despite its lower…
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems
