IAA: Inner-Adaptor Architecture Empowers Frozen Large Language Model with Multimodal Capabilities
Bin Wang, Chunyu Xie, Dawei Leng, Yuhui Yin

TL;DR
This paper introduces the Inner-Adaptor Architecture (IAA), which gives a frozen large language model multimodal capabilities by inserting multiple multimodal adaptors at different depths. It outperforms previous state-of-the-art methods on vision-language benchmarks without degrading the model's NLP performance.
Contribution
The paper proposes the Inner-Adaptor Architecture (IAA), a structural approach that keeps the language model frozen and still lets it learn multimodal tasks effectively from small-scale data.
Findings
Outperforms previous state-of-the-art methods on vision-language benchmarks
Maintains NLP performance while gaining multimodal capabilities
Effective with small-scale datasets
Abstract
In the field of multimodal large language models (MLLMs), common methods typically involve unfreezing the language model during training to foster profound visual understanding. However, the fine-tuning of such models with vision-language data often leads to a diminution of their natural language processing (NLP) capabilities. To avoid this performance degradation, a straightforward solution is to freeze the language model while developing multimodal competencies. Unfortunately, previous works have not attained satisfactory outcomes. Building on the strategy of freezing the language model, we conduct thorough structural exploration and introduce the Inner-Adaptor Architecture (IAA). Specifically, the architecture incorporates multiple multimodal adaptors at varying depths within the large language model to facilitate direct interaction with the inherently text-oriented transformer…
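The core idea in the abstract, trainable adaptors placed at several depths of a frozen language model, can be illustrated with a toy sketch. This is not the authors' implementation: the class names (`FrozenLayer`, `Adaptor`, `InnerAdaptorModel`), the scalar "layers", the residual gate, and the rule that text-only inputs bypass the adaptors are all assumptions made for illustration. The sketch only shows why freezing the backbone can leave text-only behaviour untouched while the multimodal path changes during training.

```python
class FrozenLayer:
    """Toy stand-in for one pretrained transformer block; its weights never change."""

    def __init__(self, scale, shift):
        self.scale = scale
        self.shift = shift

    def __call__(self, hidden):
        return [self.scale * h + self.shift for h in hidden]


class Adaptor:
    """Hypothetical trainable adaptor: a gated residual update on the hidden state."""

    def __init__(self, gate=0.0):
        self.gate = gate  # the only trainable value in this sketch

    def __call__(self, hidden):
        return [h + self.gate * h for h in hidden]


class InnerAdaptorModel:
    """Frozen layers with adaptors inserted at chosen depths (assumed design)."""

    def __init__(self, layers, adaptor_depths):
        self.layers = layers
        self.adaptors = {depth: Adaptor() for depth in adaptor_depths}

    def forward(self, hidden, multimodal):
        for depth, layer in enumerate(self.layers):
            # Assumption: only multimodal inputs are routed through the adaptors.
            if multimodal and depth in self.adaptors:
                hidden = self.adaptors[depth](hidden)
            hidden = layer(hidden)
        return hidden


layers = [FrozenLayer(0.5, 1.0) for _ in range(4)]
model = InnerAdaptorModel(layers, adaptor_depths=[1, 3])

tokens = [1.0, 2.0]
text_before = model.forward(tokens, multimodal=False)
mm_before = model.forward(tokens, multimodal=True)

# Simulate a training step that updates only the adaptor parameters.
for adaptor in model.adaptors.values():
    adaptor.gate = 0.5

text_after = model.forward(tokens, multimodal=False)
mm_after = model.forward(tokens, multimodal=True)

print(text_before == text_after)  # text-only path is unaffected by adaptor updates
print(mm_before != mm_after)      # multimodal path changes
```

In this toy setup the frozen layers are never modified, so any change in behaviour comes from the adaptors alone, and only on inputs that are routed through them.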
Taxonomy
Topics: Semantic Web and Ontologies · Topic Modeling · Natural Language Processing Techniques
