IAA: Inner-Adaptor Architecture Empowers Frozen Large Language Model with Multimodal Capabilities

Bin Wang; Chunyu Xie; Dawei Leng; Yuhui Yin

arXiv:2408.12902·cs.AI·April 16, 2025

Open Access · 1 Repo · 1 Model

TL;DR

This paper introduces the Inner-Adaptor Architecture (IAA), which gives a frozen large language model multimodal capabilities by inserting multiple multimodal adaptors at different depths inside it. The approach outperforms previous state-of-the-art methods on vision-language benchmarks without degrading the model's NLP skills.

Contribution

The paper proposes the Inner-Adaptor Architecture (IAA), a new structural approach that allows frozen language models to learn multimodal tasks effectively with small-scale data.

Findings

01 Outperforms previous state-of-the-art methods on vision-language benchmarks

02 Maintains NLP performance while gaining multimodal capabilities

03 Effective with small-scale datasets

Abstract

In the field of multimodal large language models (MLLMs), common methods typically involve unfreezing the language model during training to foster profound visual understanding. However, the fine-tuning of such models with vision-language data often leads to a diminution of their natural language processing (NLP) capabilities. To avoid this performance degradation, a straightforward solution is to freeze the language model while developing multimodal competencies. Unfortunately, previous works have not attained satisfactory outcomes. Building on the strategy of freezing the language model, we conduct thorough structural exploration and introduce the Inner-Adaptor Architecture (IAA). Specifically, the architecture incorporates multiple multimodal adaptors at varying depths within the large language model to facilitate direct interaction with the inherently text-oriented transformer…
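The idea of adaptors placed at varying depths inside a frozen language model can be illustrated with a toy sketch. This is not the authors' implementation: the `Layer` and `AdaptedModel` names are hypothetical, each "layer" is a scalar affine map standing in for a transformer block, and routing text-only inputs around the adaptors is an assumption made here to show why frozen base weights leave text behaviour unchanged.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Layer:
    """Toy stand-in for a transformer block: y = weight * x + bias."""
    weight: float
    bias: float
    trainable: bool

    def __call__(self, xs: List[float]) -> List[float]:
        return [self.weight * x + self.bias for x in xs]


class AdaptedModel:
    """Frozen base layers plus trainable adaptors at chosen depths (hypothetical)."""

    def __init__(self, num_layers: int, adaptor_depths: Set[int]):
        # Base language-model layers are frozen: they are never updated.
        self.base: List[Layer] = [
            Layer(1.0 + 0.1 * i, 0.5, trainable=False) for i in range(num_layers)
        ]
        # One adaptor per chosen depth, initialised to the identity map.
        self.adaptors: Dict[int, Layer] = {
            d: Layer(1.0, 0.0, trainable=True) for d in sorted(adaptor_depths)
        }

    def forward(self, xs: List[float], multimodal: bool) -> List[float]:
        for depth, layer in enumerate(self.base):
            # Assumption: only multimodal inputs pass through the adaptors.
            if multimodal and depth in self.adaptors:
                xs = self.adaptors[depth](xs)
            xs = layer(xs)
        return xs

    def trainable_layers(self) -> List[Layer]:
        """Only adaptors are returned; the base model stays frozen."""
        return [a for a in self.adaptors.values() if a.trainable]


def fake_training_step(model: AdaptedModel, delta: float = 0.5) -> None:
    """Stand-in for an optimiser step: touches trainable layers only."""
    for layer in model.trainable_layers():
        layer.weight += delta


if __name__ == "__main__":
    model = AdaptedModel(num_layers=4, adaptor_depths={1, 3})
    before = model.forward([1.0, 2.0], multimodal=False)
    fake_training_step(model)
    after = model.forward([1.0, 2.0], multimodal=False)
    print("text-only output unchanged:", before == after)
```

In this sketch the update step only ever reaches the adaptors, so the text-only path computes exactly the same values before and after training, while the multimodal path changes.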

Code & Models

Repositories

360cvgroup/inner-adaptor-architecture
PyTorch · Official

Models

qihoo360/Inner-Adaptor-Architecture (Hugging Face)
model · 12 downloads · 10 likes

Taxonomy

Topics: Semantic Web and Ontologies · Topic Modeling · Natural Language Processing Techniques