LAMM: Language-Assisted Multi-Modal Instruction-Tuning Dataset, Framework, and Benchmark

Zhenfei Yin; Jiong Wang; Jianjian Cao; Zhelun Shi; Dingning Liu; Mukai Li; Lu Sheng; Lei Bai; Xiaoshui Huang; Zhiyong Wang; Jing Shao; Wanli Ouyang

arXiv:2306.06687 · cs.CV · November 7, 2023 · 40 cites

TL;DR

LAMM introduces an open-source multi-modal instruction tuning dataset, framework, and benchmark to advance research in vision-language models, supporting diverse vision tasks and facilitating human-AI interaction.

Contribution

LAMM provides a comprehensive dataset, benchmark, and training framework for multi-modal large language models, enabling scalable research across various modalities and tasks.

Findings

01 Validated effectiveness of the dataset and benchmark through extensive experiments.

02 Baseline models trained efficiently within 24 A100 GPU hours.

03 Framework supports multiple hardware configurations for broader accessibility.

Abstract

Large language models have emerged as a promising approach towards achieving general-purpose AI agents. The thriving open-source LLM community has greatly accelerated the development of agents that support human-machine dialogue interaction through natural language processing. However, human interaction with the world extends beyond only text as a modality, and other modalities such as vision are also crucial. Recent works on multi-modal large language models, such as GPT-4V and Bard, have demonstrated their effectiveness in handling visual modalities. However, the transparency of these works is limited and insufficient to support academic research. To the best of our knowledge, we present one of the very first open-source endeavors in the field, LAMM, encompassing a Language-Assisted Multi-Modal instruction tuning dataset, framework, and benchmark. Our aim is to establish LAMM as a…

Code & Models

Repositories

Models

🤗
openlamm/lamm186k_llama2chat7b_lora32
model · 2 dl

Videos

LAMM: Language-Assisted Multi-Modal Instruction-Tuning Dataset, Framework, and Benchmark · slideslive

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Domain Adaptation and Few-Shot Learning

Methods: Focus