Towards Robust Instruction Tuning on Multimodal Large Language Models

Wei Han; Hui Chen; Soujanya Poria

arXiv:2402.14492·cs.CL·June 17, 2024·1 cites

TL;DR

This paper introduces INSTRAUG, an automatic instruction augmentation method for multimodal large language models, significantly expanding instruction datasets and improving model alignment across multiple tasks without extensive human effort.

Contribution

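The paper presents INSTRAUG, an automatic instruction augmentation technique for multimodal tasks. Starting from a handful of basic meta instructions, it expands an instruction-following dataset by 30 times and improves the alignment of multimodal large language models (MLLMs).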

Findings

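01. INSTRAUG can expand an instruction-following dataset by 30 times, starting from a handful of meta instructions.

02. It improves the alignment of MLLMs across 12 multimodal tasks on the MULTIINSTRUCT and InstructBLIP benchmarks.

03. Its benefits are comparable to those of large-scale data scaling.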

Abstract

Fine-tuning large language models (LLMs) on multi-task instruction-following data has proven to be a powerful learning paradigm for improving their zero-shot capabilities on new tasks. Recent work on generating and selecting high-quality instruction-following data requires substantial human labor to conceive model-understandable instructions for the given tasks and to carefully filter the LLM-generated data. In this work, we introduce an automatic instruction augmentation method named INSTRAUG for multimodal tasks. It starts from a handful of basic and straightforward meta instructions but can expand an instruction-following dataset by 30 times. Results on two popular multimodal instruction-following benchmarks, MULTIINSTRUCT and InstructBLIP, show that INSTRAUG can significantly improve the alignment of multimodal large language models (MLLMs) across 12 multimodal tasks, which is even…
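The abstract does not spell out how meta instructions are rewritten, so the following is only a minimal sketch of the general idea of expanding a dataset by pairing each example with many instruction variants. Everything in it is an assumption made for illustration: the `Example` record, the `PREFIXES` and `SUFFIXES` lists, and the helper names are hypothetical and are not taken from the paper or from the declare-lab/instraug repository. The lists are sized so that 5 x 6 = 30 variants come out per example, which matches the reported 30-times expansion by construction only.

```python
from dataclasses import dataclass, replace
from itertools import product


@dataclass(frozen=True)
class Example:
    instruction: str  # natural-language task instruction (the "meta instruction")
    image_path: str   # placeholder path to the paired image
    answer: str       # target output for the task


# Hypothetical rewriting pieces, chosen only for illustration.
# They are NOT the augmentation rules used by INSTRAUG.
PREFIXES = [
    "",
    "Please ",
    "Given the image, ",
    "Look at the image and ",
    "In this task, you should ",
]
SUFFIXES = [
    "",
    " Answer briefly.",
    " Respond in English.",
    " Be precise.",
    " Give only the answer.",
    " Keep the answer short.",
]


def lower_first(text):
    """Lowercase the first character so the text can follow a prefix."""
    return text[:1].lower() + text[1:]


def instruction_variants(meta_instruction):
    """Return one variant per (prefix, suffix) pair, original wording first."""
    variants = []
    for prefix, suffix in product(PREFIXES, SUFFIXES):
        body = lower_first(meta_instruction) if prefix else meta_instruction
        variants.append(prefix + body + suffix)
    return variants


def augment(dataset):
    """Copy every example once per instruction variant; image and answer stay fixed."""
    return [
        replace(example, instruction=variant)
        for example in dataset
        for variant in instruction_variants(example.instruction)
    ]


if __name__ == "__main__":
    seed = Example("Describe the object in the image.", "img_001.jpg", "a red bicycle")
    augmented = augment([seed])
    print(len(augmented))
    print(augmented[7].instruction)
```

Only the instruction text changes between copies; the image and the target answer are reused as-is, which is why this kind of expansion needs no extra annotation. A real pipeline would likely need more varied rewrites and some filtering of ungrammatical variants.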

Code & Models

Repositories

declare-lab/instraug (PyTorch, official)

Taxonomy

Topics: Subtitles and Audiovisual Media · Speech and dialogue systems