MLLM-DataEngine: An Iterative Refinement Approach for MLLM

Zhiyuan Zhao; Linke Ouyang; Bin Wang; Siyuan Huang; Pan Zhang; Xiaoyi Dong; Jiaqi Wang; Conghui He

arXiv:2308.13566·cs.LG·September 12, 2023

TL;DR

MLLM-DataEngine introduces an iterative, closed-loop system that enhances multimodal large language models by automatically generating targeted, high-quality data based on evaluation feedback, requiring minimal human intervention.

Contribution

The paper presents a novel closed-loop framework for MLLMs that integrates data generation, model training, and evaluation to improve capabilities efficiently.

Findings

01. Boosts MLLM performance through targeted data augmentation.

02. Uses GPT-4 for high-quality data generation with interactive prompt optimization.

03. Requires minimal human participation in the iterative process.

Abstract

Despite the great advances of Multimodal Large Language Models (MLLMs) in both instruction dataset building and benchmarking, the independence of training and evaluation makes it hard to further improve the capability of current MLLMs under the guidance of evaluation results at a relatively low human cost. In this paper, we propose MLLM-DataEngine, a novel closed-loop system that bridges data generation, model training, and evaluation. Within each loop iteration, the MLLM-DataEngine first analyzes the weaknesses of the model based on the evaluation results, then generates a proper incremental dataset for the next training iteration, enhancing the model's capability iteratively. Compared with previous data collection methods, which are separate from benchmarking, the data generated by MLLM-DataEngine shows better targeting, quality, and correctness. For targeting, we propose an Adaptive…
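The loop described in the abstract (evaluate, analyze weaknesses, generate incremental data, retrain) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names, the per-category result format, and the error-proportional budget split are hypothetical and are not taken from the paper or from the opendatalab/mllm-dataengine repository. In particular, the allocation rule below is a simple stand-in, not the paper's targeting strategy.

```python
from collections import Counter


def error_rates(results):
    """Per-category error rate from (category, is_correct) evaluation pairs."""
    totals, wrong = Counter(), Counter()
    for category, is_correct in results:
        totals[category] += 1
        if not is_correct:
            wrong[category] += 1
    return {c: wrong[c] / totals[c] for c in totals}


def allocate_budget(rates, budget):
    """Split a sample budget across categories in proportion to error rate.

    Hypothetical rule: weaker categories receive more new samples.
    Rounding means the parts may not sum exactly to `budget`.
    """
    total = sum(rates.values())
    if total == 0:
        return {c: 0 for c in rates}
    return {c: round(budget * r / total) for c, r in rates.items()}


def run_loop(evaluate, generate, train, model, rounds, budget):
    """One possible closed loop: evaluate -> analyze -> generate -> train.

    `evaluate`, `generate`, and `train` are caller-supplied callables
    (placeholders for a benchmark, a data generator, and a trainer).
    """
    for _ in range(rounds):
        rates = error_rates(evaluate(model))
        plan = allocate_budget(rates, budget)
        new_data = [s for c, n in plan.items() for s in generate(c, n)]
        model = train(model, new_data)
    return model
```

With toy callables, for example an `evaluate` that returns a fixed list of `(category, is_correct)` pairs and a `generate` that returns `n` placeholder samples per category, `run_loop` shows how evaluation feedback steers where the next round of data is spent.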

Code & Models

Repositories

opendatalab/mllm-dataengine (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification

Methods: Multi-Head Attention · Attention Is All You Need · Cosine Annealing · Weight Decay · Linear Layer · Attention Dropout · Softmax · Dense Connections · Discriminative Fine-Tuning