MLLM-DataEngine: An Iterative Refinement Approach for MLLM
Zhiyuan Zhao, Linke Ouyang, Bin Wang, Siyuan Huang, Pan Zhang, Xiaoyi Dong, Jiaqi Wang, Conghui He

TL;DR
MLLM-DataEngine introduces an iterative, closed-loop system that enhances multimodal large language models by automatically generating targeted, high-quality data based on evaluation feedback, requiring minimal human intervention.
Contribution
The paper presents a novel closed-loop framework for MLLMs that integrates data generation, model training, and evaluation to improve capabilities efficiently.
Findings
Boosts MLLM performance through targeted data augmentation.
Uses GPT-4 for high-quality data generation with interactive prompt optimization.
Requires minimal human participation in the iterative process.
Abstract
Despite great advances in Multimodal Large Language Models (MLLMs) in both instruction dataset building and benchmarking, the independence of training and evaluation makes it hard to further improve current MLLMs under the guidance of evaluation results at a relatively low human cost. In this paper, we propose MLLM-DataEngine, a novel closed-loop system that bridges data generation, model training, and evaluation. Within each loop iteration, MLLM-DataEngine first analyzes the weaknesses of the model based on the evaluation results, then generates a proper incremental dataset for the next training iteration, enhancing the model's capability iteratively. Compared with previous data collection methods, which are separate from benchmarking, the data generated by MLLM-DataEngine shows better targeting, quality, and correctness. For targeting, we propose an Adaptive…
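The loop described in the abstract (evaluate, find weaknesses, generate incremental data, retrain) can be sketched as below. This is a toy illustration only, not the authors' implementation: `ToyModel`, `evaluate`, `find_weaknesses`, `generate_incremental_data`, `train`, and `data_engine_loop` are all hypothetical names, the model is reduced to a table of per-capability scores, and the GPT-4 data generation step is replaced by a placeholder that emits tagged samples. Picking the lowest-scoring capabilities is an assumed stand-in for the paper's targeting strategy, which is truncated in the abstract above.

```python
from dataclasses import dataclass, field


@dataclass
class ToyModel:
    # Hypothetical stand-in for an MLLM: per-capability accuracy in [0, 1].
    skill: dict = field(default_factory=dict)


def evaluate(model):
    """Return per-capability scores (here, a copy of the stored toy scores)."""
    return dict(model.skill)


def find_weaknesses(scores, k=2):
    """Assumed targeting rule: pick the k lowest-scoring capabilities."""
    return sorted(scores, key=scores.get)[:k]


def generate_incremental_data(weak_types, per_type=10):
    """Placeholder for data generation: emit tagged samples per weak type."""
    return [{"type": t, "id": i} for t in weak_types for i in range(per_type)]


def train(model, data, gain_per_sample=0.01):
    """Toy training step: each sample nudges its capability score upward."""
    for sample in data:
        t = sample["type"]
        model.skill[t] = min(1.0, model.skill[t] + gain_per_sample)
    return model


def data_engine_loop(model, rounds=3):
    """Run the closed loop: evaluate -> analyze -> generate -> train."""
    history = []
    for _ in range(rounds):
        scores = evaluate(model)
        weak_types = find_weaknesses(scores)
        data = generate_incremental_data(weak_types)
        model = train(model, data)
        history.append(weak_types)
    return model, history


if __name__ == "__main__":
    start = ToyModel({"counting": 0.4, "ocr": 0.5, "spatial": 0.8})
    final, history = data_engine_loop(start, rounds=3)
    print(history)
    print(final.skill)
```

The point of the sketch is the control flow: each round's training data depends on that round's evaluation results, so capabilities that already score well (here, `spatial`) receive no new data until they become the weakest.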
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification
Methods: Multi-Head Attention · Attention Is All You Need · Cosine Annealing · Weight Decay · Linear Layer · Attention Dropout · Softmax · Dense Connections · Discriminative Fine-Tuning
