EACO: Enhancing Alignment in Multimodal LLMs via Critical Observation
Yongxin Wang, Meng Cao, Haokun Lin, Mingfei Han, Liang Ma, Jin Jiang, Yuhao Cheng, Xiaodan Liang

TL;DR
EACO enhances multimodal large language models by self-generating preference data through a critical observation framework, significantly reducing hallucinations and improving reasoning without relying on costly human labels.
Contribution
The paper introduces a novel self-supervised approach, EACO, that uses a critical evaluation model to improve MLLMs via preference data, avoiding dependence on high-quality human labels.
Findings
Reduces hallucinations by 65.6% on HallusionBench
Improves reasoning ability by 21.8% on MME-Cognition
Achieves an 8.5% performance improvement over LLaVA-v1.6-Mistral-7B
Abstract
Multimodal large language models (MLLMs) have achieved remarkable progress on various visual question answering and reasoning tasks by leveraging instruction fine-tuning on specific datasets. They can also learn from human-annotated preference data to enhance their reasoning ability and mitigate hallucinations. Most preference data is generated by the model itself. However, existing methods require high-quality critical labels, which are costly and rely on humans or proprietary models such as GPT-4V. In this work, we propose Enhancing Alignment in MLLMs via Critical Observation (EACO), which economically aligns MLLMs with self-generated preference data using only 5k images. Our approach begins with collecting and refining a Scoring Evaluation Instruction-tuning dataset to train a critical evaluation model, termed the Critic. This Critic observes model responses across multiple dimensions,…
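The abstract describes a Critic that scores model responses across multiple dimensions to produce self-generated preference data. The sketch below illustrates one generic way such multi-dimensional scores could be turned into a chosen/rejected pair. It is not EACO's actual procedure (those details are truncated above): the equal-weight averaging, the `min_margin` threshold, and every name (`Candidate`, `overall_score`, `build_preference_pair`) are hypothetical assumptions made for illustration.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List, Optional


@dataclass
class Candidate:
    """One model response plus hypothetical per-dimension critic scores."""
    response: str
    scores: Dict[str, float]  # e.g. {"accuracy": 4.0, "detail": 3.0}


def overall_score(c: Candidate) -> float:
    # Assumption: all scoring dimensions are weighted equally.
    if not c.scores:
        raise ValueError("candidate has no critic scores")
    return mean(c.scores.values())


def build_preference_pair(
    prompt: str, candidates: List[Candidate], min_margin: float = 0.5
) -> Optional[Dict[str, str]]:
    """Pick the best- and worst-scored responses as a chosen/rejected pair.

    Returns None when there are too few candidates or when the score gap
    is below the (hypothetical) min_margin, i.e. the preference is ambiguous.
    """
    if len(candidates) < 2:
        return None
    ranked = sorted(candidates, key=overall_score, reverse=True)
    best, worst = ranked[0], ranked[-1]
    if overall_score(best) - overall_score(worst) < min_margin:
        return None
    return {"prompt": prompt, "chosen": best.response, "rejected": worst.response}


if __name__ == "__main__":
    cands = [
        Candidate("A red bus.", {"accuracy": 4.0, "detail": 3.0}),   # mean 3.5
        Candidate("A blue car.", {"accuracy": 1.0, "detail": 2.0}),  # mean 1.5
        Candidate("A bus on a street.", {"accuracy": 3.0, "detail": 3.0}),  # mean 3.0
    ]
    print(build_preference_pair("Describe the image.", cands))
```

Discarding pairs whose score gap is small is a common heuristic for keeping only clear-cut preferences; whether and how EACO filters pairs is not stated in the text above.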
Taxonomy
Topics: Natural Language Processing Techniques · Semantic Web and Ontologies
