EACO: Enhancing Alignment in Multimodal LLMs via Critical Observation

Yongxin Wang; Meng Cao; Haokun Lin; Mingfei Han; Liang Ma; Jin Jiang; Yuhao Cheng; Xiaodan Liang

arXiv:2412.04903 · cs.CV · December 17, 2024

Open Access

TL;DR

EACO enhances multimodal large language models by self-generating preference data through a critical observation framework, significantly reducing hallucinations and improving reasoning without relying on costly human labels.

Contribution

The paper introduces a novel self-supervised approach, EACO, that uses a critical evaluation model to improve MLLMs via preference data, avoiding dependence on high-quality human labels.

Findings

01. Reduces hallucinations by 65.6% on HallusionBench

02. Improves reasoning ability by 21.8% on MME-Cognition

03. Achieves an 8.5% performance boost over LLaVA-v1.6-Mistral-7B

Abstract

Multimodal large language models (MLLMs) have achieved remarkable progress on various visual question answering and reasoning tasks by leveraging task-specific instruction fine-tuning datasets. They can also learn from human-annotated preference data to enhance their reasoning ability and mitigate hallucinations. Most preference data is generated by the model itself. However, existing methods require high-quality critical labels, which are costly and rely on human annotators or proprietary models such as GPT-4V. In this work, we propose Enhancing Alignment in MLLMs via Critical Observation (EACO), which economically aligns MLLMs with self-generated preference data using only 5k images. Our approach begins with collecting and refining a Scoring Evaluation Instruction-tuning dataset to train a critical evaluation model, termed the Critic. This Critic observes model responses across multiple dimensions,…
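The abstract is truncated here, so the exact selection rule is not shown on this page. As a rough illustration only, the general idea of turning Critic scores into preference data might look like the sketch below. Everything in it is an assumption: the `PreferencePair` type, the `build_preference_pair` helper, the dimension names, the averaging of per-dimension scores, and the best-versus-worst selection are hypothetical and are not taken from the paper. The image input is omitted for brevity.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, List, Optional

# Hypothetical critic signature: (prompt, response) -> {dimension: score}.
# The paper's Critic is a trained model; here it is just a callable.
Critic = Callable[[str, str], Dict[str, float]]


@dataclass
class PreferencePair:
    prompt: str
    chosen: str    # response the critic scored highest
    rejected: str  # response the critic scored lowest


def build_preference_pair(
    prompt: str,
    responses: List[str],
    critic: Critic,
    min_margin: float = 0.0,
) -> Optional[PreferencePair]:
    """Pick the best- and worst-scored responses as a preference pair.

    Assumption: per-dimension scores are averaged into one scalar.
    Returns None when there are fewer than two candidates or when the
    score gap does not exceed min_margin (no clear preference).
    """
    if len(responses) < 2:
        return None
    scored = [(mean(critic(prompt, r).values()), r) for r in responses]
    best_score, best = max(scored, key=lambda item: item[0])
    worst_score, worst = min(scored, key=lambda item: item[0])
    if best_score - worst_score <= min_margin:
        return None
    return PreferencePair(prompt=prompt, chosen=best, rejected=worst)


# Toy stand-in for a critic; the dimension names are made up.
def toy_critic(prompt: str, response: str) -> Dict[str, float]:
    table = {
        "a": {"relevance": 5, "accuracy": 4},
        "b": {"relevance": 2, "accuracy": 1},
        "c": {"relevance": 3, "accuracy": 3},
    }
    return table[response]


pair = build_preference_pair("q", ["a", "b", "c"], toy_critic)
```

Pairs in this prompt/chosen/rejected shape are the usual input to preference-optimization methods, which is why the sketch stops at pair construction rather than training.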

Taxonomy

Topics: Natural Language Processing Techniques · Semantic Web and Ontologies