EAGLE: Enhanced Visual Grounding Minimizes Hallucinations in Instructional Multimodal Models
Andrés Villa, Juan León Alcázar, Motasem Alfarra, Vladimir Araujo, Alvaro Soto, Bernard Ghanem

TL;DR
EAGLE enhances the visual component of multimodal models through a simple reformulation of contrastive pre-training, significantly reducing hallucinations and improving grounding without additional instruction training.
Contribution
The paper introduces EAGLE, a post-pretraining method that improves visual grounding and reduces hallucinations in multimodal models by reformulating contrastive pre-training.
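For context, the sketch below shows the standard CLIP-style symmetric contrastive objective that contrastive pre-training usually refers to. It is background only: this summary does not specify how EAGLE reformulates the objective, so the code is not EAGLE's loss. The function names, the toy embeddings, and the temperature value of 0.07 are illustrative assumptions.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def log_softmax(row):
    """Numerically stable log-softmax over one row of logits."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]

def contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Generic symmetric InfoNCE loss over a batch of paired embeddings.

    Pair k (image k, text k) is the positive; every other combination
    in the batch is a negative. This is the usual baseline objective,
    NOT the EAGLE reformulation.
    """
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    n = len(imgs)
    # Similarity matrix: rows index images, columns index texts.
    logits = [
        [sum(a * b for a, b in zip(i, t)) / temperature for t in txts]
        for i in imgs
    ]
    # Image-to-text direction: each row should peak on its diagonal entry.
    i2t = -sum(log_softmax(logits[k])[k] for k in range(n)) / n
    # Text-to-image direction: same idea on the transposed matrix.
    cols = [[logits[r][c] for r in range(n)] for c in range(n)]
    t2i = -sum(log_softmax(cols[k])[k] for k in range(n)) / n
    return 0.5 * (i2t + t2i)

# Toy batch: correctly paired embeddings vs. deliberately swapped ones.
aligned = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

In this toy batch the loss is close to zero when each image embedding matches its paired text embedding and large when the pairs are swapped, which is the signal that pulls a visual encoder toward the text it should be grounded in.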
Findings
EAGLE reduces hallucinations across multiple benchmarks.
The method improves visual grounding without extra instruction training.
EAGLE is agnostic to the language model and fusion module.
Abstract
Large language models and vision transformers have demonstrated impressive zero-shot capabilities, enabling significant transferability in downstream tasks. The fusion of these models has resulted in multi-modal architectures with enhanced instructional capabilities. Despite incorporating vast image and language pre-training, these multi-modal architectures often generate responses that deviate from the ground truth in the image data. These failure cases are known as hallucinations. Current methods for mitigating hallucinations generally focus on regularizing the language component, improving the fusion module, or ensembling multiple visual encoders to improve visual representation. In this paper, we address the hallucination issue by directly enhancing the capabilities of the visual component. Our approach, named EAGLE, is fully agnostic to the LLM or fusion module and works as a…
Taxonomy
Topics: Data Visualization and Analytics
Methods: Focus
