MEGL: Multimodal Explanation-Guided Learning

Yifei Zhang; Tianxu Jiang; Bo Pan; Jingyu Wang; Guangji Bai; Liang Zhao

arXiv:2411.13053·cs.CV·November 21, 2024

TL;DR

MEGL introduces a multimodal explanation-guided learning framework that combines visual and textual explanations to improve AI interpretability and classification accuracy, especially when visual annotations are incomplete.

Contribution

The paper proposes a novel MEGL framework that integrates visual and textual explanations, including a saliency-driven grounding method and a distribution consistency loss, advancing multimodal interpretability.

Findings

01. MEGL outperforms previous methods in accuracy and explanation quality.

02. The approach effectively learns from incomplete multimodal supervision.

03. Experimental validation on the Object-ME and Action-ME datasets confirms its effectiveness.

Abstract

Explaining the decision-making processes of Artificial Intelligence (AI) models is crucial for addressing their "black box" nature, particularly in tasks like image classification. Traditional eXplainable AI (XAI) methods typically rely on unimodal explanations, either visual or textual, each with inherent limitations. Visual explanations highlight key regions but often lack rationale, while textual explanations provide context without spatial grounding. Further, both explanation types can be inconsistent or incomplete, limiting their reliability. To address these challenges, we propose a novel Multimodal Explanation-Guided Learning (MEGL) framework that leverages both visual and textual explanations to enhance model interpretability and improve classification performance. Our Saliency-Driven Textual Grounding (SDTG) approach integrates spatial information from visual explanations into…

Taxonomy

Topics: Topic Modeling · Semantic Web and Ontologies · Natural Language Processing Techniques

Methods: ALIGN