MEGL: Multimodal Explanation-Guided Learning
Yifei Zhang, Tianxu Jiang, Bo Pan, Jingyu Wang, Guangji Bai, Liang Zhao

TL;DR
MEGL introduces a multimodal explanation-guided learning framework that combines visual and textual explanations to improve AI interpretability and classification accuracy, especially when visual annotations are incomplete.
Contribution
The paper proposes a novel MEGL framework that integrates visual and textual explanations, including a saliency-driven grounding method and a distribution consistency loss, advancing multimodal interpretability.
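The summary names a distribution consistency loss but does not give its form. As a rough illustration only, the sketch below assumes the loss is a KL divergence between two saliency maps, each normalized into a probability distribution. The function names `to_distribution` and `distribution_consistency_loss`, the `eps` smoothing term, and the choice of KL divergence are all assumptions made for this example, not details taken from the paper.

```python
import math


def to_distribution(saliency, eps=1e-8):
    """Flatten a 2D saliency map (list of rows) into a probability vector.

    Negative values are clipped to zero and eps is added to every cell so
    that the logarithm below is always defined (an assumption of this sketch).
    """
    flat = [max(float(v), 0.0) + eps for row in saliency for v in row]
    total = sum(flat)
    return [v / total for v in flat]


def distribution_consistency_loss(predicted, reference):
    """Hypothetical loss: KL(reference || predicted) over normalized maps."""
    p = to_distribution(reference)
    q = to_distribution(predicted)
    if len(p) != len(q):
        raise ValueError("saliency maps must have the same number of cells")
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


# Toy 2x2 maps: a reference pattern and a uniform prediction.
reference_map = [[0.0, 1.0], [2.0, 1.0]]
uniform_map = [[1.0, 1.0], [1.0, 1.0]]

same = distribution_consistency_loss(reference_map, reference_map)
different = distribution_consistency_loss(uniform_map, reference_map)
```

Under these assumptions the loss is zero when the predicted map matches the reference and grows as the two distributions diverge, which is the behavior a consistency term of this kind would be expected to have.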
Findings
MEGL outperforms previous methods in accuracy and explanation quality.
The approach effectively learns from incomplete multimodal supervision.
Experimental validation on Object-ME and Action-ME datasets confirms its effectiveness.
Abstract
Explaining the decision-making processes of Artificial Intelligence (AI) models is crucial for addressing their "black box" nature, particularly in tasks like image classification. Traditional eXplainable AI (XAI) methods typically rely on unimodal explanations, either visual or textual, each with inherent limitations. Visual explanations highlight key regions but often lack rationale, while textual explanations provide context without spatial grounding. Further, both explanation types can be inconsistent or incomplete, limiting their reliability. To address these challenges, we propose a novel Multimodal Explanation-Guided Learning (MEGL) framework that leverages both visual and textual explanations to enhance model interpretability and improve classification performance. Our Saliency-Driven Textual Grounding (SDTG) approach integrates spatial information from visual explanations into…
Taxonomy
Topics: Topic Modeling · Semantic Web and Ontologies · Natural Language Processing Techniques
Methods: ALIGN
