DEAL: Disentangle and Localize Concept-level Explanations for VLMs
Tang Li, Mengmeng Ma, and Xi Peng

TL;DR
This paper introduces DEAL, a method that disentangles and localizes concept-level explanations in vision-language models (VLMs) without human annotations. The authors report more interpretable explanations, reduced reliance on spurious correlations, and improved prediction accuracy.
Contribution
The paper proposes an annotation-free approach that makes concept-level explanations in VLMs distinct from one another and better localized, addressing the entanglement and mislocalization observed for fine-grained concepts.
Findings
Significantly better disentanglability and localizability of explanations.
Reduced reliance on spurious correlations.
Improved prediction accuracy.
Abstract
Large pre-trained Vision-Language Models (VLMs) have become ubiquitous foundational components of other models and downstream tasks. Although powerful, our empirical results reveal that such models might not be able to identify fine-grained concepts. Specifically, the explanations of VLMs with respect to fine-grained concepts are entangled and mislocalized. To address this issue, we propose to DisEntAngle and Localize (DEAL) the concept-level explanations for VLMs without human annotations. The key idea is encouraging the concept-level explanations to be distinct while maintaining consistency with category-level explanations. We conduct extensive experiments and ablation studies on a wide range of benchmark datasets and vision-language models. Our empirical results demonstrate that the proposed method significantly improves the concept-level explanations of the model in terms of…
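The key idea in the abstract (distinct concept-level explanations that stay consistent with the category-level explanation) can be illustrated with a toy objective. The sketch below is not the authors' formulation: the function names, the cosine-similarity penalty, the mean aggregation, and the `weight` parameter are all assumptions chosen for illustration, and explanations are assumed to be given as non-negative heatmaps.

```python
import numpy as np

def disentangle_penalty(concept_maps):
    """Mean pairwise cosine similarity between concept heatmaps.

    concept_maps: array of shape (K, H, W) with K >= 2 (hypothetical layout).
    Lower values mean the concept explanations overlap less.
    """
    k = concept_maps.shape[0]
    flat = concept_maps.reshape(k, -1)
    norms = np.linalg.norm(flat, axis=1, keepdims=True)
    unit = flat / np.maximum(norms, 1e-12)   # guard against all-zero maps
    sim = unit @ unit.T                      # (K, K) cosine similarities
    off_diag = sim[~np.eye(k, dtype=bool)]   # drop each map's self-similarity
    return float(off_diag.mean())

def consistency_penalty(concept_maps, category_map):
    """Mean squared error between the averaged concept maps and the category map.

    Averaging is an assumed aggregation rule, used here only for illustration.
    """
    aggregated = concept_maps.mean(axis=0)
    return float(np.mean((aggregated - category_map) ** 2))

def toy_objective(concept_maps, category_map, weight=1.0):
    """Illustrative combined loss: distinct concepts + category-level consistency."""
    return disentangle_penalty(concept_maps) + weight * consistency_penalty(
        concept_maps, category_map
    )
```

Under these assumptions, two heatmaps that highlight different pixels incur no disentanglement penalty, while two identical heatmaps incur the maximum penalty, so minimizing the objective pushes concept explanations apart without letting their aggregate drift from the category-level explanation.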
