Multi-Rationale Explainable Object Recognition via Contrastive Conditional Inference
Ali Rasekh, Sepehr Kazemi Ranjbar, Simon Gottschalk

TL;DR
This paper introduces a new benchmark and a training-free contrastive inference framework for explainable object recognition with multiple rationales per image, improving both accuracy and interpretability.
Contribution
It proposes a multi-rationale dataset and a contrastive conditional inference method that models relationships among image features, labels, and rationales, achieving state-of-the-art results.
Findings
Achieved state-of-the-art performance on the benchmark.
Demonstrated strong zero-shot classification accuracy.
Produced high-quality rationales for explanations.
Abstract
Explainable object recognition using vision-language models such as CLIP involves predicting accurate category labels supported by rationales that justify the decision-making process. Existing methods typically rely on prompt-based conditioning, which suffers from limitations in CLIP's text encoder and provides weak conditioning on explanatory structures. Additionally, prior datasets are often restricted to single, and frequently noisy, rationales that fail to capture the full diversity of discriminative image features. In this work, we introduce a multi-rationale explainable object recognition benchmark comprising datasets in which each image is annotated with multiple ground-truth rationales, along with evaluation metrics designed to offer a more comprehensive representation of the task. To overcome the limitations of previous approaches, we propose a contrastive conditional inference…
Taxonomy
Topics: Explainable Artificial Intelligence (XAI)
