MaskDiME: Adaptive Masked Diffusion for Precise and Efficient Visual Counterfactual Explanations

Changlu Guo; Anders Nymark Christensen; Anders Bjorholm Dahl; Morten Rieger Hannemose

arXiv:2602.18792·cs.CV·April 24, 2026

TL;DR

MaskDiME is a fast, training-free diffusion framework that generates precise, semantically consistent visual counterfactual explanations by focusing on decision-relevant regions. It performs inference over 30x faster than the baseline and achieves comparable or state-of-the-art performance across five benchmark datasets.

Contribution

The paper introduces an adaptive localized sampling approach that unifies semantic consistency and spatial precision without any training, enabling efficient counterfactual generation.

Findings

01 Performs inference over 30x faster than the baseline.

02 Achieves comparable or state-of-the-art performance across five benchmark datasets.

03 Localizes modifications to decision-relevant regions while maintaining high image fidelity.

Abstract

Visual counterfactual explanations aim to reveal the minimal semantic modifications that can alter a model's prediction, providing causal and interpretable insights into deep neural networks. However, existing diffusion-based counterfactual generation methods are often computationally expensive, slow to sample, and imprecise in localizing the modified regions. To address these limitations, we propose MaskDiME, a simple, fast, yet effective diffusion framework that unifies semantic consistency and spatial precision through localized sampling. Our approach adaptively focuses on decision-relevant regions to achieve localized and semantically consistent counterfactual generation while preserving high image fidelity. Our training-free framework, MaskDiME, performs inference over 30x faster than the baseline and achieves comparable or state-of-the-art performance across five benchmark…
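
The idea of restricting edits to decision-relevant regions can be illustrated with a generic masked-blending sketch. This is not the authors' algorithm: the abstract does not specify how the mask is computed or how it enters the diffusion sampler. The function names `saliency_mask` and `masked_blend`, the `keep_fraction` parameter, and the use of gradient magnitude as the relevance signal are all assumptions made for illustration.

```python
import numpy as np

def saliency_mask(grad, keep_fraction=0.1):
    """Binary mask over the top `keep_fraction` of pixels by gradient magnitude.

    grad: array of shape (H, W, C), a hypothetical stand-in for the gradient
    of a classifier loss with respect to the image.
    """
    magnitude = np.abs(grad).sum(axis=-1)  # per-pixel relevance, shape (H, W)
    k = max(1, int(round(keep_fraction * magnitude.size)))
    # k-th largest relevance value; pixels at or above it are kept
    threshold = np.partition(magnitude.ravel(), -k)[-k]
    return (magnitude >= threshold).astype(grad.dtype)

def masked_blend(original, edited, mask):
    """Take edited pixels inside the mask and original pixels outside it."""
    m = mask[..., None]  # broadcast the (H, W) mask over channels
    return m * edited + (1.0 - m) * original
```

In a diffusion setting, a blend of this kind would typically be applied to the intermediate sample at each denoising step, so that pixels outside the mask stay tied to the input image. Whether MaskDiME does exactly this is not stated in the text above.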
