MaskDiME: Adaptive Masked Diffusion for Precise and Efficient Visual Counterfactual Explanations
Changlu Guo, Anders Nymark Christensen, Anders Bjorholm Dahl, Morten Rieger Hannemose

TL;DR
MaskDiME is a fast, training-free diffusion framework that generates precise, semantically consistent visual counterfactual explanations by focusing on decision-relevant regions. It runs inference over 30x faster than the baseline and achieves comparable or state-of-the-art performance on five benchmark datasets.
Contribution
MaskDiME introduces an adaptive localized sampling approach that unifies semantic consistency and spatial precision without any training, enabling efficient counterfactual generation.
Findings
Performs inference over 30x faster than the baseline.
Achieves state-of-the-art or comparable performance across five benchmark datasets.
Effectively localizes modifications while maintaining high image fidelity.
Abstract
Visual counterfactual explanations aim to reveal the minimal semantic modifications that can alter a model's prediction, providing causal and interpretable insights into deep neural networks. However, existing diffusion-based counterfactual generation methods are often computationally expensive, slow to sample, and imprecise in localizing the modified regions. To address these limitations, we propose MaskDiME, a simple, fast, yet effective diffusion framework that unifies semantic consistency and spatial precision through localized sampling. Our approach adaptively focuses on decision-relevant regions to achieve localized and semantically consistent counterfactual generation while preserving high image fidelity. Our training-free framework, MaskDiME, performs inference over 30x faster than the baseline and achieves comparable or state-of-the-art performance across five benchmark…
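The abstract does not say how localized sampling is implemented, so the following is only a rough illustration of the general idea of restricting edits to decision-relevant pixels, not the authors' algorithm. It assumes a per-pixel saliency map is already available (for example, from classifier gradients) and that an edited image has been produced by some generative step. The names `topk_mask`, `masked_blend`, and `keep_ratio` are hypothetical and do not come from the paper.

```python
import numpy as np

def topk_mask(saliency, keep_ratio):
    """Boolean H x W mask marking the keep_ratio fraction of most salient pixels.

    Assumption: 'decision-relevant' is approximated by the largest saliency values.
    """
    flat = saliency.ravel()
    k = min(flat.size, max(1, int(round(keep_ratio * flat.size))))
    # argpartition places the k largest values in the last k positions.
    top_idx = np.argpartition(flat, -k)[-k:]
    mask = np.zeros(flat.size, dtype=bool)
    mask[top_idx] = True
    return mask.reshape(saliency.shape)

def masked_blend(original, edited, mask):
    """Take edited pixels inside the mask and original pixels everywhere else."""
    # Add a channel axis so an H x W mask broadcasts over H x W x C images.
    m = mask if mask.ndim == original.ndim else mask[..., None]
    return np.where(m, edited, original)
```

In a sketch like this, everything outside the mask is copied from the input unchanged, which is one simple way to keep modifications spatially localized while preserving the rest of the image.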
