A Probabilistic Consensus-Driven Approach for Robust Counterfactual Explanations

Marcin Kostrzewa; Maciej Zięba; Jerzy Stefanowski

arXiv:2604.17494 · cs.LG · April 21, 2026


TL;DR

This paper introduces a probabilistic, consensus-driven method for generating robust counterfactual explanations that remain valid under slight model changes, using a normalizing flow trained on model ensembles.

Contribution

It presents a novel approach that jointly models data distribution and model decision space to produce stable, robust CFEs without extensive tuning or model-specific adjustments.

Findings

01. Achieves superior empirical robustness compared to existing methods.

02. Maintains high plausibility and stability of CFEs across model variations.

03. Uses a single parameter to control robustness level at inference.

Abstract

Counterfactual explanations (CFEs) are essential for interpreting black-box models, yet they often become invalid when the model changes slightly. Existing methods for generating robust CFEs are often limited to specific model types, require costly tuning, or offer only inflexible robustness controls. We propose a novel approach that jointly models the data distribution and the space of plausible model decisions to ensure robustness to model changes. Using a probabilistic consensus over a model ensemble, we train a conditional normalizing flow that captures the data density under varying levels of classifier agreement. At inference time, a single interpretable parameter controls the robustness level without retraining the generative model: it specifies the minimum fraction of models that should agree on the target class. Our method effectively pushes CFEs toward regions that are both…
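The consensus criterion described in the abstract (a minimum fraction of ensemble members agreeing on the target class) can be illustrated with a small sketch. This is not the authors' method: it omits the conditional normalizing flow entirely and only shows how an agreement fraction over an ensemble could be computed and compared against a threshold. The function names, the threshold name `tau`, and the toy linear classifiers are all assumptions made for illustration.

```python
import numpy as np


def consensus_fraction(models, x, target_class):
    """Fraction of ensemble members that predict target_class for point x."""
    votes = [model(x) == target_class for model in models]
    return float(np.mean(votes))


def meets_robustness_level(models, x, target_class, tau):
    """True if at least a fraction tau of the models agree on target_class.

    tau is a hypothetical name for the single robustness parameter.
    """
    return consensus_fraction(models, x, target_class) >= tau


def make_linear_classifier(w, b):
    # Hypothetical toy model: predicts class 1 when w.x + b > 0, else class 0.
    return lambda x: int(np.dot(w, x) + b > 0)


# Toy ensemble standing in for "slightly changed" models:
# identical weights, slightly different biases.
weights = np.array([1.0, 1.0])
biases = [-1.0, -0.5, 0.0, 0.5, 1.0]
ensemble = [make_linear_classifier(weights, b) for b in biases]

near_boundary = np.array([0.2, 0.2])      # close to the decision boundaries
far_from_boundary = np.array([1.0, 1.0])  # well inside the class-1 region

for name, point in [("near", near_boundary), ("far", far_from_boundary)]:
    frac = consensus_fraction(ensemble, point, target_class=1)
    ok = meets_robustness_level(ensemble, point, target_class=1, tau=0.9)
    print(f"{name}: agreement={frac:.2f}, meets tau=0.9: {ok}")
```

In this toy setup, a candidate near the decision boundaries is accepted by only some ensemble members, while a candidate further inside the target region is accepted by all of them. Raising `tau` therefore filters out candidates that a small model change would invalidate.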
