DocVCE: Diffusion-based Visual Counterfactual Explanations for Document Image Classification
Saifullah Saifullah, Stefan Agne, Andreas Dengel, Sheraz Ahmed

TL;DR
This paper introduces DocVCE, a diffusion-based method that generates visual counterfactual explanations to make document image classification models more transparent.
Contribution
The paper presents the first generative counterfactual explanation approach for document image analysis, using diffusion models with hierarchical refinement.
Findings
Effective in generating plausible counterfactuals across multiple datasets and models
Outperforms existing feature-importance methods in interpretability
Provides insights into global features learned by classifiers
Abstract
As black-box AI-driven decision-making systems become increasingly widespread in modern document processing workflows, improving their transparency and reliability has become critical, especially in high-stakes applications where biases or spurious correlations in decision-making could lead to serious consequences. One vital component often found in such document processing workflows is document image classification, which, despite its widespread use, remains difficult to explain. While some recent works have attempted to explain the decisions of document image classification models through feature-importance maps, these maps are often difficult to interpret and fail to provide insights into the global features learned by the model. In this paper, we aim to bridge this research gap by introducing generative document counterfactuals that provide meaningful insights into the model's…
