Visual Disentangled Diffusion Autoencoders: Scalable Counterfactual Generation for Foundation Models
Sidney Bender, Marco Morik

TL;DR
This paper introduces DiDAE, a framework that efficiently generates diverse, disentangled counterfactuals for foundation models, improving robustness against spurious correlations without requiring group labels or expensive optimization.
Contribution
DiDAE combines frozen foundation models with disentangled dictionary learning for fast, gradient-free counterfactual generation, advancing robustness in foundation models.
Findings
DiDAE produces diverse, disentangled counterfactuals faster than existing methods.
DiDAE-CFKD achieves state-of-the-art mitigation of shortcut learning.
The approach improves downstream performance on unbalanced datasets.
Abstract
Foundation models, despite their robust zero-shot capabilities, remain vulnerable to spurious correlations and 'Clever Hans' strategies. Existing mitigation methods often rely on unavailable group labels or computationally expensive gradient-based adversarial optimization. To address these limitations, we propose Visual Disentangled Diffusion Autoencoders (DiDAE), a novel framework integrating frozen foundation models with disentangled dictionary learning for efficient, gradient-free counterfactual generation targeting the foundation model itself. DiDAE first edits foundation model embeddings along interpretable, disentangled directions from the dictionary and then decodes them via a diffusion autoencoder. This allows the generation of multiple diverse, disentangled counterfactuals for each factual, much faster than existing baselines, which generate single entangled counterfactuals.…
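The embedding-editing step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation. It assumes, purely for illustration, that the model's decision is read off a linear probe (`w`, `b`) on the embedding and that the dictionary is a matrix of direction vectors; under those assumptions, the step size that flips the decision along each direction has a closed form, so no gradient-based optimization is needed. The function name, the `margin` parameter, and the probe are all hypothetical, and the diffusion autoencoder that would decode each edited embedding into an image is omitted.

```python
import numpy as np


def counterfactual_embeddings(z, dictionary, w, b, margin=1.0):
    """Edit embedding z along each dictionary direction until a linear probe flips.

    Hypothetical setup (not from the paper):
      z          -- (d,) embedding of the factual input
      dictionary -- (k, d) array, one candidate edit direction per row
      w, b       -- weights and bias of an assumed linear probe on the embedding
      margin     -- absolute probe logit the edited embedding should reach

    Returns a list of (direction_index, step_size, edited_embedding) tuples,
    one per direction that the probe is sensitive to.
    """
    logit = float(w @ z + b)
    # Aim for the opposite side of the decision boundary.
    target = margin if logit <= 0 else -margin

    results = []
    for k, direction in enumerate(dictionary):
        sensitivity = float(w @ direction)
        if abs(sensitivity) < 1e-8:
            # Moving along this direction does not change the probe output.
            continue
        # Closed-form step: logit + alpha * sensitivity == target.
        alpha = (target - logit) / sensitivity
        results.append((k, alpha, z + alpha * direction))
    return results


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z = rng.normal(size=8)          # stand-in for a foundation model embedding
    D = np.eye(8)[:3]               # three toy axis-aligned directions
    w, b = rng.normal(size=8), 0.1  # toy linear probe
    for k, alpha, z_cf in counterfactual_embeddings(z, D, w, b):
        print(k, round(alpha, 3), float(w @ z_cf + b))
```

Because every direction yields its own edited embedding, a single factual input produces several candidate counterfactuals, each changing only one dictionary direction. That mirrors the "multiple diverse, disentangled counterfactuals" property claimed in the abstract, under the simplifying assumptions above.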
