DISCO: Mitigating Bias in Deep Learning with Conditional Distance Correlation

Emre Kavak; Tom Nuno Wolf; Christian Wachinger

arXiv:2506.11653·cs.CV·September 23, 2025

Open Access 3 Reviews

TL;DR

This paper introduces DISCO methods that leverage causal theory to effectively mitigate dataset bias in deep learning models, improving robustness and scalability across diverse datasets.

Contribution

It presents the SAM causal framework and scalable estimators DISCO$_m$ and sDISCO for bias mitigation, bridging causal theory and deep learning practice.

Findings

01

DISCO methods outperform existing bias mitigation techniques.

02

The methods require fewer hyperparameters.

03

They scale effectively to multi-bias scenarios.

Abstract

Dataset bias often leads deep learning models to exploit spurious correlations instead of task-relevant signals. We introduce the Standard Anti-Causal Model (SAM), a unifying causal framework that characterizes bias mechanisms and yields a conditional independence criterion for causal stability. Building on this theory, we propose DISCO$_m$ and sDISCO, efficient and scalable estimators of conditional distance correlation that enable independence regularization in black-box models. Across five diverse datasets, our methods consistently outperform or are competitive with existing bias mitigation approaches, while requiring fewer hyperparameters and scaling seamlessly to multi-bias scenarios. This work bridges causal theory and practical deep learning, providing both a principled foundation and effective tools for robust prediction. Source Code: https://github.com/***.
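To make the idea of a conditional dependence score concrete, the sketch below computes the classical (biased, V-statistic) sample distance correlation between a scalar prediction and a scalar bias variable, then averages it within each value of a discrete label. This stratified average is only an illustrative stand-in, assuming a discrete label and one-dimensional inputs; it is not the DISCO$_m$ or sDISCO estimator from the paper, and all function names here are hypothetical.

```python
import math
import random


def _centered_distances(values):
    """Double-centered matrix of pairwise |a - b| distances for 1-D samples."""
    n = len(values)
    d = [[abs(a - b) for b in values] for a in values]
    row_means = [sum(row) / n for row in d]
    grand_mean = sum(row_means) / n
    # The distance matrix is symmetric, so column means equal row means.
    return [[d[i][j] - row_means[i] - row_means[j] + grand_mean
             for j in range(n)] for i in range(n)]


def distance_correlation(x, y):
    """Biased (V-statistic) sample distance correlation of two 1-D samples."""
    n = len(x)
    a, b = _centered_distances(x), _centered_distances(y)
    dcov2 = sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / n**2
    dvar2_x = sum(v * v for row in a for v in row) / n**2
    dvar2_y = sum(v * v for row in b for v in row) / n**2
    denom = math.sqrt(dvar2_x * dvar2_y)
    if denom == 0.0:
        return 0.0  # a constant sample carries no dependence signal
    return math.sqrt(max(dcov2, 0.0) / denom)


def stratified_dcor(pred, bias, label):
    """Size-weighted average of dCor(pred, bias) within each label value.

    Illustrative assumption: conditioning on a discrete label by
    stratification. This is NOT the paper's DISCO_m or sDISCO estimator.
    """
    total = 0.0
    for value in set(label):
        idx = [i for i, l in enumerate(label) if l == value]
        total += len(idx) * distance_correlation(
            [pred[i] for i in idx], [bias[i] for i in idx])
    return total / len(label)


# Synthetic demo: one prediction leaks the bias, the other ignores it.
rng = random.Random(0)
n = 200
label = [rng.randint(0, 1) for _ in range(n)]
bias = [rng.gauss(0, 1) for _ in range(n)]
leaky_pred = [l + 0.8 * b for l, b in zip(label, bias)]
clean_pred = [l + rng.gauss(0, 1) for l in label]

leaky_score = stratified_dcor(leaky_pred, bias, label)
clean_score = stratified_dcor(clean_pred, bias, label)
print(f"leaky: {leaky_score:.3f}  clean: {clean_score:.3f}")
```

In this toy setup the leaky prediction is an affine function of the bias within each label group, so its score sits near 1, while the bias-agnostic prediction scores much lower. A regularizer in this spirit would add such a score, computed on mini-batches with a differentiable implementation, to the task loss.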

Peer Reviews

Decision·Submitted to ICLR 2026

Reviewer 01 · Rating 2 · Confidence 3

Strengths

- A cleanly motivated debiasing technique that focuses on direct effects.
- The distance-covariance measure of independence was new to me and is interesting given that it doesn't require estimation.
- Good experimental results, showing improvements compared to standard baselines.

Weaknesses

See questions.

Reviewer 02 · Rating 4 · Confidence 5

Strengths

- **Principled independence criterion:** The independence constraint ``Ŷ ⟂ B | Y`` is derived clearly from causal reasoning. Intuitively, once the true label is known, the prediction should not depend on bias variables.
- **Practical regularization approach:** Using conditional distance correlation as a differentiable regularizer is elegant and theoretically grounded. It avoids adversarial training or explicit causal graphs. The proposed estimators (DISCOm and sDISCO) are computationall

Weaknesses

- **Limited practical significance:** The method assumes that bias variables are **known** during training. In realistic scenarios, biases are often unknown, making this assumption impractical. While previous methods like GDRO and Last Layer Retraining (LLR) (or Deep Feature Reweighting) use group annotations, the field needs methods robust to unknown biases. Furthermore, the empirical advantage over existing methods is unclear: for example, LLR achieves higher worst-group accuracy o

Reviewer 03 · Rating 6 · Confidence 3

Strengths

1. The paper formulates the debiasing goal through a clear causal formulation $\hat{Y}\perp B | Y$, and transforms it into an optimization problem of minimizing conditional distance correlation, which is a novel and well-motivated formulation.
2. The paper introduces two novel and practical estimators, $DISCO_m$ and $sDISCO$, that make conditional independence regularization computationally feasible.
3. The method is principled and theoretically grounded.
4. The paper presents a moderately nove

Weaknesses

1. Minor inconsistencies and potential overstatements:
   (a) According to the definition of $\text{ctf-SE}$ and the derivation in Appendix C.1, should it be "$+\text{ctf-SE}$" in Equation 6 rather than "$-\text{ctf-SE}$"? The current sign seems inconsistent to me.
   (b) Lines 133-134 claim that classical maximum likelihood estimation aims to maximize TV. Is there a theoretical justification, proof, or citation to support this claim?
2. The paper wants to enforce independence of the model's predi

Taxonomy

Topics: Anomaly Detection Techniques and Applications