Supervised Contrastive Block Disentanglement
Taro Makino, Ji Won Park, Natasa Tagasovska, Takamasa Kudo, Paula Coelho, Jan-Christian Huetter, Heming Yao, Burkhard Hoeckendorf, Ana Carolina Leote, Stephen Ra, David Richmond, Kyunghyun Cho, Aviv Regev, Romain Lopez

TL;DR
This paper introduces SCBD, a supervised contrastive learning algorithm that disentangles phenomena of interest from spurious correlations, improving domain generalization and batch correction in complex real-world datasets.
Contribution
The paper proposes SCBD, a novel algorithm that uses supervised contrastive learning to enforce invariance to spurious correlations, and reports that it is effective on real-world data.
Findings
SCBD improves out-of-distribution performance in domain generalization tasks.
SCBD effectively removes batch effects in large-scale biological imaging data.
A hyperparameter α controls the strength of the invariance.
Abstract
Real-world datasets often combine data collected under different experimental conditions. This yields larger datasets, but also introduces spurious correlations that make it difficult to model the phenomena of interest. We address this by learning two embeddings to independently represent the phenomena of interest and the spurious correlations. The embedding representing the phenomena of interest is correlated with the target variable y, and is invariant to the environment variable e. In contrast, the embedding representing the spurious correlations is correlated with e. The invariance to e is difficult to achieve on real-world datasets. Our primary contribution is an algorithm called Supervised Contrastive Block Disentanglement (SCBD) that effectively enforces this invariance. It is based purely on Supervised Contrastive Learning, and applies to real-world data better than…
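The idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the paper's implementation. It assumes the standard batch form of the supervised contrastive loss, and it assumes the invariance penalty is the negated contrastive loss of the first embedding with respect to the environment labels, weighted by α. The names `supcon_loss`, `block_disentanglement_loss`, `z_c`, and `z_s` are hypothetical and chosen only for this example.

```python
import numpy as np

def supcon_loss(z, labels, temperature=0.1):
    """Supervised contrastive loss for one batch (needs at least 2 samples)."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # unit-normalize embeddings
    n = z.shape[0]
    not_self = ~np.eye(n, dtype=bool)
    # Pairwise similarities; exclude each anchor's similarity with itself.
    logits = np.where(not_self, z @ z.T / temperature, -np.inf)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Positives: other samples that share the anchor's label.
    positives = (labels[:, None] == labels[None, :]) & not_self
    n_pos = positives.sum(axis=1)
    has_pos = n_pos > 0
    if not has_pos.any():
        return 0.0
    sum_log_prob = np.where(positives, log_prob, 0.0).sum(axis=1)
    return float((-sum_log_prob[has_pos] / n_pos[has_pos]).mean())

def block_disentanglement_loss(z_c, z_s, y, e, alpha=1.0, temperature=0.1):
    """Hypothetical combined objective over two embeddings."""
    loss_y = supcon_loss(z_c, y, temperature)  # z_c should cluster by target y
    loss_e = supcon_loss(z_s, e, temperature)  # z_s should cluster by environment e
    # ASSUMPTION: invariance is encouraged by penalizing z_c for clustering by e.
    loss_inv = -supcon_loss(z_c, e, temperature)
    return loss_y + loss_e + alpha * loss_inv

# Toy usage with random embeddings and labels.
rng = np.random.default_rng(0)
z_c = rng.normal(size=(8, 4))
z_s = rng.normal(size=(8, 4))
y = rng.integers(0, 2, size=8)
e = rng.integers(0, 3, size=8)
print(block_disentanglement_loss(z_c, z_s, y, e, alpha=0.5))
```

In this sketch, a larger `alpha` weights the invariance term more heavily relative to the two clustering terms. In a real training loop the embeddings would come from trainable encoders and the loss would be written in an autodiff framework; NumPy is used here only to keep the arithmetic visible.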
Peer Reviews
Decision·Submitted to ICLR 2025
The research problem of domain generalization is well motivated. The introduction clearly states the issues with current methods for domain generalization/adaptation, such as batch effects in experimental biology. The description of how the method works is straightforward to understand. In terms of novelty, the authors demonstrate that their method shows a monotonic trade-off between validation and test accuracy. Their experiments also demonstrate their method can achieve the de
The weaknesses are the following: 1. The novelty of the work seems limited. There already exist works that model signals from environment and target variables with two latent factors [1]. The paper also proposed a modification to iVAE, but as the authors mentioned, it was challenging to learn and the experiments do not yield significant improvements over other baselines. 2. While the experiment settings are well designed, each with a clear point that it is trying to demonstrate, having only on
* I thought it was a very clearly written paper - the various terms in the loss function are well motivated from a probabilistic perspective, and clearly explained.
* I liked that it was a pragmatic take on an area that has a lot of nice theory but relatively little practical success, suggesting a focus on algorithms is important.
* The empirical results clearly demonstrate the role that the invariance loss plays.
* Given that this is primarily a methods paper that is supported by empirical evidence, it would have been nice to see the empirical results replicated across all of WILDS. Aside from the compute requirements, I don't see what's stopping that?
* It seems likely that the paper could have been supported with theory that shows that the optimizer of the loss separates the representations (analogous to [Von Kügelgen et al., 2021]). It is not essential, but it would have strengthened the paper.
* While I
- Novel application of supervised contrastive learning for disentanglement
- Clean formulation with interpretable hyperparameter
- Thorough experimental evaluation of the proposed method including relevant competing methods
- Convincing empirical results on biological batch correction applications
- Limited theoretical analysis
- No formal guarantees for disentanglement
- Lacks justification for why contrastive learning should work better than alternatives
- Practical limitations (as also acknowledged by the authors)
- Method requires known environment labels e, limiting broader applicability.
- Poor reconstruction quality due to separate decoder training.
- Worse CORUM results compared to iVAE with conditioning.
Taxonomy
Topics: Digital Media Forensic Detection · Advanced Numerical Analysis Techniques · Advanced Steganography and Watermarking Techniques
Methods: Contrastive Learning
