Generative Classifiers Avoid Shortcut Solutions
Alexander C. Li, Ananya Kumar, Deepak Pathak

TL;DR
Generative classifiers, by modeling all features including spurious ones, can better avoid shortcut solutions and improve robustness under distribution shifts, outperforming discriminative models on various benchmarks.
Contribution
This paper demonstrates that generative classifiers can inherently avoid shortcut solutions and achieve state-of-the-art results on distribution shift benchmarks without specialized techniques.
Findings
Generative classifiers outperform discriminative models on five distribution shift benchmarks.
They effectively reduce the impact of spurious correlations in real-world datasets.
Analysis reveals when and why generative classifiers outperform discriminative ones.
Abstract
Discriminative approaches to classification often learn shortcuts that hold in-distribution but fail even under minor distribution shift. This failure mode stems from an overreliance on features that are spuriously correlated with the label. We show that generative classifiers, which use class-conditional generative models, can avoid this issue by modeling all features, both core and spurious, instead of mainly spurious ones. These generative classifiers are simple to train, avoiding the need for specialized augmentations, strong regularization, extra hyperparameters, or knowledge of the specific spurious correlations to avoid. We find that diffusion-based and autoregressive generative classifiers achieve state-of-the-art performance on five standard image and text distribution shift benchmarks and reduce the impact of spurious correlations in realistic applications, such as medical or…
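To make the abstract's definition concrete, the sketch below shows the general recipe for a generative classifier: fit one class-conditional density p(x | y) per class, then predict with Bayes' rule, argmax over y of log p(x | y) + log p(y). It is a toy illustration only. The paper uses diffusion-based and autoregressive models for p(x | y); here a diagonal Gaussian is swapped in so the example stays small and runnable. All function names and the synthetic data are hypothetical, and the sketch does not reproduce any result from the paper.

```python
import numpy as np


def fit_gaussian_classifier(X, y):
    """Fit one diagonal Gaussian p(x | y) per class plus a class prior p(y).

    Assumption: the Gaussian stands in for the diffusion or autoregressive
    class-conditional models described in the abstract.
    """
    classes = np.unique(y)
    params = {}
    for c in classes:
        Xc = X[y == c]
        params[c] = {
            "mean": Xc.mean(axis=0),
            "var": Xc.var(axis=0) + 1e-6,  # small floor avoids division by zero
            "log_prior": np.log(len(Xc) / len(X)),
        }
    return classes, params


def log_joint(X, classes, params):
    """Return an (n_samples, n_classes) array of log p(x | y) + log p(y)."""
    scores = []
    for c in classes:
        p = params[c]
        log_lik = -0.5 * np.sum(
            np.log(2 * np.pi * p["var"]) + (X - p["mean"]) ** 2 / p["var"],
            axis=1,
        )
        scores.append(log_lik + p["log_prior"])
    return np.stack(scores, axis=1)


def predict(X, classes, params):
    """Bayes-rule prediction: pick the class with the highest log joint."""
    return classes[np.argmax(log_joint(X, classes, params), axis=1)]


# Hypothetical synthetic data: two classes separated along the first feature.
rng = np.random.default_rng(0)
X = np.concatenate([
    rng.normal([-2.0, 0.0], 1.0, size=(200, 2)),
    rng.normal([2.0, 0.0], 1.0, size=(200, 2)),
])
y = np.concatenate([np.zeros(200, dtype=int), np.ones(200, dtype=int)])

classes, params = fit_gaussian_classifier(X, y)
preds = predict(X, classes, params)
accuracy = float((preds == y).mean())
print(f"training accuracy: {accuracy:.3f}")
```

Every input feature contributes to the likelihood term, so the model has to account for all of them rather than only the single most predictive one. That is the property the abstract credits for avoiding shortcuts.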
Peer Reviews
Decision: ICLR 2025 Poster
The empirical investigation in the paper corroborates some rather well-accepted intuitions of potential benefits of generative classifiers in distribution-shift scenarios and thus provides useful support for future work in this direction. The systematic toy-problem exploration helps to understand the trade-offs and uses some tools that could be adapted for exploration of more complex scenarios in the future. Some further useful results are presented in the appendix (perhaps worth at least refere
It is unclear to me how the conclusions drawn from the rather simplistic and well-controlled toy problem could be translated to more complex practical problems under realistic scenarios. This is, however, acknowledged by the authors themselves.
- The paper handles the important topic of OOD generalization in machine learning.
- The method shows good performance on several benchmarks.
- The paper is well written.
- The main issue is limited novelty. Using generative models for classification is not a new idea, and previous works have already explored their robustness to OOD generalization and to adversarial attacks, as well as the use of diffusion models as classifiers. For example: Zimmermann et al., "Score-Based Generative Classifiers"; Grathwohl et al., "Your Classifier Is Secretly an Energy Based Model and You Should Treat It Like One"; Chen et al., "Robust Classification via a Single Diffusion Model"; and Fetaya et
1. The paper is novel and well-written.
2. The paper shows that generative classifiers have advantages compared to discriminative classifiers, both in ID and OOD settings. I think this is a fundamental challenge to existing discriminative learning, which is significant.
3. The experiments are sufficient to validate the proposed hypothesis.
1. I think some recent papers about the superiority of generative classifiers should be mentioned in this paper, including [a, b, c].
[a] Diffusion Models are Certifiably Robust Classifiers, NeurIPS, 2024
[b] Robust Classification via a Single Diffusion Model, ICML, 2024
[c] Revisiting Discriminative vs. Generative Classifiers: Theory and Implications, ICML, 2023
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Face recognition and analysis · Machine Learning in Healthcare
