Should Bias be Eliminated? A General Framework to Use Bias for OOD Generalization
Yan Li, Yunlong Deng, Zijian Li, Anpeng Wu, Zeyu Tang, Kun Zhang, Guangyi Chen

TL;DR
This paper explores whether bias should be eliminated or leveraged for out-of-distribution generalization, proposing a framework that uses bias constructively through a generative model and environment estimation to improve robustness.
Contribution
It introduces a novel framework that leverages bias using a generative model and environment estimation, challenging the conventional approach of bias elimination for better OOD generalization.
Findings
Outperforms invariance-only baselines on benchmarks
Improves robustness and adaptability in OOD settings
Effectively leverages bias for better generalization
Abstract
Most approaches to out-of-distribution (OOD) generalization learn domain-invariant representations by discarding contextual bias. In this paper, we raise a critical question: Should bias be eliminated? If not, is there a general way to leverage bias for better OOD generalization? To answer these questions, we first provide a theoretical analysis that characterizes the circumstances in which biased features contribute positively. Although theoretical results show that bias may sometimes play a positive role, leveraging it effectively is non-trivial, since its harmful and beneficial components are often entangled. Recent advances have sought to refine the prediction of bias by presuming reliable predictions from invariant features. However, such assumptions may be too strong in the real world, especially when the target also shifts from training to testing domains. Motivated by this…
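The question the abstract raises can be made concrete with a small toy calculation. This is a sketch for illustration only, not the paper's framework: the function `accuracy`, its parameters, and the numbers 0.75, 0.9 and 0.1 are all assumptions introduced here. It assumes a binary label, one invariant feature that matches the label with fixed probability, and one biased feature whose agreement with the label depends on the environment, with the two features conditionally independent given the label. The predictor is a simple log-odds rule that weights each feature by its assumed reliability.

```python
import math
from itertools import product


def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))


def accuracy(p_content, p_bias_test, q_bias_assumed, use_bias=True):
    """Exact accuracy of a log-odds rule on two binary features (toy model).

    p_content      : probability the invariant feature matches the label
    p_bias_test    : probability the biased feature matches the label
                     in the test environment
    q_bias_assumed : reliability the predictor *believes* the bias has
    use_bias       : if False, the predictor ignores the biased feature
    """
    w_c = logit(p_content)
    w_b = logit(q_bias_assumed) if use_bias else 0.0
    acc = 0.0
    # Enumerate whether each feature agrees with the true label.
    for c_ok, b_ok in product([True, False], repeat=2):
        prob = (p_content if c_ok else 1 - p_content) * (
            p_bias_test if b_ok else 1 - p_bias_test
        )
        # Signed evidence in favour of the true label.
        score = (w_c if c_ok else -w_c) + (w_b if b_ok else -w_b)
        if score > 0:
            acc += prob
        elif score == 0:
            acc += 0.5 * prob  # break ties at random
    return acc


# Hypothetical settings: bias is 90% aligned in training, 10% after the shift.
content_only = accuracy(0.75, 0.1, 0.9, use_bias=False)
bias_in_dist = accuracy(0.75, 0.9, 0.9)   # test environment matches training
bias_shifted = accuracy(0.75, 0.1, 0.9)   # stale belief about the bias
bias_env_aware = accuracy(0.75, 0.1, 0.1)  # belief updated to the test environment

print(content_only, bias_in_dist, bias_shifted, bias_env_aware)
```

In this toy setting, ignoring the bias gives 0.75 accuracy regardless of the environment. Using the bias with a correct belief about its reliability gives 0.9, while using it with a stale belief after the shift gives 0.1. The same feature therefore helps or hurts depending on whether its environment-specific relationship to the label is estimated correctly.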
Peer Reviews
Decision·Submitted to ICLR 2026
- The paper makes a fairly decent novel contribution: it challenges the entrenched assumption in DG research that bias should be eliminated and reframes bias as a potentially beneficial signal. Furthermore, the formalization of when bias helps prediction (via “unblocked influence”) and the proofs of identifiability and performance add technical depth and novelty. The use of causal graphs and conditional independencies is well-motivated.
- BAG unifies multiple strands: causal repres
- The method is tested only on very small datasets, making it difficult to judge whether the framework is generalisable.
- The theoretical identifiability conditions (A1–A4) require smooth, positive densities and independent latent dimensions given (e, y). These are rarely satisfied in high-dimensional deep representations. A discussion of approximate or empirical identifiability would help.
- Although disentanglement is central, the paper lacks visualizations or examples showing
(1) The fundamental idea is reasonable. Covariate bias is indeed influential in classification and OOD-related tasks, and I believe the authors’ perspective is justified. However, I believe this can be both helpful and risky: it may improve performance on certain tasks but could also lead to biased errors in other scenarios. (2) The authors conduct experiments on both synthetic and real-world datasets, demonstrating the effectiveness of their proposed method compared to existing approaches.
I have reviewed this paper for NeurIPS 2025, in which 5 reviewers unanimously decided to reject the paper. In my previous review, I outlined several issues, but I found that the authors did not make substantial revisions. Even minor issues, such as typographical errors in symbols and writing, were not corrected. Given these considerations, I have decided to reject the paper. In today’s world, where the volume of peer review work is high, I strongly recommend that the authors take each reviewer
* **Interesting and original perspective**: The paper takes a refreshing stance by arguing that bias or spurious features are not always detrimental. This idea of *leveraging* bias for better OOD generalization is both conceptually interesting and relevant, especially given how dominant the “bias elimination” mindset has been in this field.
* **Solid theoretical framing**: The authors provide a clear theoretical justification for when and why bias can be beneficial, using identifiability cond
* **Lack of ablations and component analysis**: The paper does not provide sufficient ablation studies to isolate where the performance gain comes from. For instance, it’s unclear how much each part (the content predictor, bias predictor, environment routing module, or adaptive prior) actually contributes. The absence of results like *C-only vs. B-only*, or *with vs. without routing and prior correction*, makes it hard to judge whether the added modules are necessary or if the model mainly reprod
Taxonomy
Topics: Technology Adoption and User Behaviour
