Out-of-Distribution Generalization in Text Classification: Past, Present, and Future
Linyi Yang, Yaoxiao Song, Xuan Ren, Chenyang Lyu, Yidong Wang, Lingqiao Liu, Jindong Wang, Jennifer Foster, Yue Zhang

TL;DR
This paper provides a comprehensive review of out-of-distribution generalization challenges, methods, and evaluations in NLP text classification, highlighting gaps and future directions to improve model robustness.
Contribution
It offers the first extensive survey on OOD generalization in NLP text classification, summarizing recent progress and identifying key challenges and future research avenues.
Findings
Highlights the importance of robustness to OOD data in NLP
Summarizes recent methods and evaluations for OOD generalization
Identifies gaps and proposes future research directions
Abstract
Machine learning (ML) systems in natural language processing (NLP) face significant challenges in generalizing to out-of-distribution (OOD) data, where the test distribution differs from the training data distribution. This poses important questions about the robustness of NLP models and their high accuracy, which may be artificially inflated due to their underlying sensitivity to systematic biases. Despite these challenges, there is a lack of comprehensive surveys on the generalization challenge from an OOD perspective in text classification. Therefore, this paper aims to fill this gap by presenting the first comprehensive review of recent progress, methods, and evaluations on this topic. We further discuss the challenges involved and potential future research directions. By providing quick access to existing work, we hope this survey will encourage future research in this area.
Taxonomy
Topics: Topic Modeling · Text and Document Classification Technologies · Machine Learning and Data Classification
Methods: Test
