A Survey on Out-of-Distribution Evaluation of Neural NLP Models
Xinzhe Li, Ming Liu, Shang Gao, Wray Buntine

TL;DR
This survey reviews out-of-distribution (OOD) evaluation of neural NLP models. It compares three lines of research (adversarial robustness, domain generalization, and dataset biases) under a unifying definition, summarizes the data-generating processes and evaluation protocols of each, and highlights challenges and opportunities for future work.
Contribution
It provides a unified comparison and summary of three key research lines in OOD evaluation of neural NLP models, an integrated discussion that the existing literature lacked.
Findings
Unified framework for OOD evaluation in NLP
Comparison of data-generating processes and protocols
Identification of challenges and future directions
Abstract
Adversarial robustness, domain generalization and dataset biases are three active lines of research contributing to out-of-distribution (OOD) evaluation on neural NLP models. However, a comprehensive, integrated discussion of the three research lines is still lacking in the literature. In this survey, we 1) compare the three lines of research under a unifying definition; 2) summarize the data-generating processes and evaluation protocols for each line of research; and 3) emphasize the challenges and opportunities for future work.
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Topic Modeling · Explainable Artificial Intelligence (XAI)
