Not to Overfit or Underfit the Source Domains? An Empirical Study of Domain Generalization in Question Answering

Md Arafat Sultan; Avirup Sil; Radu Florian

arXiv:2205.07257 · cs.CL · October 26, 2022

TL;DR

This paper empirically investigates domain generalization in question answering. It finds that learning the source domains better, rather than limiting overfitting to them, improves zero-shot out-of-domain performance, which challenges a common assumption.

Contribution

The paper shows that improving source-domain learning through knowledge distillation from a bigger model improves out-of-domain generalization in question answering, reframing multi-source domain generalization as a problem of mitigating underfitting rather than limiting overfitting.

Findings

01

Knowledge distillation improves zero-shot out-of-domain performance.

02

Better source domain learning correlates with improved generalization.

03

Three popular existing DG approaches that aim to limit overfitting generalize less well out of domain than improved source-domain learning.

Abstract

Machine learning models are prone to overfitting their training (source) domains, which is commonly believed to be the reason why they falter in novel target domains. Here we examine the contrasting view that multi-source domain generalization (DG) is first and foremost a problem of mitigating source domain underfitting: models not adequately learning the signal already present in their multi-domain training data. Experiments on a reading comprehension DG benchmark show that as a model learns its source domains better -- using familiar methods such as knowledge distillation (KD) from a bigger model -- its zero-shot out-of-domain utility improves at an even faster pace. Improved source domain learning also demonstrates superior out-of-domain generalization over three popular existing DG approaches that aim to limit overfitting. Our implementation of KD-based domain generalization is…
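The abstract names knowledge distillation from a bigger model as one way to learn the source domains better. The following is a minimal sketch of a generic distillation loss for one span boundary (start or end position) in extractive question answering. It is not the authors' implementation: the function names, the temperature scaling, and the `alpha` mixing weight are illustrative assumptions drawn from the standard KD formulation, not from the paper.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits (numerically stable)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def cross_entropy(logits, gold_index):
    """Negative log-likelihood of the gold position under the model."""
    return -math.log(softmax(logits)[gold_index])


def kd_span_loss(student_logits, teacher_logits, gold_index,
                 temperature=2.0, alpha=0.5):
    """Hypothetical KD loss for one span boundary.

    Mixes a soft-target term (student matches the teacher's distribution over
    token positions) with a hard-target term (student predicts the gold
    position). `temperature` and `alpha` are assumed hyperparameters.
    """
    soft_student = softmax(student_logits, temperature)
    soft_teacher = softmax(teacher_logits, temperature)
    # Scaling by T^2 keeps the soft-term gradient magnitude comparable
    # across temperatures, as in the standard KD formulation.
    distill = kl_divergence(soft_teacher, soft_student) * temperature ** 2
    hard = cross_entropy(student_logits, gold_index)
    return alpha * distill + (1.0 - alpha) * hard


# Example: position logits over a four-token passage, gold start at index 1.
teacher = [0.1, 3.0, 0.2, -1.0]
student = [0.5, 1.0, 0.4, 0.0]
loss = kd_span_loss(student, teacher, gold_index=1)
```

In a multi-source setting, a loss of this kind would be computed for both the start and end positions and averaged over training examples pooled from all source domains; how the paper weights or schedules these terms is not stated in the text above.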

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Topic Modeling · Multimodal Machine Learning Applications

Methods: Knowledge Distillation