Back-Training excels Self-Training at Unsupervised Domain Adaptation of Question Generation and Passage Retrieval

Devang Kulshreshtha; Robert Belfer; Iulian Vlad Serban; Siva Reddy

arXiv:2104.08801·cs.CL·September 10, 2021

TL;DR

This paper introduces back-training, an unsupervised domain adaptation method that outperforms self-training on question generation and passage retrieval by training on synthetic data in which natural outputs are paired with noisy inputs.

Contribution

The paper proposes back-training as an alternative to self-training for UDA, demonstrating significant improvements and introducing a new dataset for domain adaptation research.

Findings

01

Back-training outperforms self-training on question generation by a mean of 7.8 BLEU-4 points across the machine learning and biomedical target domains.

02

Back-training achieves 17.6% higher top-20 retrieval accuracy than self-training on passage retrieval, averaged across the same two target domains.

03

The proposed consistency filters remove low-quality synthetic data before training.
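
For readers unfamiliar with the metric in the second finding, top-k retrieval accuracy is commonly computed as the fraction of questions whose gold passage appears among the k highest-ranked retrieved passages. The sketch below assumes that common definition and one gold passage per question; the function name `top_k_accuracy` and the passage IDs are illustrative and are not taken from the paper or its repository.

```python
from typing import Sequence


def top_k_accuracy(
    ranked_ids: Sequence[Sequence[str]],
    gold_ids: Sequence[str],
    k: int = 20,
) -> float:
    """Fraction of questions whose gold passage is in the top-k results.

    ranked_ids[i] is the retriever's ranking (best first) for question i,
    and gold_ids[i] is the single gold passage ID for that question.
    Assumption: one gold passage per question.
    """
    if len(ranked_ids) != len(gold_ids):
        raise ValueError("ranked_ids and gold_ids must have the same length")
    if not gold_ids:
        return 0.0
    hits = sum(
        1 for ranked, gold in zip(ranked_ids, gold_ids) if gold in ranked[:k]
    )
    return hits / len(gold_ids)


# Toy rankings for two questions (hypothetical passage IDs).
rankings = [["p1", "p2", "p3"], ["p4", "p5", "p6"]]
gold = ["p2", "p6"]

acc_at_2 = top_k_accuracy(rankings, gold, k=2)
acc_at_3 = top_k_accuracy(rankings, gold, k=3)
```

With these toy rankings, only the first question has its gold passage in the top 2, while both do in the top 3.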

Abstract

In this work, we introduce back-training, an alternative to self-training for unsupervised domain adaptation (UDA) from source to target domain. While self-training generates synthetic training data where natural inputs are aligned with noisy outputs, back-training results in natural outputs aligned with noisy inputs. This significantly reduces the gap between the target domain and synthetic data distribution, and reduces model overfitting to the source domain. We run UDA experiments on question generation and passage retrieval from the Natural Questions domain to machine learning and biomedical domains. We find that back-training vastly outperforms self-training by a mean improvement of 7.8 BLEU-4 points on generation, and 17.6% top-20 retrieval accuracy across both domains. We further propose consistency filters to remove low-quality synthetic data before training. We also…
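
The contrast between the two data-construction schemes can be sketched in a few lines. This is a minimal illustration for the question generation direction (passage in, question out), not the authors' implementation. It assumes the noisy inputs in back-training come from a model running in the reverse direction, here a retriever that maps a question to a passage. The helpers `toy_question_generator` and `make_toy_retriever` are hypothetical stand-ins for trained models.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]  # (model input, training target)


def self_training_pairs(
    forward_model: Callable[[str], str], target_inputs: List[str]
) -> List[Pair]:
    """Natural inputs aligned with noisy (model-generated) outputs."""
    return [(x, forward_model(x)) for x in target_inputs]


def back_training_pairs(
    backward_model: Callable[[str], str], target_outputs: List[str]
) -> List[Pair]:
    """Noisy (model-produced) inputs aligned with natural outputs."""
    return [(backward_model(y), y) for y in target_outputs]


def toy_question_generator(passage: str) -> str:
    # Hypothetical stand-in for a source-domain question generator.
    return "what is " + passage.split()[0].lower() + "?"


def make_toy_retriever(corpus: List[str]) -> Callable[[str], str]:
    # Hypothetical stand-in for a source-domain retriever:
    # returns the passage sharing the most words with the question.
    def retrieve(question: str) -> str:
        q_words = set(question.lower().rstrip("?").split())
        return max(corpus, key=lambda p: len(q_words & set(p.lower().split())))

    return retrieve


# Unaligned target-domain data: passages and questions with no pairing.
passages = [
    "Dropout randomly zeroes activations during training.",
    "Gradient descent updates parameters along the negative gradient.",
]
questions = ["How does dropout prevent overfitting during training?"]

# Training pairs for a question generator (passage -> question).
st_pairs = self_training_pairs(toy_question_generator, passages)
bt_pairs = back_training_pairs(make_toy_retriever(passages), questions)
```

In `st_pairs` the training target is a machine-generated question, whereas in `bt_pairs` the target is a human-written question and only the input passage is model-selected. This mirrors the abstract's description of why the back-training data sits closer to the target-domain distribution.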

Code & Models

Repositories

McGill-NLP/MLQuestions
PyTorch · Official

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications