DoCoGen: Domain Counterfactual Generation for Low Resource Domain Adaptation

Nitay Calderon; Eyal Ben-David; Amir Feder; Roi Reichart

arXiv:2202.12350 · cs.CL · March 8, 2022

TL;DR

This paper introduces DoCoGen, a controllable generation method that creates domain-counterfactual texts to improve NLP model adaptation to new domains without requiring labeled data or parallel examples.

Contribution

DoCoGen is an approach, trained only on unlabeled data, for generating domain-counterfactual texts that improve low-resource domain adaptation in NLP.

Findings

01 Outperforms strong baselines in domain adaptation tasks.

02 Improves accuracy of sentiment and intent classifiers in low-resource settings.

03 Generates coherent multi-sentence counterfactual examples.

Abstract

Natural language processing (NLP) algorithms have become very successful, but they still struggle when applied to out-of-distribution examples. In this paper we propose a controllable generation approach to address this domain adaptation (DA) challenge. Given an input text example, our DoCoGen algorithm generates a domain-counterfactual textual example (D-con): one that is similar to the original in all aspects, including the task label, but whose domain is changed to a desired one. Importantly, DoCoGen is trained using only unlabeled examples from multiple domains; neither NLP task labels nor parallel pairs of textual examples and their domain-counterfactuals are required. We show that DoCoGen can generate coherent counterfactuals consisting of multiple sentences. We use the D-cons generated by DoCoGen to augment a sentiment classifier and a multi-label intent classifier in 20 and 78…
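
The augmentation workflow the abstract describes can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `Example`, `augment_with_dcons`, and `toy_generator` are hypothetical names, and `toy_generator` is a trivial placeholder standing in for a trained counterfactual generator. The only properties taken from the abstract are that a D-con keeps the task label of its source example and changes its domain.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Example:
    """A labeled training example tagged with its domain (hypothetical type)."""
    text: str
    label: str
    domain: str


def augment_with_dcons(
    examples: Sequence[Example],
    target_domains: Sequence[str],
    generate_dcon: Callable[[str, str], str],
) -> List[Example]:
    """Return the original examples plus one counterfactual per other domain.

    `generate_dcon(text, target_domain)` is an assumed interface for any
    domain-counterfactual generator; it is not the API of the official repo.
    """
    augmented = list(examples)
    for ex in examples:
        for domain in target_domains:
            if domain == ex.domain:
                continue  # a counterfactual must change the domain
            # The task label is carried over unchanged; only the domain differs.
            augmented.append(Example(generate_dcon(ex.text, domain), ex.label, domain))
    return augmented


def toy_generator(text: str, target_domain: str) -> str:
    """Placeholder only: tags the text instead of rewriting it."""
    return f"[{target_domain}] {text}"


if __name__ == "__main__":
    data = [Example("The plot was gripping.", "positive", "books")]
    for ex in augment_with_dcons(data, ["books", "kitchen"], toy_generator):
        print(ex)
```

The augmented list can then be passed to any classifier training routine in place of the original labeled set; swapping `toy_generator` for a real generator leaves the rest of the sketch unchanged.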

Code & Models

Repositories

nitaytech/docogen (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification

Methods: Counterfactual Explanations