Counterfactual Data Augmentation via Perspective Transition for Open-Domain Dialogues
Jiao Ou, Jinchao Zhang, Yang Feng, Jie Zhou

TL;DR
This paper introduces a counterfactual data augmentation approach for open-domain dialogues that generates semantically diverse responses to improve dialogue system training, outperforming baselines.
Contribution
It proposes a novel counterfactual inference method to automatically augment dialogue datasets with diverse responses, reducing manual data collection efforts.
Findings
Augmented datasets improve downstream dialogue task performance.
The method generates semantically varied responses effectively.
Outperforms existing baselines in experiments.
Abstract
The construction of open-domain dialogue systems requires high-quality dialogue datasets. Dialogue data admits a wide variety of responses for a given dialogue history, especially responses with different semantics. However, collecting such a high-quality dataset is labor-intensive and time-consuming in most scenarios. In this paper, we propose a data augmentation method that automatically augments high-quality responses with different semantics by counterfactual inference. Specifically, given an observed dialogue, our counterfactual generation model first infers semantically different responses by replacing the observed reply perspective with substituted ones. Furthermore, our data selection method filters out detrimental augmented responses. Experimental results show that our data augmentation method can augment high-quality responses with different semantics for a given dialogue…
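The two steps named in the abstract (generate responses under substituted reply perspectives, then filter out detrimental ones) can be pictured as a small loop. The sketch below is only an illustration under stated assumptions: the abstract does not say how a perspective is represented or how the generation and selection models work, so `augment_dialogue`, `toy_generate`, and `toy_keep` are hypothetical names, perspectives are modeled as plain strings, and the two toy functions stand in for the paper's trained models.

```python
from typing import Callable, Iterable, List


def augment_dialogue(
    history: List[str],
    observed_response: str,
    observed_perspective: str,
    candidate_perspectives: Iterable[str],
    generate: Callable[[List[str], str], str],
    keep: Callable[[List[str], str], bool],
) -> List[str]:
    """Return augmented responses for one observed dialogue.

    `generate` and `keep` are placeholders (assumptions) for the
    counterfactual generation model and the data selection method.
    """
    augmented: List[str] = []
    for perspective in candidate_perspectives:
        # Only substituted perspectives count; skip the observed one.
        if perspective == observed_perspective:
            continue
        response = generate(history, perspective)
        # Drop copies of the observed reply and duplicates of earlier candidates.
        if response == observed_response or response in augmented:
            continue
        # Selection step: discard responses judged detrimental.
        if keep(history, response):
            augmented.append(response)
    return augmented


def toy_generate(history: List[str], perspective: str) -> str:
    # Toy stand-in: a real system would condition a trained model on the
    # dialogue history and the substituted perspective.
    return f"[{perspective}] reply"


def toy_keep(history: List[str], response: str) -> bool:
    # Toy stand-in for the selection method: reject a marked "bad" response.
    return "bad" not in response
```

Passing the generator and the filter as callables keeps the loop independent of any particular model, which is why the toy versions can be swapped for real components without changing `augment_dialogue`.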
Taxonomy
Topics: Topic Modeling · Speech and dialogue systems · Natural Language Processing Techniques
