CoCo: Controllable Counterfactuals for Evaluating Dialogue State Trackers

Shiyang Li; Semih Yavuz; Kazuma Hashimoto; Jia Li; Tong Niu; Nazneen Rajani; Xifeng Yan; Yingbo Zhou; Caiming Xiong

arXiv:2010.12850·cs.CL·March 29, 2021·41 cites

TL;DR

This paper introduces CoCo, a method for generating controllable counterfactual dialogue scenarios to evaluate and reveal the robustness of dialogue state trackers beyond standard datasets.

Contribution

CoCo leverages turn-level belief states to create novel, realistic conversation scenarios, enabling more comprehensive evaluation of dialogue state tracking models.

Findings

1. DST models show significant performance drops on CoCo-generated counterfactuals.
2. CoCo-generated conversations reflect user goals with over 95% accuracy.
3. Counterfactual evaluation reveals robustness gaps not seen with paraphrasing techniques.

Abstract

Dialogue state trackers have made significant progress on benchmark datasets, but their generalization capability to novel and realistic scenarios beyond the held-out conversations is less understood. We propose controllable counterfactuals (CoCo) to bridge this gap and evaluate dialogue state tracking (DST) models on novel scenarios, i.e., would the system successfully tackle the request if the user responded differently but still consistently with the dialogue flow? CoCo leverages turn-level belief states as counterfactual conditionals to produce novel conversation scenarios in two steps: (i) counterfactual goal generation at turn-level by dropping and adding slots followed by replacing slot values, (ii) counterfactual conversation generation that is conditioned on (i) and consistent with the dialogue flow. Evaluating state-of-the-art DST models on MultiWOZ dataset with CoCo-generated…
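Step (i) of the abstract, counterfactual goal generation by dropping slots, adding slots, and then replacing slot values, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `counterfactual_goal`, the parameters `p_drop` and `max_add`, the `slot_values` dictionary, and the example slot names are all assumptions made for illustration, and the choice to replace the value of every retained slot is a simplification.

```python
import random
from typing import Dict, List

# A turn-level belief state maps slot names to values (hypothetical format).
BeliefState = Dict[str, str]


def counterfactual_goal(
    turn_belief: BeliefState,
    slot_values: Dict[str, List[str]],
    rng: random.Random,
    p_drop: float = 0.5,
    max_add: int = 1,
) -> BeliefState:
    """Illustrative drop / add / replace over one turn-level belief state."""
    # Drop: each original slot is discarded with probability p_drop (assumed knob).
    kept = [slot for slot in turn_belief if rng.random() >= p_drop]

    # Add: sample up to max_add slots that the original turn did not mention.
    candidates = sorted(slot for slot in slot_values if slot not in turn_belief)
    added = rng.sample(candidates, k=min(max_add, len(candidates)))

    # Replace: give every remaining slot a value that differs from the original
    # whenever the (assumed) value dictionary offers an alternative.
    goal: BeliefState = {}
    for slot in kept + added:
        original = turn_belief.get(slot)
        options = [v for v in slot_values.get(slot, []) if v != original]
        if options:
            goal[slot] = rng.choice(options)
        elif original is not None:
            goal[slot] = original  # no alternative available, keep the value
    return goal


# Hypothetical MultiWOZ-style slots and values, used only for this example.
turn_belief = {"restaurant-food": "italian", "restaurant-area": "centre"}
slot_values = {
    "restaurant-food": ["italian", "chinese"],
    "restaurant-area": ["centre", "north"],
    "restaurant-book people": ["2", "4"],
}
goal = counterfactual_goal(turn_belief, slot_values, random.Random(0))
print(goal)
```

The resulting goal would then condition step (ii), generating a user utterance that expresses the new slots while staying consistent with the dialogue flow; that generation step is outside the scope of this sketch.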

Code & Models

Videos

CoCo: Controllable Counterfactuals for Evaluating Dialogue State Trackers · SlidesLive

Taxonomy

Topics: Topic Modeling · Speech and dialogue systems · Machine Learning in Healthcare

Methods: Dynamic Sparse Training · Counterfactuals Explanations