CoCo: Controllable Counterfactuals for Evaluating Dialogue State Trackers
Shiyang Li, Semih Yavuz, Kazuma Hashimoto, Jia Li, Tong Niu, Nazneen Rajani, Xifeng Yan, Yingbo Zhou, Caiming Xiong

TL;DR
This paper introduces CoCo, a method for generating controllable counterfactual dialogue scenarios that tests how robust dialogue state trackers are to conversations beyond the held-out sets of standard benchmarks.
Contribution
CoCo leverages turn-level belief states to create novel, realistic conversation scenarios, enabling more comprehensive evaluation of dialogue state tracking models.
Findings
State-of-the-art DST models show significant performance drops on CoCo-generated counterfactuals.
CoCo-generated conversations reflect user goals with over 95% accuracy.
Counterfactual evaluation reveals robustness gaps not seen with paraphrasing techniques.
Abstract
Dialogue state trackers have made significant progress on benchmark datasets, but their generalization capability to novel and realistic scenarios beyond the held-out conversations is less understood. We propose controllable counterfactuals (CoCo) to bridge this gap and evaluate dialogue state tracking (DST) models on novel scenarios, i.e., would the system successfully tackle the request if the user responded differently but still consistently with the dialogue flow? CoCo leverages turn-level belief states as counterfactual conditionals to produce novel conversation scenarios in two steps: (i) counterfactual goal generation at the turn level by dropping and adding slots followed by replacing slot values, and (ii) counterfactual conversation generation that is conditioned on (i) and consistent with the dialogue flow. Evaluating state-of-the-art DST models on the MultiWOZ dataset with CoCo-generated…
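Step (i) of the abstract (drop slots, add slots, then replace slot values) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `counterfactual_goal`, the `drop_prob` and `add_prob` parameters, the flat `slot -> value` representation of a turn-level goal, and the example slot-value dictionary are all assumptions made for illustration. Step (ii), generating an utterance conditioned on the new goal, requires a trained generation model and is not sketched here.

```python
import random


def counterfactual_goal(turn_goal, slot_value_dict, rng, drop_prob=0.5, add_prob=0.5):
    """Build a counterfactual turn-level goal (hypothetical sketch).

    turn_goal: dict mapping slot name -> value mentioned in the current turn.
    slot_value_dict: dict mapping slot name -> list of candidate values.
    rng: a random.Random instance, passed in so runs are reproducible.
    """
    goal = dict(turn_goal)  # copy so the original goal is left untouched

    # Drop: remove one existing slot at random, but never empty the goal.
    if len(goal) > 1 and rng.random() < drop_prob:
        del goal[rng.choice(sorted(goal))]

    # Add: introduce one slot that the original turn did not mention.
    unused = sorted(s for s in slot_value_dict if s not in turn_goal)
    if unused and rng.random() < add_prob:
        new_slot = rng.choice(unused)
        goal[new_slot] = rng.choice(slot_value_dict[new_slot])

    # Replace: re-sample each remaining slot's value when an alternative exists.
    for slot in goal:
        candidates = [v for v in slot_value_dict.get(slot, []) if v != goal[slot]]
        if candidates:
            goal[slot] = rng.choice(candidates)
    return goal


# Illustrative usage with made-up slots and values.
example_goal = {"restaurant-food": "italian", "restaurant-area": "centre"}
example_values = {
    "restaurant-food": ["italian", "chinese", "indian"],
    "restaurant-area": ["centre", "north", "south"],
    "restaurant-pricerange": ["cheap", "moderate", "expensive"],
}
print(counterfactual_goal(example_goal, example_values, random.Random(0)))
```

Passing the random generator explicitly keeps the sketch deterministic for a given seed, which makes it easier to regenerate the same counterfactual goals when comparing DST models.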
Taxonomy
Topics: Topic Modeling · Speech and dialogue systems · Machine Learning in Healthcare
Methods: Dynamic Sparse Training · Counterfactual Explanations
