Evaluating Predictive Uncertainty under Distributional Shift on Dialogue Dataset
Nyoungwoo Lee, ChaeHun Park, Ho-Jin Choi

TL;DR
This paper introduces methods to simulate gradual distributional shifts in dialogue datasets and evaluates how existing uncertainty estimation methods perform under these shifts, revealing that their performance degrades as the shift intensifies.
Contribution
It proposes the Unknown Word (UW) and Insufficient Context (IC) methods for simulating distributional shifts in dialogue data and assesses their impact on uncertainty estimation methods.
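This summary does not describe how the Unknown Word (UW) and Insufficient Context (IC) corruptions are implemented. The sketch below is one plausible reading, not the authors' procedure: it assumes UW replaces a fraction of a turn's tokens with an unknown-word placeholder and IC drops the oldest turns of the dialogue context, with a single `ratio` in [0, 1] controlling shift intensity. The names `unknown_word`, `insufficient_context`, `ratio` and the `[UNK]` token are hypothetical.

```python
import random
from typing import List

# Hypothetical placeholder for an unknown word; the paper's actual token may differ.
UNK = "[UNK]"


def unknown_word(tokens: List[str], ratio: float, rng: random.Random) -> List[str]:
    """Hypothetical UW-style corruption: replace int(ratio * len(tokens)) random tokens with UNK."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be in [0, 1]")
    n_replace = int(ratio * len(tokens))
    positions = set(rng.sample(range(len(tokens)), n_replace))
    return [UNK if i in positions else tok for i, tok in enumerate(tokens)]


def insufficient_context(turns: List[str], ratio: float) -> List[str]:
    """Hypothetical IC-style corruption: drop the oldest context turns, always keeping the last one."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be in [0, 1]")
    if not turns:
        return []
    n_drop = int(ratio * (len(turns) - 1))  # the most recent turn is never dropped
    return turns[n_drop:]


if __name__ == "__main__":
    rng = random.Random(0)
    context = ["hi there", "hello, how are you?", "great, just got back from a trip"]
    for ratio in (0.0, 0.5, 1.0):  # increasing shift intensity
        uw = [" ".join(unknown_word(turn.split(), ratio, rng)) for turn in context]
        ic = insufficient_context(context, ratio)
        print(f"ratio={ratio}: UW={uw} IC={ic}")
```

Sweeping `ratio` upward yields progressively more shifted copies of the same dialogues, which is what makes the shift "gradual" rather than a single in-domain versus out-of-domain split.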
Findings
Uncertainty estimation performance degrades with increased distributional shift.
Proposed methods effectively simulate distributional shifts for evaluation.
Existing methods are less reliable under extensive distributional shifts.
Abstract
In open-domain dialogues, predictive uncertainty is mainly evaluated in a domain-shift setting to cope with out-of-distribution inputs. However, real-world conversations can contain a wider range of distributionally shifted inputs than out-of-distribution ones alone. To evaluate this, we first propose two methods, Unknown Word (UW) and Insufficient Context (IC), that enable gradual distributional shifts by corrupting the dialogue dataset. We then investigate the effect of distributional shifts on accuracy and calibration. Our experiments show that the performance of existing uncertainty estimation methods consistently degrades as the shift intensifies. The results suggest that the proposed methods could be useful for evaluating the calibration of dialogue systems under distributional shift.
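The abstract measures calibration without naming a metric. Expected calibration error (ECE) is one common choice: predictions are grouped into equal-width confidence bins, and ECE is the sample-weighted average gap between each bin's accuracy and its mean confidence. Whether the paper reports ECE, and with how many bins, is an assumption here; `expected_calibration_error` and `n_bins` are illustrative names.

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float], correct: Sequence[bool], n_bins: int = 10
) -> float:
    """ECE with equal-width bins: sum over bins of (bin size / N) * |accuracy - mean confidence|."""
    if not confidences or len(confidences) != len(correct):
        raise ValueError("need equally sized, non-empty inputs")
    n = len(confidences)
    ece = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        # Bins are (lo, hi]; the first bin also takes confidences of exactly 0.0.
        idx = [i for i, c in enumerate(confidences) if lo < c <= hi or (b == 0 and c == 0.0)]
        if not idx:
            continue
        accuracy = sum(1 for i in idx if correct[i]) / len(idx)
        mean_conf = sum(confidences[i] for i in idx) / len(idx)
        ece += len(idx) / n * abs(accuracy - mean_conf)
    return ece


if __name__ == "__main__":
    # Overconfident toy model: 90% confidence on every prediction but only 50% accuracy.
    print(expected_calibration_error([0.9, 0.9, 0.9, 0.9], [True, False, True, False]))
```

Computing this metric on the clean test set and on copies corrupted at increasing intensity would show whether calibration worsens as the shift grows, which is the kind of trend the abstract reports.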
