Evaluating Predictive Uncertainty under Distributional Shift on Dialogue Dataset

Nyoungwoo Lee; ChaeHun Park; Ho-Jin Choi

arXiv:2109.00186 · cs.CL · September 2, 2021

TL;DR

This paper introduces methods for simulating gradual distributional shifts in dialogue datasets and evaluates how existing uncertainty estimation methods perform under these shifts, finding that their performance degrades as the shift intensifies.

Contribution

It proposes two corruption methods, Unknown Word (UW) and Insufficient Context (IC), for simulating gradual distributional shifts in dialogue data, and assesses how these shifts affect uncertainty estimation methods.
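The page describes UW and IC only at a high level. As a minimal sketch under stated assumptions: UW is read here as replacing a fraction of context tokens with a placeholder unknown token, and IC as dropping the oldest turns of the dialogue context. The `[UNK]` token, whitespace tokenization, the `ratio` parameter, and the helper names `unknown_word` and `insufficient_context` are all hypothetical and not taken from the paper. Raising `ratio` from 0 to 1 produces progressively stronger corruption, which is the "gradual" aspect the contribution emphasizes.

```python
import random
from typing import List

# Assumption: a generic placeholder for unknown words; the paper's actual token may differ.
UNK = "[UNK]"


def _check_ratio(ratio: float) -> None:
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be in [0, 1]")


def unknown_word(context: List[str], ratio: float, seed: int = 0) -> List[str]:
    """Hypothetical UW corruption: replace a `ratio` fraction of all context
    tokens (whitespace-tokenized) with UNK, chosen uniformly at random."""
    _check_ratio(ratio)
    rng = random.Random(seed)  # seeded so each shift level is reproducible
    tokens = [utt.split() for utt in context]
    positions = [(i, j) for i, toks in enumerate(tokens) for j in range(len(toks))]
    for i, j in rng.sample(positions, round(ratio * len(positions))):
        tokens[i][j] = UNK
    # Re-joining with single spaces also normalizes any repeated whitespace.
    return [" ".join(toks) for toks in tokens]


def insufficient_context(context: List[str], ratio: float) -> List[str]:
    """Hypothetical IC corruption: drop the oldest `ratio` fraction of turns,
    always keeping the most recent utterance."""
    _check_ratio(ratio)
    if not context:
        return []
    n_drop = min(round(ratio * len(context)), len(context) - 1)
    return context[n_drop:]


if __name__ == "__main__":
    dialogue = ["hi there", "how are you doing today", "fine thanks and you"]
    for level in (0.0, 0.5, 1.0):
        print(level, unknown_word(dialogue, level), insufficient_context(dialogue, level))
```

Sweeping `ratio` over a grid such as 0.0, 0.2, ..., 1.0 would yield a graded series of corrupted test sets; keeping the most recent utterance in IC is a design choice made here so that the model always has something to respond to.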

Findings

1. Uncertainty estimation performance degrades as distributional shift increases.
2. The proposed methods effectively simulate distributional shifts for evaluation.
3. Existing methods are less reliable under extensive distributional shifts.

Abstract

In open-domain dialogues, predictive uncertainties are mainly evaluated in a domain shift setting to cope with out-of-distribution inputs. However, real-world conversations may contain more extensive distributional shifts than out-of-distribution inputs alone. To evaluate this, we first propose two methods, Unknown Word (UW) and Insufficient Context (IC), which enable gradual distributional shifts by corrupting the dialogue dataset. We then investigate the effect of distributional shifts on accuracy and calibration. Our experiments show that the performance of existing uncertainty estimation methods consistently degrades as the shift intensifies. The results suggest that the proposed methods could be useful for evaluating the calibration of dialogue systems under distributional shifts.
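The abstract reports accuracy and calibration but does not name a calibration metric. Expected Calibration Error (ECE) is a common choice; the following is a minimal sketch of the standard equal-width-bin version, offered as an assumption rather than the paper's exact protocol (the bin count and bin boundaries in particular may differ). The function name `expected_calibration_error` is hypothetical.

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float], correct: Sequence[bool], n_bins: int = 10
) -> float:
    """Equal-width-bin ECE: sum over bins B of (|B| / N) * |acc(B) - conf(B)|."""
    if len(confidences) != len(correct) or len(confidences) == 0:
        raise ValueError("need equally long, non-empty inputs")
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Bin i covers [i/n_bins, (i+1)/n_bins); confidence 1.0 joins the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    ece = 0.0
    for members in bins:
        if members:
            avg_conf = sum(c for c, _ in members) / len(members)
            accuracy = sum(1 for _, ok in members if ok) / len(members)
            # Weight each bin's confidence/accuracy gap by its share of examples.
            ece += len(members) / n * abs(accuracy - avg_conf)
    return ece
```

For a response-selection model, one assumed setup is to take each example's confidence as the highest softmax probability over the candidate responses and `correct` as whether that top candidate is the gold response. Recomputing ECE on each corrupted test set would then trace how calibration changes as the shift intensifies.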

Taxonomy

Topics: Topic Modeling · Speech and dialogue systems · Context-Aware Activity Recognition Systems