Measuring the 'I don't know' Problem through the Lens of Gricean Quantity

Huda Khayrallah; João Sedoc

arXiv:2010.12786 · cs.CL · April 23, 2021

TL;DR

This paper introduces Relative Utterance Quantity (RUQ), a linguistically motivated diagnostic for the 'I don't know' problem in neural dialog models. Following Grice's maxim of Quantity, RUQ compares the model score of a generic response with that of the reference response.

Contribution

The paper proposes the RUQ diagnostic, grounded in Grice's Maxims of Conversation, which checks whether a dialog model scores a generic response above the reference response. This makes the 'I don't know' problem directly measurable rather than only mitigated.

Findings

01. Reasonable baseline models prefer 'I don't know' over the reference response the majority of the time.

02. Hyperparameter tuning can reduce this preference for 'I don't know' to less than 5%.

03. RUQ enables direct analysis of the 'I don't know' problem, which prior work has addressed but not analyzed.

Abstract

We consider the intrinsic evaluation of neural generative dialog models through the lens of Grice's Maxims of Conversation (1975). Based on the maxim of Quantity (be informative), we propose Relative Utterance Quantity (RUQ) to diagnose the 'I don't know' problem, in which a dialog system produces generic responses. The linguistically motivated RUQ diagnostic compares the model score of a generic response to that of the reference response. We find that for reasonable baseline models, 'I don't know' is preferred over the reference the majority of the time, but this can be reduced to less than 5% with hyperparameter tuning. RUQ allows for the direct analysis of the 'I don't know' problem, which has been addressed but not analyzed by prior work.
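
The comparison described in the abstract can be illustrated with a minimal sketch. It assumes only that the model exposes some score for a response given a prompt (for example a log-probability); the abstract does not specify the scoring details, so the function names, the strict "greater than" comparison, and the toy scorer below are all hypothetical and are not the authors' implementation.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical interface: score(prompt, response) returns the model's score
# for `response` given `prompt`; a higher score means the model prefers it.
ScoreFn = Callable[[str, str], float]


def generic_preference_rate(
    pairs: Sequence[Tuple[str, str]],
    score: ScoreFn,
    generic: str = "I don't know.",
) -> float:
    """Fraction of (prompt, reference) pairs for which the generic
    response scores strictly higher than the reference response."""
    if not pairs:
        raise ValueError("pairs must be non-empty")
    wins = sum(
        1 for prompt, reference in pairs
        if score(prompt, generic) > score(prompt, reference)
    )
    return wins / len(pairs)


def toy_score(prompt: str, response: str) -> float:
    """Stand-in scorer for illustration only (not a dialog model):
    rewards word overlap with the prompt and penalizes length."""
    prompt_words = set(prompt.lower().split())
    response_words = response.lower().split()
    overlap = sum(1 for word in response_words if word in prompt_words)
    return overlap - 0.5 * len(response_words)


pairs = [
    ("where is the train station", "the train station is two blocks north"),
    ("what time is it", "it is probably around noon i think but check"),
]
rate = generic_preference_rate(pairs, toy_score)
print(f"generic response preferred in {rate:.0%} of examples")
```

With a real model, `toy_score` would be replaced by the model's own scoring of each candidate response. A rate above 50% would correspond to the baseline behavior reported in the abstract, and a rate below 5% to the tuned models.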
