Measuring the 'I don't know' Problem through the Lens of Gricean Quantity
Huda Khayrallah, João Sedoc

TL;DR
This paper introduces Relative Utterance Quantity (RUQ), a linguistically motivated diagnostic for the 'I don't know' problem in neural dialog models. Motivated by Grice's maxim of Quantity, RUQ compares the model score of a generic response with that of the reference response.
Contribution
The paper proposes the RUQ diagnostic tool, grounded in Grice's Maxims, to measure and analyze the prevalence of generic responses in dialog systems, providing a new analytical approach.
Findings
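For reasonable baseline models, 'I don't know' is preferred over the reference response the majority of the time.
With hyperparameter tuning, the rate at which 'I don't know' is preferred over the reference can be reduced to less than 5%.
RUQ enables direct analysis of the 'I don't know' problem, which prior work has addressed but not analyzed.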
Abstract
We consider the intrinsic evaluation of neural generative dialog models through the lens of Grice's Maxims of Conversation (1975). Based on the maxim of Quantity (be informative), we propose Relative Utterance Quantity (RUQ) to diagnose the 'I don't know' problem, in which a dialog system produces generic responses. The linguistically motivated RUQ diagnostic compares the model score of a generic response to that of the reference response. We find that for reasonable baseline models, 'I don't know' is preferred over the reference the majority of the time, but this can be reduced to less than 5% with hyperparameter tuning. RUQ allows for the direct analysis of the 'I don't know' problem, which has been addressed but not analyzed by prior work.
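The comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy scorer, and the example data are all hypothetical, and the abstract does not say how the model score is computed or normalized. The sketch assumes only that a scorer returns a number where higher means the model prefers the response, and it reports the fraction of examples on which the generic response outscores the reference.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical type: any function mapping (context, response) to a model
# score, where a higher score means the model prefers that response.
Scorer = Callable[[str, str], float]


def generic_preference_rate(
    examples: Sequence[Tuple[str, str]],
    score: Scorer,
    generic: str = "i don't know",
) -> float:
    """Fraction of (context, reference) pairs where the generic response
    receives a strictly higher score than the reference response."""
    if not examples:
        raise ValueError("need at least one (context, reference) pair")
    wins = sum(
        1
        for context, reference in examples
        if score(context, generic) > score(context, reference)
    )
    return wins / len(examples)


def toy_score(context: str, response: str) -> float:
    """Stand-in for a real dialog model (an assumption for illustration).

    Penalizes every token and rewards word overlap with the context, which
    loosely mimics a model that favors short responses.
    """
    tokens = response.lower().split()
    overlap = len(set(tokens) & set(context.lower().split()))
    return -len(tokens) + 2 * overlap


# Made-up (context, reference) pairs, not data from the paper.
EXAMPLES = [
    ("what is your favorite food", "my favorite food is pizza"),
    ("where do you live", "in a small apartment near the river downtown"),
    ("how are you", "fine thanks"),
    ("do you know the way", "yes it is two blocks north of the station"),
]

if __name__ == "__main__":
    rate = generic_preference_rate(EXAMPLES, toy_score)
    print(f"generic response preferred on {rate:.0%} of examples")
```

In practice `toy_score` would be replaced by a scorer backed by the dialog model under evaluation, for instance the log-probability the model assigns to the response given the context. Whether and how to normalize that score for response length is a design choice this sketch leaves open.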
