What Went Wrong? Explaining Overall Dialogue Quality through Utterance-Level Impacts
James D. Finch, Sarah E. Finch, Jinho D. Choi

TL;DR
This paper introduces a novel weakly supervised method for analyzing conversation logs that estimates the impact of individual utterances on dialogue quality from overall user ratings alone. It needs no utterance-level annotation, which reduces analysis effort, and its conclusions agree with expert judgments.
Contribution
The work presents a new approach for learning utterance impacts on dialogue quality using only overall user ratings, eliminating the need for utterance-level labels.
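As a rough illustration of learning utterance-level impacts from dialogue-level ratings only, the sketch below fits a simple additive model: each dialogue's rating is treated as a baseline plus the sum of one impact value per utterance-level event. This is not the authors' model. The event types, the additive form, the ridge penalty, and the name `fit_impacts` are all assumptions made for the example.

```python
import numpy as np

def fit_impacts(dialogues, ratings, n_types, l2=1e-3):
    """Estimate one impact per event type from dialogue-level ratings only.

    dialogues: list of lists of event-type ids (hypothetical labels such as
               0 = "off-topic reply", 1 = "helpful answer"), one list per dialogue.
    ratings:   one overall user rating per dialogue; no per-utterance labels.
    """
    # Design matrix: occurrence counts of each event type, plus a bias column.
    X = np.zeros((len(dialogues), n_types + 1))
    for i, events in enumerate(dialogues):
        for e in events:
            X[i, e] += 1.0
        X[i, -1] = 1.0  # baseline rating shared by all dialogues
    y = np.asarray(ratings, dtype=float)

    # Ridge-regularized least squares; the bias term is left unregularized.
    reg = l2 * np.eye(n_types + 1)
    reg[-1, -1] = 0.0
    w = np.linalg.solve(X.T @ X + reg, X.T @ y)
    return w[:-1], w[-1]  # (per-type impacts, baseline)

# Synthetic check: ratings are generated from known impacts, then re-estimated.
rng = np.random.default_rng(0)
true_impacts = np.array([-1.0, 0.5, 0.0])
dialogues = [list(rng.integers(0, 3, size=rng.integers(1, 8))) for _ in range(200)]
ratings = [3.0 + sum(true_impacts[e] for e in d) for d in dialogues]
impacts, baseline = fit_impacts(dialogues, ratings, n_types=3)
```

In this toy setting the estimated impacts can be ranked to suggest which event types hurt or help the overall rating most, which is the kind of conclusion the approach aims to derive without utterance-level labels.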
Findings
Model's impact assessments align with expert judgments
Model identifies interactions that correlate strongly with overall dialogue quality
Method reduces annotation effort and cost
Abstract
Improving user experience of a dialogue system often requires intensive developer effort to read conversation logs, run statistical analyses, and intuit the relative importance of system shortcomings. This paper presents a novel approach to automated analysis of conversation logs that learns the relationship between user-system interactions and overall dialogue quality. Unlike prior work on utterance-level quality prediction, our approach learns the impact of each interaction from the overall user rating without utterance-level annotation, allowing resultant model conclusions to be derived on the basis of empirical evidence and at low cost. Our model identifies interactions that have a strong correlation with the overall dialogue quality in a chatbot setting. Experiments show that the automated analysis from our model agrees with expert judgments, making this work the first to show that…
Taxonomy
Topics: Speech and dialogue systems · Topic Modeling · AI in Service Interactions
