CHARP: Conversation History AwaReness Probing for Knowledge-grounded Dialogue Systems
Abbas Ghaddar, David Alfonso-Hermelo, Philippe Langlais, Mehdi Rezagholizadeh, Boxing Chen, Prasanna Parthasarathi

TL;DR
This paper introduces CHARP, a diagnostic test set for better evaluating hallucinations and history-awareness in knowledge-grounded dialogue models, revealing current models' shortcomings in attending to conversation history.
Contribution
The paper presents CHARP, a novel evaluation dataset that improves assessment of hallucination and history-awareness in dialogue systems, addressing biases in existing benchmarks.
Findings
Models perform poorly on CHARP due to ineffective history reasoning.
FaithDial evaluation methods overlook conversation history issues.
CHARP can monitor progress in hallucination detection and history understanding.
Abstract
In this work, we dive deep into one of the popular knowledge-grounded dialogue benchmarks that focus on faithfulness, FaithDial. We show that a significant portion of the FaithDial data contains annotation artifacts, which may bias models towards completely ignoring the conversation history. We therefore introduce CHARP, a diagnostic test set, designed for an improved evaluation of hallucinations in conversational models. CHARP not only measures hallucination but also the compliance of the models to the conversation task. Our extensive analysis reveals that models primarily exhibit poor performance on CHARP due to their inability to effectively attend to and reason over the conversation history. Furthermore, the evaluation methods of FaithDial fail to capture these shortcomings, neglecting the conversational history. Our findings indicate that there is substantial room for contribution…
Taxonomy
Topics: Speech and dialogue systems · Natural Language Processing Techniques · Multi-Agent Systems and Negotiation
