Addressing Inquiries about History: An Efficient and Practical Framework for Evaluating Open-domain Chatbot Consistency
Zekang Li, Jinchao Zhang, Zhengcong Fei, Yang Feng, Jie Zhou

TL;DR
This paper introduces AIH, a practical framework for evaluating the consistency of open-domain chatbots by simulating inquiries about the dialogue history. Compared with human evaluation, it reduces cost and bias, and it reliably ranks chatbots by consistency.
Contribution
The paper presents a novel framework that efficiently assesses chatbot consistency through automated inquiries and contradiction detection, improving evaluation reliability and practicality.
Findings
AIH's chatbot rankings correlate highly with human judgments.
The framework reduces evaluation costs and biases.
Experiments demonstrate effective and reliable consistency assessment.
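Agreement between an automatic ranking and a human ranking is usually reported as a rank correlation. The sketch below computes Spearman's rank correlation (Pearson correlation on average ranks, so ties are handled) between per-chatbot scores from two sources. Spearman is an assumption here, as one common choice; the text does not say which statistic the authors use. The helper names (`average_ranks`, `spearman`, `ranking_agreement`) and the scores in the usage lines are hypothetical toy values, not results from the paper.

```python
from typing import Dict, List


def average_ranks(values: List[float]) -> List[float]:
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(xs: List[float], ys: List[float]) -> float:
    """Spearman's rho: Pearson correlation computed on the ranks.
    Assumes neither list is constant (otherwise the variance is zero)."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


def ranking_agreement(auto: Dict[str, float], human: Dict[str, float]) -> float:
    """Correlate per-chatbot scores (higher = more consistent) from two sources."""
    bots = sorted(auto)
    return spearman([auto[b] for b in bots], [human[b] for b in bots])


# Hypothetical scores, invented for illustration only.
auto = {"bot_a": 0.91, "bot_b": 0.78, "bot_c": 0.85, "bot_d": 0.60}
human = {"bot_a": 4.2, "bot_b": 3.5, "bot_c": 3.9, "bot_d": 3.6}
print(ranking_agreement(auto, human))  # → 0.8
```

Only the ordering of each score list matters, so the automatic scores and the human ratings may live on different scales.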
Abstract
A good open-domain chatbot should avoid giving contradictory responses about facts or opinions within a conversational session, a property known as its consistency capacity. However, evaluating the consistency capacity of a chatbot remains challenging. Employing human judges to interact with chatbots specifically to probe this capacity is costly and inefficient, and it is difficult to eliminate subjective bias. In this paper, we propose Addressing Inquiries about History (AIH), an efficient and practical framework for consistency evaluation. At the conversation stage, AIH attempts to pose appropriate inquiries about the dialogue history to induce the chatbot to restate facts or opinions it expressed earlier. The conversation is carried out between chatbots, which is more efficient than human-bot interaction and also alleviates subjective bias. In this way, we manage to rapidly…
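The evaluation loop described above (bot-bot conversation, inquiries that make the chatbot restate earlier statements, contradiction detection, then ranking) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the interfaces `Chatbot`, `Inquirer` and `Detector`, the fixed inquiry schedule (`inquiry_every`), the round-robin choice of which earlier statement to probe, and ranking by contradiction rate are all hypothetical. The exact-string detector in the usage lines is only a toy; a real system would need a much stronger contradiction detector, for example a natural language inference classifier, which is likewise an assumption here.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical interfaces (not taken from the paper):
Chatbot = Callable[[List[str]], str]   # dialogue history -> next utterance
Inquirer = Callable[[str], str]        # earlier statement -> question about it
Detector = Callable[[str, str], bool]  # (earlier, later) -> contradictory?


def run_session(target: Chatbot, partner: Chatbot, inquire: Inquirer,
                turns: int, inquiry_every: int) -> List[Tuple[str, str]]:
    """Simulate a bot-bot session. Every `inquiry_every` turns, the partner's
    utterance is replaced by an inquiry about an earlier statement of the
    target. Returns (earlier statement, answer to the inquiry) pairs."""
    history: List[str] = []
    target_said: List[str] = []
    pairs: List[Tuple[str, str]] = []
    for t in range(turns):
        probing = t > 0 and t % inquiry_every == 0
        if probing:
            # Cycle through the target's earlier statements deterministically.
            earlier = target_said[(t // inquiry_every - 1) % len(target_said)]
            history.append(inquire(earlier))
        else:
            history.append(partner(history))
        reply = target(history)
        history.append(reply)
        if probing:
            pairs.append((earlier, reply))
        else:
            target_said.append(reply)
    return pairs


def contradiction_rate(pairs: List[Tuple[str, str]], detect: Detector) -> float:
    """Fraction of inquiry answers flagged as contradicting the history."""
    if not pairs:
        return 0.0
    return sum(detect(a, b) for a, b in pairs) / len(pairs)


def rank_by_consistency(bots: Dict[str, Chatbot], partner: Chatbot,
                        inquire: Inquirer, detect: Detector,
                        turns: int = 10, inquiry_every: int = 3
                        ) -> List[Tuple[str, float]]:
    """Rank chatbots from most to least consistent (lowest rate first)."""
    rates = {
        name: contradiction_rate(
            run_session(bot, partner, inquire, turns, inquiry_every), detect)
        for name, bot in bots.items()
    }
    return sorted(rates.items(), key=lambda kv: kv[1])


# Toy chatbots: one always gives the same fact, one flips between two.
bots = {
    "steady": lambda h: "I have a dog.",
    "fickle": lambda h: "I have a dog." if len(h) % 4 == 1 else "I have a cat.",
}
print(rank_by_consistency(
    bots,
    partner=lambda h: "Tell me about yourself.",
    inquire=lambda s: "Do you have any pets?",
    detect=lambda a, b: a != b,  # toy stand-in for a real contradiction detector
))
# → [('steady', 0.0), ('fickle', 1.0)]
```

Because the detector is passed in as a plain function, human judgments or a learned model could be swapped in without changing the conversation or ranking code.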
