Spot The Bot: A Robust and Efficient Framework for the Evaluation of Conversational Dialogue Systems

Jan Deriu; Don Tuggener; Pius von Däniken; Jon Ander Campos; Alvaro Rodrigo; Thiziri Belkacem; Aitor Soroa; Eneko Agirre; Mark Cieliebak

arXiv:2010.02140·cs.AI·October 6, 2020

TL;DR

Spot The Bot is a cost-effective evaluation framework that uses bot-to-bot conversations and human annotations to assess and rank chatbots based on their ability to mimic human-like dialogue, enabling frequent and reliable assessments.

Contribution

The paper introduces a novel, efficient evaluation framework for chatbots that replaces human-bot interactions with bot-bot conversations and uses survival analysis for ranking.

Findings

01 Validated on three domains with state-of-the-art chatbots

02 Correlates chatbot performance with characteristics like fluency and sensibleness

03 Allows frequent, low-cost evaluations during development cycles

Abstract

The lack of time-efficient and reliable evaluation methods hampers the development of conversational dialogue systems (chatbots). Evaluations that require humans to converse with chatbots are time- and cost-intensive, put high cognitive demands on the human judges, and yield low-quality results. In this work, we introduce Spot The Bot, a cost-efficient and robust evaluation framework that replaces human-bot conversations with conversations between bots. Human judges then only annotate, for each entity in a conversation, whether they think it is human or not (assuming there are human participants in these conversations). These annotations then allow us to rank chatbots by their ability to mimic the conversational behavior of humans. Since we expect that all bots are eventually recognized as such, we incorporate a metric that measures which chatbot can uphold human-like behavior…
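The survival-style metric mentioned above can be illustrated with a Kaplan-Meier estimate, a standard survival-analysis estimator. This is only a minimal sketch under stated assumptions, not the paper's implementation: the function name `kaplan_meier`, the `(turns, spotted)` record format, and the sample data are all hypothetical. Each record says how many exchanges a judge saw and whether the bot was spotted at that point; a bot that was never spotted is treated as censored.

```python
from collections import Counter


def kaplan_meier(observations):
    """Estimate how long a bot 'survives' before being spotted.

    observations: list of (turns, spotted) pairs (hypothetical format).
      turns   -- number of exchanges the judge had seen
      spotted -- True if the bot was identified as a bot at that point,
                 False if it was still unspotted (censored)
    Returns a list of (turns, survival_probability), one per event time.
    """
    # Count how many bots were spotted at each number of exchanges.
    events = Counter(t for t, spotted in observations if spotted)

    curve = []
    survival = 1.0
    for t in sorted(events):
        # Bots still under observation at time t (spotted or not).
        at_risk = sum(1 for u, _ in observations if u >= t)
        survival *= 1.0 - events[t] / at_risk
        curve.append((t, survival))
    return curve


# Illustrative, made-up annotations for one bot.
sample = [(2, True), (3, True), (3, False), (5, True), (5, False)]
for turns, prob in kaplan_meier(sample):
    print(f"after {turns} exchanges: P(not yet spotted) = {prob:.2f}")
```

A bot whose curve stays higher for more exchanges upholds human-like behavior longer, which is the intuition behind comparing bots on this kind of metric.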

Code & Models

Repositories

jderiu/spot-the-bot-code (Official)
