TL;DR
Spot The Bot is a cost-effective evaluation framework that uses bot-to-bot conversations and human annotations to assess and rank chatbots based on their ability to mimic human-like dialogue, enabling frequent and reliable assessments.
Contribution
The paper introduces a novel, efficient evaluation framework for chatbots that replaces human-bot interactions with bot-bot conversations and uses survival analysis for ranking.
Findings
Validated on three domains with state-of-the-art chatbots
Correlates chatbot performance with characteristics like fluency and sensibleness
Allows frequent, low-cost evaluations during development cycles
Abstract
The lack of time-efficient and reliable evaluation methods hampers the development of conversational dialogue systems (chatbots). Evaluations requiring humans to converse with chatbots are time- and cost-intensive, put high cognitive demands on the human judges, and yield low-quality results. In this work, we introduce Spot The Bot, a cost-efficient and robust evaluation framework that replaces human-bot conversations with conversations between bots. Human judges then only annotate for each entity in a conversation whether they think it is human or not (assuming there are human participants in these conversations). These annotations then allow us to rank chatbots regarding their ability to mimic the conversational behavior of humans. Since we expect that all bots are eventually recognized as such, we incorporate a metric that measures which chatbot can uphold human-like behavior…
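To illustrate the survival-analysis idea, here is a minimal sketch of a standard Kaplan-Meier estimator applied to "spotted as a bot" annotations. It is not taken from the paper: the input format (pairs of segment length and a spotted flag), the function name `kaplan_meier`, and the example data are all assumptions made for illustration, and the paper's actual procedure may differ.

```python
from collections import Counter


def kaplan_meier(observations):
    """Estimate a survival curve from (length, spotted) pairs.

    Hypothetical input format: `length` is the number of exchanges shown to
    the judge, and `spotted` is True if the bot was identified as a bot at
    that length, or False if it was still taken for human (censored).
    Returns a dict mapping each observed length to the estimated probability
    of remaining unspotted beyond that length.
    """
    spotted_at = Counter(t for t, spotted in observations if spotted)
    leaving_at = Counter(t for t, _ in observations)

    at_risk = len(observations)
    survival = 1.0
    curve = {}
    for t in sorted(leaving_at):
        d = spotted_at.get(t, 0)
        if at_risk > 0:
            # Standard Kaplan-Meier update: multiply by (1 - d / n).
            survival *= 1.0 - d / at_risk
        curve[t] = survival
        # Both spotted and censored observations leave the risk set.
        at_risk -= leaving_at[t]
    return curve


# Made-up annotations for two hypothetical bots.
bot_a = [(2, True), (3, True), (5, False), (5, True)]
bot_b = [(2, True), (2, True), (3, True), (5, True)]

curve_a = kaplan_meier(bot_a)
curve_b = kaplan_meier(bot_b)
```

Under these assumptions, a bot whose curve stays higher for longer is one that judges take for human over more exchanges, which is the kind of signal a survival-based ranking could use.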
