How "open" are the conversations with open-domain chatbots? A proposal for Speech Event based evaluation

A. Seza Doğruöz; Gabriel Skantze

arXiv:2211.13560 · cs.CL · November 28, 2022

TL;DR

This paper investigates the scope of open-domain chatbots by classifying the speech events in their conversations. It finds that these conversations are largely limited to small talk and proposes revised evaluation methods to better assess chatbots' conversational abilities.

Contribution

It introduces a speech event classification framework for chatbot evaluation, suggests replacing the term "open-domain" with "small talk", and proposes improved assessment strategies.

Findings

01

Chatbot conversations mainly cover small talk, excluding other speech event categories.

02

Human-human conversations are more coherent across speech events than human-chatbot interactions.

03

Current chatbots lack diversity and coherence across speech event categories.

Abstract

Open-domain chatbots are supposed to converse freely with humans without being restricted to a topic, task or domain. However, the boundaries and/or contents of open-domain conversations are not clear. To clarify the boundaries of "openness", we conduct two studies. First, we classify the types of "speech events" encountered in a chatbot evaluation data set (i.e., Meena by Google) and find that these conversations mainly cover the "small talk" category and exclude the other speech event categories encountered in real-life human-human communication. Second, we conduct a small-scale pilot study to generate online conversations covering a wider range of speech event categories, both between two humans and between a human and a state-of-the-art chatbot (i.e., Blender by Facebook). A human evaluation of these generated conversations indicates a preference for human-human conversations, since the…

