Leveraging Large Language Models for Automated Dialogue Analysis

Sarah E. Finch; Ellie S. Paek; Jinho D. Choi

arXiv:2309.06490 · cs.CL · September 14, 2023

TL;DR

This study evaluates ChatGPT-3.5's ability to detect undesirable dialogue behaviors in human-bot interactions, comparing it to specialized models and humans, and discusses its current limitations and future potential.

Contribution

The paper provides an empirical assessment of ChatGPT's ability to detect dialogue behaviors across nine categories in real-world human-bot interactions, highlighting its strengths and shortcomings.

Findings

1. ChatGPT often outperforms specialized models.
2. Neither ChatGPT nor specialized models match human performance.
3. LLM-based dialogue behavior detection still leaves significant room for improvement.

Abstract

Developing high-performing dialogue systems benefits from the automatic identification of undesirable behaviors in system responses. However, detecting such behaviors remains challenging, as it draws on a breadth of general knowledge and understanding of conversational practices. Although recent research has focused on building specialized classifiers for detecting specific dialogue behaviors, the behavior coverage is still incomplete and there is a lack of testing on real-world human-bot interactions. This paper investigates the ability of a state-of-the-art large language model (LLM), ChatGPT-3.5, to perform dialogue behavior detection for nine categories in real human-bot dialogues. We aim to assess whether ChatGPT can match specialized models and approximate human performance, thereby reducing the cost of behavior detection tasks. Our findings reveal that neither specialized models…
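The comparison the abstract describes, scoring an automatic detector (an LLM or a specialized classifier) against human judgments, can be sketched as follows. This is a minimal, hypothetical example, not the authors' evaluation code: it assumes each behavior category is labeled as a binary flag per bot turn and scores one category with precision, recall, and F1. The function name `score_behavior` and the toy labels are illustrative only, and the paper's actual metrics are not stated in this excerpt.

```python
from dataclasses import dataclass


@dataclass
class BinaryScores:
    """Precision, recall, and F1 for a single behavior category."""
    precision: float
    recall: float
    f1: float


def score_behavior(predicted: list[bool], gold: list[bool]) -> BinaryScores:
    """Compare turn-level detector flags with reference (e.g. human) labels.

    Hypothetical helper: position i is the i-th bot turn, and True means the
    behavior was flagged in that turn.
    """
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must label the same turns")
    # Count true positives, false positives, and false negatives.
    tp = sum(p and g for p, g in zip(predicted, gold))
    fp = sum(p and not g for p, g in zip(predicted, gold))
    fn = sum(g and not p for p, g in zip(predicted, gold))
    # Guard against division by zero when a count pair is empty.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return BinaryScores(precision, recall, f1)


# Toy labels for five bot turns (invented for illustration, not paper data).
human_labels = [True, False, True, True, False]
llm_labels = [True, True, False, True, False]
scores = score_behavior(llm_labels, human_labels)
# Here precision, recall, and F1 are all 2/3.
```

Running the same function once per category gives a per-behavior breakdown, which is the kind of comparison needed to see where an LLM matches or falls short of specialized models and humans.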

Code & Models

Repositories

emorynlp/gpt-abceval (official)

Taxonomy

Topics: Topic Modeling · Speech and dialogue systems · AI in Service Interactions