Leveraging LLMs for Dialogue Quality Measurement

Jinghan Jia; Abi Komma; Timothy Leffel; Xujun Peng; Ajay Nagesh; Tamer Soliman; Aram Galstyan; Anoop Kumar

arXiv:2406.17304 · cs.CL · June 26, 2024

TL;DR

This paper investigates how large language models (LLMs) can be used for automated dialogue quality assessment, showing that larger models, algorithmic selection of in-context examples, chain-of-thought reasoning, and fine-tuning each improve the resulting quality labels.

Contribution

The paper presents a systematic analysis of LLM configurations for dialogue evaluation, highlighting the benefits of fine-tuning, larger model size, and reasoning techniques such as chain-of-thought (CoT) prompting.

Findings

1. Larger models produce more accurate dialogue labels.
2. Algorithmic selection of in-context examples improves performance.
3. Chain-of-thought reasoning enhances evaluation accuracy.
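This page does not describe the paper's specific selection algorithm. The following is a minimal sketch that assumes "algorithmic selection" means retrieving the labeled dialogues most similar to the one being judged, which is a common approach but not necessarily the authors'. The example contrasts this with random selection. Bag-of-words cosine similarity stands in for whatever embedding model or retriever a real system would use. The function names, the `satisfied`/`dissatisfied` labels, and the prompt layout are all hypothetical.

```python
import math
import random
from collections import Counter

# Hypothetical sketch: similarity-based vs. random few-shot example selection.
# A labeled pool entry is (dialogue_text, quality_label).
Example = tuple[str, str]


def bow_vector(text: str) -> Counter:
    """Lowercased bag-of-words counts; a stand-in for a real embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_similar(query: str, pool: list[Example], k: int) -> list[Example]:
    """Algorithmic selection: the k pool examples most similar to the query."""
    q = bow_vector(query)
    return sorted(pool, key=lambda ex: cosine(q, bow_vector(ex[0])), reverse=True)[:k]


def select_random(pool: list[Example], k: int, seed: int = 0) -> list[Example]:
    """Baseline: k examples drawn uniformly at random (seeded for repeatability)."""
    return random.Random(seed).sample(pool, k)


def build_few_shot_prompt(query: str, examples: list[Example]) -> str:
    """Place the chosen examples before the dialogue to be labeled."""
    shots = "\n\n".join(f"Dialogue: {d}\nLabel: {label}" for d, label in examples)
    return f"{shots}\n\nDialogue: {query}\nLabel:"


if __name__ == "__main__":
    pool = [
        ("user: play jazz music assistant: playing jazz", "satisfied"),
        ("user: set an alarm for 7 assistant: sorry I can't help", "dissatisfied"),
        ("user: play some rock music assistant: playing rock", "satisfied"),
    ]
    query = "user: play classical music assistant: playing classical"
    print(build_few_shot_prompt(query, select_similar(query, pool, k=2)))
```

The two experimental conditions differ only in whether `select_similar` or `select_random` is called. This keeps a comparison between them controlled.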

Abstract

In task-oriented conversational AI evaluation, unsupervised methods correlate poorly with human judgments, and supervised approaches lack generalization. Recent advances in large language models (LLMs) show robust zero-shot and few-shot capabilities across NLP tasks. This paper explores using LLMs for automated dialogue quality evaluation, experimenting with various configurations on public and proprietary datasets. Varying factors such as model size, in-context examples, and example selection techniques, we also examine "chain-of-thought" (CoT) reasoning and label extraction procedures. Our results show that (1) larger models yield more accurate dialogue labels; (2) algorithmic selection of in-context examples outperforms random selection; (3) CoT reasoning, in which an LLM is asked to provide justifications before outputting final labels, improves performance; and (4) fine-tuned LLMs outperform…
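The abstract describes CoT prompting in which the model justifies its answer before emitting a final label, followed by a label extraction step. Below is a minimal sketch of that pattern. It assumes a two-way `satisfied`/`dissatisfied` label set, illustrative prompt wording, and a regex-based parser. None of these names or choices come from the paper, and `build_cot_prompt` and `extract_label` are hypothetical helpers.

```python
import re
from typing import Optional

# Hypothetical label set and prompt wording; the paper's own prompts are not shown here.
LABELS = ("satisfied", "dissatisfied")

COT_TEMPLATE = (
    "Read the dialogue below and judge whether the assistant met the user's goal.\n"
    "First explain your reasoning step by step. Then give the final answer on its "
    "own line in the form 'Label: <{labels}>'.\n\n"
    "Dialogue:\n{dialogue}\n"
)

# Case-insensitive "Label: <word>" pattern; captures the word after the colon.
_LABEL_RE = re.compile(r"label\s*:\s*([a-z]+)", re.IGNORECASE)


def build_cot_prompt(dialogue: str) -> str:
    """Ask for a justification first and a machine-readable label last."""
    return COT_TEMPLATE.format(labels="|".join(LABELS), dialogue=dialogue)


def extract_label(response: str) -> Optional[str]:
    """Return the last valid 'Label: X' in the response, or None if there is none.

    Scanning from the end skips labels the model merely mentions while reasoning.
    """
    for candidate in reversed(_LABEL_RE.findall(response)):
        if candidate.lower() in LABELS:
            return candidate.lower()
    return None
```

This sketch returns `None` when it cannot parse a valid label instead of guessing one. Malformed responses can then be counted or retried separately, rather than silently mislabeled.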

Taxonomy

Topics: Speech and dialogue systems · Natural Language Processing Techniques · Topic Modeling