A Two-dimensional Zero-shot Dialogue State Tracking Evaluation Method using GPT-4

Ming Gu; Yan Yang

arXiv:2406.11651·cs.CL·June 18, 2024

TL;DR

This paper introduces a novel two-dimensional zero-shot evaluation method for dialogue state tracking using GPT-4, focusing on accuracy and completeness, and improves evaluation reliability over traditional methods.

Contribution

It proposes a new GPT-4-based zero-shot evaluation framework with manual reasoning prompts, extending DST assessment beyond exact matching.

Findings

01 Outperforms baseline evaluation methods

02 Remains consistent with traditional exact-matching evaluation

03 Demonstrates the effectiveness of manual reasoning prompts

Abstract

Dialogue state tracking (DST) is evaluated by exact matching methods, which rely on large amounts of labeled data and ignore semantic consistency, leading to over-evaluation. Leveraging large language models (LLMs) to evaluate natural language processing tasks has recently achieved promising results. However, using LLMs for DST evaluation is still underexplored. In this paper, we propose a two-dimensional zero-shot evaluation method for DST using GPT-4, which divides the evaluation into two dimensions: accuracy and completeness. We also design two manual reasoning paths in the prompts to further improve the accuracy of the evaluation. Experimental results show that our method outperforms the baselines and is consistent with traditional exact-matching-based methods.
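As a rough illustration of the contrast the abstract draws, the sketch below puts strict exact matching next to a prompt built separately for each of the two dimensions. It is not the authors' implementation: the `State` type, the `exact_match` and `build_prompt` helpers, the question wording, and the example slot names are all assumptions made here, and no GPT-4 call is included.

```python
from typing import Dict

# Assumed representation: a dialogue state maps slot names to values.
State = Dict[str, str]

# Hypothetical question wording for each dimension (not taken from the paper).
QUESTIONS = {
    "accuracy": "Is every slot-value pair in the predicted state supported by the dialogue?",
    "completeness": "Does the predicted state include every slot-value pair expressed in the dialogue?",
}


def exact_match(pred: State, gold: State) -> bool:
    """Strict exact matching: all slots and values must be string-identical."""
    return pred == gold


def build_prompt(dialogue: str, pred: State, dimension: str) -> str:
    """Build a zero-shot prompt that asks about one dimension only."""
    if dimension not in QUESTIONS:
        raise ValueError(f"unknown dimension: {dimension}")
    # Sort slots so the prompt text is deterministic.
    state_text = "; ".join(f"{slot} = {value}" for slot, value in sorted(pred.items()))
    return (
        f"Dialogue:\n{dialogue}\n\n"
        f"Predicted state: {state_text}\n\n"
        f"{QUESTIONS[dimension]}\nAnswer yes or no."
    )


dialogue = "User: I need a hotel in the centre of town."
gold = {"hotel-area": "centre"}
pred = {"hotel-area": "center"}  # same meaning, different spelling

strict_result = exact_match(pred, gold)  # exact matching rejects the paraphrase
prompts = {dim: build_prompt(dialogue, pred, dim) for dim in QUESTIONS}
```

Here exact matching rejects a prediction that differs from the label only in spelling, which is the kind of semantic consistency the abstract says exact matching ignores. Asking about accuracy and completeness in separate prompts keeps the two judgments independent of each other.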

Code & Models

Repositories

SLEEPWALKERG/LLM-DST-EVAL
none · Official

Taxonomy

Topics: Robotics and Automated Systems · Context-Aware Activity Recognition Systems · IoT-based Smart Home Systems

Methods: Dynamic Sparse Training · Residual Connection · Softmax · Layer Normalization · Byte Pair Encoding · Label Smoothing · Adam · Attention Is All You Need · Linear Layer · Multi-Head Attention