Towards Fair Evaluation of Dialogue State Tracking by Flexible Incorporation of Turn-level Performances

Suvodip Dey; Ramamohan Kummara; Maunendra Sankar Desarkar

arXiv:2204.03375 · cs.CL · April 8, 2022

TL;DR

This paper introduces Flexible Goal Accuracy (FGA), a new evaluation metric for Dialogue State Tracking that better accounts for turn-level and cumulative prediction performance, addressing limitations of the traditional Joint Goal Accuracy (JGA).

Contribution

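The paper proposes FGA, a generalization of JGA for evaluating DST. Unlike JGA, FGA gives penalized (partial) credit to mispredictions that are locally correct, that is, turns whose own prediction is right but whose cumulative state is wrong because of an error in an earlier turn.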

Findings

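1. FGA discriminates between DST models better than JGA alone.

2. FGA credits turns whose turn-level prediction is correct even when the cumulative belief state is not.

3. FGA combines cumulative and turn-level performance flexibly, which makes it less harsh than JGA.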

Abstract

Dialogue State Tracking (DST) is primarily evaluated using Joint Goal Accuracy (JGA) defined as the fraction of turns where the ground-truth dialogue state exactly matches the prediction. Generally in DST, the dialogue state or belief state for a given turn contains all the intents shown by the user till that turn. Due to this cumulative nature of the belief state, it is difficult to get a correct prediction once a misprediction has occurred. Thus, although being a useful metric, it can be harsh at times and underestimate the true potential of a DST model. Moreover, an improvement in JGA can sometimes decrease the performance of turn-level or non-cumulative belief state prediction due to inconsistency in annotations. So, using JGA as the only metric for model selection may not be ideal for all scenarios. In this work, we discuss various evaluation metrics used for DST along with their…
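The JGA definition in the abstract, and the reason a single early error keeps costing accuracy, can be illustrated with a minimal sketch. It assumes a belief state is represented as a flat slot-to-value dictionary; the function name `joint_goal_accuracy` and the toy hotel slots are hypothetical and are not taken from the paper or its repository.

```python
from typing import Dict, List

# Assumption: a belief state is a flat mapping from slot name to value.
State = Dict[str, str]


def joint_goal_accuracy(gold: List[State], pred: List[State]) -> float:
    """Fraction of turns whose predicted state exactly matches the gold state."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must contain the same number of turns")
    if not gold:
        return 0.0
    exact_matches = sum(1 for g, p in zip(gold, pred) if g == p)
    return exact_matches / len(gold)


# Toy dialogue (hypothetical data). States are cumulative across turns.
gold_states = [
    {"hotel-area": "north"},
    {"hotel-area": "north", "hotel-stars": "4"},
    {"hotel-area": "north", "hotel-stars": "4", "hotel-parking": "yes"},
]
pred_states = [
    {"hotel-area": "north"},
    # Turn 2: the model mispredicts the star rating.
    {"hotel-area": "north", "hotel-stars": "3"},
    # Turn 3: the newly mentioned slot is predicted correctly, but the
    # turn-2 error is carried forward, so the exact match still fails.
    {"hotel-area": "north", "hotel-stars": "3", "hotel-parking": "yes"},
]

jga = joint_goal_accuracy(gold_states, pred_states)
print(f"JGA = {jga:.3f}")
```

In this toy dialogue only the first turn matches exactly, so JGA is 1/3 even though the model handles the new information in turn 3 correctly. This is the kind of locally correct turn the abstract describes JGA as being harsh on.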

Code & Models

Repositories

suvodipdey/fga (official)

Taxonomy

Topics: Topic Modeling · Speech and dialogue systems · Service-Oriented Architecture and Web Services

Methods: Dialogue State Tracking · Flexible Goal Accuracy