Towards Fair Evaluation of Dialogue State Tracking by Flexible Incorporation of Turn-level Performances
Suvodip Dey, Ramamohan Kummara, Maunendra Sankar Desarkar

TL;DR
This paper introduces Flexible Goal Accuracy (FGA), a new evaluation metric for Dialogue State Tracking that better accounts for turn-level and cumulative prediction performance, addressing limitations of the traditional Joint Goal Accuracy (JGA).
Contribution
The paper proposes FGA, a generalization of JGA that gives partial (penalized) credit to mispredictions that are locally correct, i.e. turns whose error originates in an earlier turn, providing a more nuanced assessment of model performance.
Findings
FGA discriminates between DST models better than JGA does.
FGA accounts for local correctness in turn predictions.
FGA offers a more flexible evaluation than JGA.
Abstract
Dialogue State Tracking (DST) is primarily evaluated using Joint Goal Accuracy (JGA) defined as the fraction of turns where the ground-truth dialogue state exactly matches the prediction. Generally in DST, the dialogue state or belief state for a given turn contains all the intents shown by the user till that turn. Due to this cumulative nature of the belief state, it is difficult to get a correct prediction once a misprediction has occurred. Thus, although being a useful metric, it can be harsh at times and underestimate the true potential of a DST model. Moreover, an improvement in JGA can sometimes decrease the performance of turn-level or non-cumulative belief state prediction due to inconsistency in annotations. So, using JGA as the only metric for model selection may not be ideal for all scenarios. In this work, we discuss various evaluation metrics used for DST along with their…
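The JGA definition in the abstract, and the way one early error propagates through the cumulative belief state, can be illustrated with a minimal Python sketch. It is not the authors' code. The slot names, the toy dialogue, and the helpers `joint_goal_accuracy`, `turn_update`, and `turn_level_accuracy` are hypothetical. The turn-level comparison is one simple reading of "non-cumulative belief state" (slots added or changed at a turn) and ignores slot deletions.

```python
from typing import Dict, List

State = Dict[str, str]  # slot name -> value; slot names here are made up


def joint_goal_accuracy(gold: List[State], pred: List[State]) -> float:
    """Fraction of turns whose predicted cumulative state equals the gold state exactly."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same number of turns")
    if not gold:
        return 0.0
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def turn_update(prev: State, cur: State) -> State:
    """Slots added or changed at this turn (assumed non-cumulative view; deletions ignored)."""
    return {k: v for k, v in cur.items() if prev.get(k) != v}


def turn_level_accuracy(gold: List[State], pred: List[State]) -> float:
    """Fraction of turns whose predicted update equals the gold update."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same number of turns")
    if not gold:
        return 0.0
    hits = 0
    prev_g: State = {}
    prev_p: State = {}
    for g, p in zip(gold, pred):
        hits += turn_update(prev_g, g) == turn_update(prev_p, p)
        prev_g, prev_p = g, p
    return hits / len(gold)


# Toy dialogue: the model gets the area wrong at turn 1 and never repairs it.
gold_states = [
    {"hotel-area": "north"},
    {"hotel-area": "north", "hotel-stars": "4"},
    {"hotel-area": "north", "hotel-stars": "4", "hotel-parking": "yes"},
]
pred_states = [
    {"hotel-area": "south"},
    {"hotel-area": "south", "hotel-stars": "4"},
    {"hotel-area": "south", "hotel-stars": "4", "hotel-parking": "yes"},
]

jga = joint_goal_accuracy(gold_states, pred_states)       # no turn matches exactly
turn_acc = turn_level_accuracy(gold_states, pred_states)  # turns 2 and 3 are locally correct
```

In this toy case the single turn-1 error makes every later cumulative state wrong under JGA, even though the updates predicted at turns 2 and 3 are correct. That gap between the cumulative and turn-level views is what the abstract describes as JGA being harsh.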
Taxonomy
Topics: Topic Modeling · Speech and dialogue systems · Service-Oriented Architecture and Web Services
Methods: Dialogue State Tracking · Flexible Goal Accuracy
