Better Smatch = Better Parser? AMR evaluation is not so simple anymore

Juri Opitz; Anette Frank

arXiv:2210.06461 · cs.CL · October 13, 2022

TL;DR

Despite high Smatch scores that suggest near-human performance, AMR parsers still make significant semantic errors. A better Smatch score does not always mean a better parse, which highlights the need for more comprehensive evaluation methods.

Contribution

The paper critically analyzes the limitations of Smatch as an evaluation metric for AMR parsing and advocates for more nuanced assessment approaches.

Findings

01

High Smatch scores often mask semantic errors.

02

AMR parsing quality is not fully captured by Smatch alone.

03

Enhanced evaluation methods are necessary for accurate quality assessment.

Abstract

Recently, astonishing advances have been observed in AMR parsing, as measured by the structural Smatch metric. In fact, today's systems achieve performance levels that seem to surpass estimates of human inter-annotator agreement (IAA). It is therefore unclear how well Smatch (still) relates to human estimates of parse quality, since at this level, fine-grained errors of similar weight may impact the AMR's meaning to different degrees. We conduct an analysis of two popular and strong AMR parsers that, according to Smatch, reach quality levels on par with human IAA, and assess how human quality ratings relate to Smatch and other AMR metrics. Our main findings are: i) While high Smatch scores indicate otherwise, we find that AMR parsing is far from being solved: we frequently find structurally small, but semantically unacceptable errors that substantially distort…
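
To make the point about "structurally small, but semantically unacceptable errors" concrete, here is a minimal sketch of a Smatch-style score. It assumes AMRs have already been converted into (relation, source, target) triples. It is not the official Smatch implementation: it omits Smatch's TOP triple and replaces its hill-climbing search over variable alignments with exhaustive search, which is only feasible for tiny graphs. The function names and the example AMR ("The cat does not sleep") are hypothetical illustrations.

```python
from itertools import permutations

# A triple is (relation, source, target). Instance triples use the relation
# "instance" with a concept as target; attribute triples (e.g. polarity)
# have a constant as target; relation triples link two variables.

def variables(triples):
    """Variables are the sources of 'instance' triples."""
    return sorted({src for rel, src, _ in triples if rel == "instance"})

def match_count(gold, pred, mapping):
    """Count predicted triples that equal a gold triple once predicted
    variables are renamed via `mapping`; constants pass through unchanged."""
    gold_set = set(gold)
    count = 0
    for rel, src, tgt in set(pred):
        renamed = (rel, mapping.get(src, src), mapping.get(tgt, tgt))
        if renamed in gold_set:
            count += 1
    return count

def best_match(gold, pred):
    """Exhaustively try every injective alignment of predicted to gold
    variables (a predicted variable may also stay unaligned)."""
    gvars, pvars = variables(gold), variables(pred)
    candidates = gvars + [None] * len(pvars)  # None = leave unaligned
    best = 0
    for perm in permutations(candidates, len(pvars)):
        mapping = {
            p: (g if g is not None else f"<unaligned:{p}>")
            for p, g in zip(pvars, perm)
        }
        best = max(best, match_count(gold, pred, mapping))
    return best

def smatch_like_f1(gold, pred):
    """F1 over matched triples under the best variable alignment."""
    matched = best_match(gold, pred)
    if matched == 0:
        return 0.0
    precision = matched / len(set(pred))
    recall = matched / len(set(gold))
    return 2 * precision * recall / (precision + recall)

# Gold: "The cat does not sleep" (negated via :polarity -).
gold = [
    ("instance", "s", "sleep-01"),
    ("instance", "c", "cat"),
    ("ARG0", "s", "c"),
    ("polarity", "s", "-"),
]
# Hypothetical parse that drops the negation: "The cat sleeps".
pred = [
    ("instance", "a", "sleep-01"),
    ("instance", "b", "cat"),
    ("ARG0", "a", "b"),
]
print(round(smatch_like_f1(gold, pred), 3))  # 0.857
```

Missing a single polarity triple inverts the sentence's meaning, yet the parse keeps 3 of 4 gold triples with perfect precision, so the score is 6/7 ≈ 0.857. A triple-overlap metric weights this error the same as any other single missing triple, which is the kind of mismatch between structural score and semantic acceptability that the abstract describes.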

Code & Models

Repositories

heidelberg-nlp/amrparseeval (official)

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems