TL;DR
This paper proposes a negation-instance based evaluation method for negation resolution, with the aim of making system performance comparable. It offers a standardized evaluation framework and reports results for current systems on three English corpora.
Contribution
It introduces interpretable evaluation metrics for negation resolution that are computed over individual negation instances, facilitating fairer comparisons across systems.
Findings
Proposed negation-instance based evaluation metrics.
Applied metrics to state-of-the-art systems on three English corpora.
Made evaluation scripts publicly available.
Abstract
In this paper, we revisit the task of negation resolution, which includes the subtasks of cue detection (e.g. "not", "never") and scope resolution. In the context of previous shared tasks, a variety of evaluation metrics have been proposed. Subsequent works usually use different subsets of these, including variations and custom implementations, rendering meaningful comparisons between systems difficult. Examining the problem both from a linguistic perspective and from a downstream viewpoint, we here argue for a negation-instance based approach to evaluating negation resolution. Our proposed metrics correspond to expectations over per-instance scores and hence are intuitively interpretable. To render research comparable and to foster future work, we provide results for a set of current state-of-the-art systems for negation resolution on three English corpora, and make our implementation…
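As a rough illustration of what "expectations over per-instance scores" can mean, the sketch below scores each negation instance separately and then averages the scores. It is not the paper's implementation: the choice of token-level F1 as the per-instance score, the representation of a scope as a set of token indices, and the names `instance_f1` and `mean_instance_score` are all assumptions made for this example.

```python
from statistics import mean


def instance_f1(gold: set[int], pred: set[int]) -> float:
    """Token-level F1 for the scope of ONE negation instance.

    Assumption: a scope is a set of token indices. The per-instance
    score used in the paper may differ from plain F1.
    """
    if not gold and not pred:
        return 1.0  # both scopes empty: treated here as a perfect match
    tp = len(gold & pred)
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)


def mean_instance_score(instances: list[tuple[set[int], set[int]]]) -> float:
    """Average of per-instance scores over (gold, predicted) scope pairs.

    Every instance carries the same weight, regardless of scope length.
    Raises statistics.StatisticsError on an empty list.
    """
    return mean(instance_f1(gold, pred) for gold, pred in instances)


# Hypothetical data: one long scope predicted exactly, one short scope half missed.
instances = [
    ({1, 2, 3, 4}, {1, 2, 3, 4}),
    ({5, 6}, {5}),
]
print(mean_instance_score(instances))
```

Under these assumptions the two instances score 1.0 and 2/3, so the average is 5/6. A token-pooled F1 over the same data would be 10/11, because the longer scope contributes more tokens; averaging per instance keeps each negation equally weighted, which is what makes the number readable as an expected score for a single instance.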
