About Evaluation of F1 Score for RECENT Relation Extraction System
Michał Olek

TL;DR
This paper discusses how the F1 score is evaluated in relation extraction and reevaluates RECENT, a system that initially claimed state-of-the-art results, highlighting the importance of computing the metric correctly.
Contribution
It analyzes the F1 score evaluation process and corrects previous results for the RECENT system on the TACRED dataset.
Findings
Initial claimed F1 score of 75.2 on TACRED
Corrected F1 score of 65.16 after reevaluation
Emphasizes importance of proper evaluation metrics
Abstract
This document discusses the F1 score evaluation used in the article 'Relation Classification with Entity Type Restriction' by Shengfei Lyu and Huanhuan Chen, published in Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021. The authors created a system named RECENT and claimed that it achieved a then state-of-the-art F1 score of 75.2 (previous best: 74.8) on the TACRED dataset. After correcting errors and reevaluating, the final result is 65.16.
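For context, here is a minimal sketch of how micro-averaged F1 is conventionally computed for relation extraction on TACRED, where the `no_relation` label is treated as the negative class and does not count as a correct prediction. This is an illustration only: it is not the RECENT authors' code, and the function name `micro_f1` is hypothetical.

```python
from typing import Sequence, Tuple

NO_RELATION = "no_relation"  # negative label used by TACRED


def micro_f1(
    gold: Sequence[str], pred: Sequence[str], negative: str = NO_RELATION
) -> Tuple[float, float, float]:
    """Return (precision, recall, F1), ignoring the negative class as a target."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")

    # A prediction is correct only if it names a real relation and matches gold.
    correct = sum(1 for g, p in zip(gold, pred) if p != negative and p == g)
    # Precision denominator: every example where the system predicted a relation.
    predicted = sum(1 for p in pred if p != negative)
    # Recall denominator: every example whose gold label is a relation.
    actual = sum(1 for g in gold if g != negative)

    precision = correct / predicted if predicted else 0.0
    recall = correct / actual if actual else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


gold = ["per:title", "no_relation", "org:founded_by", "no_relation"]
pred = ["per:title", "per:title", "no_relation", "no_relation"]
print(micro_f1(gold, pred))  # (0.5, 0.5, 0.5)
```

Both denominators must be counted over the full test set. If examples are dropped before scoring, the recall and precision denominators shrink and the reported F1 can be inflated.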
Taxonomy
Topics: Natural Language Processing Techniques
