Revised JNLPBA Corpus: A Revised Version of Biomedical NER Corpus for Relation Extraction Task
Ming-Siang Huang, Po-Ting Lai, Richard Tzong-Han Tsai, Wen-Lian Hsu

TL;DR
This paper introduces the Revised JNLPBA, a revised biomedical NER corpus with improved annotation quality that makes it more suitable for relation extraction tasks in biomedical text mining.
Contribution
The study presents a manually curated, revised version of the JNLPBA corpus that improves its annotations, addresses issues in the original corpus, and boosts NER system performance for biomedical relation extraction.
Findings
NER systems perform 10% better on Revised JNLPBA
Revised JNLPBA improves NER accuracy in biomedical relation tasks
The corpus is validated through cross-validation in relation extraction applications
Abstract
The advancement of biomedical named entity recognition (BNER) and biomedical relation extraction (BRE) research promotes the development of text mining in biological domains. As a cornerstone of BRE, a robust BNER system is required to identify the named entities (NEs) mentioned in plain text for the subsequent relation extraction stage. However, current BNER corpora, which play important roles in these tasks, have paid less attention to meeting the criteria of the BRE task. In this study, we present the Revised JNLPBA corpus, a revision of the JNLPBA corpus, to broaden the applicability of an NER corpus from the BNER task to the BRE task. We preserve the original entity types, including protein, DNA, RNA, cell line and cell type, while all abstracts in the JNLPBA corpus are manually re-curated by domain experts based on a new annotation guideline that focuses on specific NEs instead of general terms. Simultaneously, several…
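To make the five preserved entity types concrete, the sketch below decodes token-level tags into (type, mention) pairs. It assumes the revised corpus keeps the original JNLPBA shared-task convention of IOB2 tags such as `B-protein`, `I-DNA` or `B-cell_type` on parallel token and tag lists. That format and the helper name `iob2_to_spans` are assumptions for illustration, not the authors' released tooling.

```python
from typing import List, Optional, Tuple


def iob2_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Hypothetical helper: turn parallel IOB2 token/tag lists into
    (entity_type, mention_text) pairs.

    Assumes JNLPBA-style tags, e.g. B-protein, I-protein, B-DNA, B-RNA,
    B-cell_line, B-cell_type, and O for tokens outside any entity.
    """
    spans: List[Tuple[str, str]] = []
    current_type: Optional[str] = None
    current_tokens: List[str] = []

    def close_open_entity() -> None:
        # Emit the entity being built, if any.
        if current_type is not None:
            spans.append((current_type, " ".join(current_tokens)))

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != current_type):
            # A B- tag, or an I- tag whose type does not continue the open
            # entity, starts a new mention (lenient handling of stray I- tags).
            close_open_entity()
            current_type = tag[2:]
            current_tokens = [token]
        elif tag.startswith("I-"):
            # Same type as the open entity: extend the current mention.
            current_tokens.append(token)
        else:
            # "O" tag: the token is outside any entity.
            close_open_entity()
            current_type = None
            current_tokens = []

    close_open_entity()
    return spans


if __name__ == "__main__":
    tokens = ["IL-2", "gene", "expression", "in", "T", "cells"]
    tags = ["B-DNA", "I-DNA", "O", "O", "B-cell_type", "I-cell_type"]
    print(iob2_to_spans(tokens, tags))
    # [('DNA', 'IL-2 gene'), ('cell_type', 'T cells')]
```

Span-level pairs like these, rather than raw tags, are what a downstream relation extraction stage would typically consume when pairing entity mentions.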
Taxonomy
Topics: Biomedical Text Mining and Ontologies · Topic Modeling · Natural Language Processing Techniques
