Improving Tagging Consistency and Entity Coverage for Chemical Identification in Full-text Articles

Hyunjae Kim; Mujeen Sung; Wonjin Yoon; Sungjoon Park; Jaewoo Kang

arXiv:2111.10584 · cs.CL · November 23, 2021 · 5 cites

TL;DR

This paper presents a system that improves chemical entity recognition in full-text articles by making tagging more consistent and entity coverage broader. The system ranked first in NER in the chemical identification task of the BioCreative VII Track 2 challenge.

Contribution

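It introduces majority voting within each article to make named entity recognition (NER) tags consistent across a full-text document, and a hybrid approach that combines a dictionary with a neural model for entity normalization. Together these improve chemical identification in full-text articles, particularly recall, and the system outperformed the baseline and all other submissions in NER.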
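Majority voting within an article can be sketched as below. This is a minimal sketch under assumptions the summary does not spell out: predictions are character spans, votes are taken per exact surface string, a string tagged in more than half of its whole-word occurrences is tagged everywhere (otherwise all its tags are dropped), and occurrences nested inside a longer predicted mention are not counted. The functions `find_occurrences` and `majority_vote` are hypothetical names, and the paper's actual voting rule may differ.

```python
import re
from typing import List, Set, Tuple

Span = Tuple[int, int]  # (start, end) character offsets; end is exclusive


def find_occurrences(text: str, surface: str) -> List[Span]:
    """Return all whole-word, case-sensitive occurrences of `surface` in `text`."""
    pattern = r"(?<!\w)" + re.escape(surface) + r"(?!\w)"
    return [(m.start(), m.end()) for m in re.finditer(pattern, text)]


def majority_vote(text: str, predicted: List[Span]) -> List[Span]:
    """Hypothetical article-level majority vote over predicted mention spans.

    A surface string tagged in more than half of its counted occurrences is
    tagged everywhere in the article; otherwise all of its tags are dropped.
    """
    pred: Set[Span] = set(predicted)
    candidates: Set[Span] = set()
    for surface in {text[s:e] for s, e in pred}:
        # Skip occurrences nested inside a longer predicted mention,
        # e.g. "sodium" inside "sodium chloride".
        occ = [
            (s, e)
            for s, e in find_occurrences(text, surface)
            if not any(ps <= s and e <= pe and pe - ps > e - s for ps, pe in pred)
        ]
        if not occ:
            # Nothing to vote on (e.g. span not word-bounded): keep model output.
            candidates.update(sp for sp in pred if text[sp[0]:sp[1]] == surface)
            continue
        n_tagged = sum(1 for sp in occ if sp in pred)
        if 2 * n_tagged > len(occ):
            candidates.update(occ)
    # Resolve overlaps greedily: longer spans first, then earlier ones.
    result: List[Span] = []
    for s, e in sorted(candidates, key=lambda sp: (sp[0] - sp[1], sp[0])):
        if all(e <= ks or s >= ke for ks, ke in result):
            result.append((s, e))
    return sorted(result)


text = "Aspirin was given. Later, Aspirin and Aspirin were compared."
print(majority_vote(text, [(0, 7), (26, 33)]))  # [(0, 7), (26, 33), (38, 45)]
print(majority_vote(text, [(0, 7)]))            # []
```

In the first call the model tagged two of the three mentions of "Aspirin", so the vote propagates the tag to the missed third mention, which is how a consistency step of this kind can raise recall. In the second call only one of three mentions was tagged, so the vote removes it.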

Findings

01

Ranked first in NER in the chemical identification task of the BioCreative VII Track 2 challenge

02

Improved performance on the NLM-Chem dataset, particularly recall

03

Outperformed the baseline model and more than 80 submissions from 16 teams

Abstract

This paper is a technical report on our system submitted to the chemical identification task of the BioCreative VII Track 2 challenge. The main feature of this challenge is that the data consists of full-text articles, while current datasets usually consist of only titles and abstracts. To effectively address the problem, we aim to improve tagging consistency and entity coverage using various methods, such as majority voting within the same article for named entity recognition (NER) and a hybrid approach that combines a dictionary and a neural model for normalization. In experiments on the NLM-Chem dataset, we show that our methods improve the models' performance, particularly in terms of recall. Finally, in the official evaluation of the challenge, our system ranked 1st in NER, significantly outperforming the baseline model and more than 80 submissions from 16 teams.
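The hybrid normalization idea, an exact dictionary lookup with a neural fallback, might look like the sketch below. Several assumptions are made here: the dictionary is keyed by a lightly normalized name (lowercased, hyphens turned into spaces); the neural model is replaced by a character-trigram Jaccard nearest-neighbour search so the example runs without a trained model; and the threshold of 0.5, the lexicon, and the concept IDs `C1`/`C2` are hypothetical placeholders. All function names are illustrative, not the authors' code.

```python
from typing import Callable, Dict, Optional, Set, Tuple


def normalize_key(name: str) -> str:
    """Light string normalization for dictionary lookup (an assumed rule)."""
    return " ".join(name.lower().replace("-", " ").split())


def char_trigrams(name: str) -> Set[str]:
    """Character trigrams of the normalized name, padded with spaces."""
    s = f"  {normalize_key(name)} "
    return {s[i:i + 3] for i in range(len(s) - 2)}


def make_trigram_linker(lexicon: Dict[str, str]) -> Callable[[str], Tuple[str, float]]:
    """Stand-in for a neural normalizer: nearest lexicon name by trigram Jaccard."""
    grams = {name: char_trigrams(name) for name in lexicon}

    def link(mention: str) -> Tuple[str, float]:
        q = char_trigrams(mention)

        def jaccard(name: str) -> float:
            return len(q & grams[name]) / len(q | grams[name])

        best = max(grams, key=jaccard)
        return lexicon[best], jaccard(best)

    return link


def hybrid_normalize(
    mention: str,
    dictionary: Dict[str, str],
    neural: Callable[[str], Tuple[str, float]],
    threshold: float = 0.5,
) -> Optional[str]:
    """Exact dictionary match first; otherwise accept the fallback if confident."""
    key = normalize_key(mention)
    if key in dictionary:
        return dictionary[key]
    concept_id, score = neural(mention)
    return concept_id if score >= threshold else None


# Hypothetical lexicon: names mapped to placeholder concept IDs.
lexicon = {"aspirin": "C1", "acetylsalicylic acid": "C1", "ibuprofen": "C2"}
dictionary = {normalize_key(name): cid for name, cid in lexicon.items()}
neural = make_trigram_linker(lexicon)

print(hybrid_normalize("ASPIRIN", dictionary, neural))                # C1 (dictionary hit)
print(hybrid_normalize("acetyl-salicylic acid", dictionary, neural))  # C1 (fallback)
print(hybrid_normalize("xylitol", dictionary, neural))                # None
```

The design keeps the dictionary's high-precision exact matches and consults the learned component only for spelling variants the dictionary misses, which is one plausible way a hybrid can extend entity coverage. In a real system the `neural` callable would wrap a trained normalization model rather than trigram overlap.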


Taxonomy

Topics: Topic Modeling · Biomedical Text Mining and Ontologies · Advanced Text Analysis Techniques