Extensive Error Analysis and a Learning-Based Evaluation of Medical Entity Recognition Systems to Approximate User Experience
Isar Nejadgholi, Kathleen C. Fraser, Berry De Bruijn

TL;DR
This paper analyzes the impact of span mismatches in medical entity recognition, showing that many errors are acceptable to users, and proposes a learning-based evaluation metric that better reflects user experience.
Contribution
It introduces a lightweight classifier to approximate user acceptance of span mismatches and proposes a new F-score metric aligned with user perception.
Findings
25% of the NER system's errors have the same label as a gold standard entity and a span that overlaps it.
Over 90% of these span mismatches are accepted or partially accepted by users.
The learning-based F-score better approximates user experience than traditional metrics.
Abstract
When comparing entities extracted by a medical entity recognition system with gold standard annotations over a test set, two types of mismatch can occur: label mismatch or span mismatch. Here we focus on span mismatch and show that its severity can vary from a serious error to a fully acceptable entity extraction, owing to the subjectivity of span annotations. For a domain-specific BERT-based NER system, we show that 25% of the errors have the same label as a gold standard entity and a span that overlaps it. We collected expert judgements, which show that more than 90% of these mismatches are accepted or partially accepted by the user. Using the training set of the NER system, we built a fast and lightweight entity classifier to approximate the user experience of such mismatches by accepting or rejecting them. The decisions made by this classifier are used to calculate a learning-based…
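To make the idea of a span mismatch and an acceptance-aware F-score concrete, here is a minimal Python sketch. It is not the authors' implementation: the `Entity` type, the `is_span_mismatch` and `relaxed_f1` functions, and the `accept` callback (standing in for the paper's entity classifier) are all hypothetical names, and the scoring rule (an accepted span mismatch counts as a true positive) is an assumption, since the abstract above is truncated before the metric is defined.

```python
from typing import Callable, List, NamedTuple


class Entity(NamedTuple):
    start: int   # token offset, inclusive
    end: int     # token offset, exclusive
    label: str


def is_span_mismatch(pred: Entity, gold: Entity) -> bool:
    """Same label, overlapping spans, but not identical boundaries."""
    overlap = pred.start < gold.end and gold.start < pred.end
    same_span = (pred.start, pred.end) == (gold.start, gold.end)
    return pred.label == gold.label and overlap and not same_span


def relaxed_f1(
    preds: List[Entity],
    golds: List[Entity],
    accept: Callable[[Entity, Entity], bool],
) -> float:
    """F1 where an accepted span mismatch counts as a true positive.

    `accept` is a placeholder for a learned accept/reject decision;
    this counting rule is an assumption, not the paper's definition.
    """
    gold_set = set(golds)
    matched = set()
    tp = 0
    # Pass 1: exact matches (the strict, traditional criterion).
    for p in preds:
        if p in gold_set and p not in matched:
            matched.add(p)
            tp += 1
    # Pass 2: span mismatches that the acceptance function approves.
    for p in preds:
        if p in gold_set:
            continue
        for g in golds:
            if g not in matched and is_span_mismatch(p, g) and accept(p, g):
                matched.add(g)
                tp += 1
                break
    precision = tp / len(preds) if preds else 0.0
    recall = tp / len(golds) if golds else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Illustrative toy data, not taken from the paper.
golds = [Entity(0, 2, "Problem"), Entity(5, 7, "Drug")]
preds = [Entity(0, 3, "Problem"), Entity(5, 7, "Drug")]

strict = relaxed_f1(preds, golds, lambda p, g: False)   # reject every mismatch
lenient = relaxed_f1(preds, golds, lambda p, g: True)   # accept every mismatch
```

With a callback that rejects every mismatch, the function reduces to the strict exact-match F1; with a callback that accepts every overlapping, same-label mismatch, it behaves like a relaxed-match F1. A learned acceptance decision would fall somewhere between those two extremes.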
Taxonomy
Topics: Biomedical Text Mining and Ontologies · Machine Learning in Healthcare · Topic Modeling
