Generalizing over Long Tail Concepts for Medical Term Normalization

Beatrice Portelli; Simone Scaboro; Enrico Santus; Hooman Sedghamiz; Emmanuele Chersoni; Giuseppe Serra

arXiv:2210.11947·cs.CL·November 4, 2022

TL;DR

This paper introduces a learning strategy that leverages hierarchical structure in medical ontologies to improve model generalization, especially for unseen concepts, achieving state-of-the-art results and enabling zero-shot transfer.

Contribution

The paper presents a simple, effective learning strategy that uses the hierarchical information in medical ontologies to improve generalization in medical term normalization, reaching state-of-the-art performance on seen concepts and consistently improving results on unseen ones.

Findings

1. State-of-the-art performance on seen concepts
2. Consistent improvements on unseen concepts
3. Efficient zero-shot transfer across datasets

Abstract

Medical term normalization consists of mapping a piece of text to one of a large number of output classes. Given the small size of the annotated datasets and the extremely long-tailed distribution of the concepts, it is of utmost importance to develop models that are capable of generalizing to scarce or unseen concepts. An important attribute of most target ontologies is their hierarchical structure. In this paper we introduce a simple and effective learning strategy that leverages such information to enhance the generalizability of both discriminative and generative models. The evaluation shows that the proposed strategy produces state-of-the-art performance on seen concepts and consistent improvements on unseen ones, while also allowing for efficient zero-shot knowledge transfer across text typologies and datasets.
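To make the role of the hierarchy concrete, here is a minimal Python sketch of why ontology structure can help with rare or unseen concepts: a leaf concept that never appears in the training data still shares its coarser ancestor labels with concepts that do. The toy ontology, the concept names, and the helper functions `ancestor_path` and `shared_depth` are all hypothetical and chosen for illustration; this is not the paper's learning strategy or its code.

```python
from typing import Dict, List, Optional

# Hypothetical toy ontology (an assumption, not taken from the paper):
# each concept maps to its parent, and None marks the top level.
PARENT: Dict[str, Optional[str]] = {
    "headache": "nervous system symptoms",
    "migraine": "nervous system symptoms",
    "nausea": "gastrointestinal symptoms",
    "nervous system symptoms": "symptoms",
    "gastrointestinal symptoms": "symptoms",
    "symptoms": None,
}


def ancestor_path(concept: str) -> List[str]:
    """Return the concept followed by its ancestors, leaf first, root last."""
    path: List[str] = []
    node: Optional[str] = concept
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def shared_depth(a: str, b: str) -> int:
    """Count how many hierarchy levels two concepts share, starting at the root."""
    depth = 0
    # Reverse both paths so the comparison runs from the root downwards.
    for x, y in zip(reversed(ancestor_path(a)), reversed(ancestor_path(b))):
        if x != y:
            break
        depth += 1
    return depth


if __name__ == "__main__":
    # Suppose "migraine" is unseen during training while "headache" is seen.
    print(ancestor_path("migraine"))
    print(shared_depth("migraine", "headache"))
    print(shared_depth("migraine", "nausea"))
```

In this toy setup, "migraine" and "headache" share two levels while "migraine" and "nausea" share only the root, so a model that has also learned the coarser levels has something to fall back on for a leaf it has never seen.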

Code & Models

Repositories

ailabudinegit/ontology-pretraining-code
PyTorch · Official

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Biomedical Text Mining and Ontologies