Enhancing Automatic PT Tagging for MEDLINE Citations Using Transformer-Based Models

Victor H. Cid; James Mork

arXiv:2506.03321 · cs.DL · June 5, 2025

TL;DR

This paper explores whether pre-trained Transformer-based models (BERT and DistilBERT) can improve the automatic tagging of MEDLINE citations with Medical Subject Headings (MeSH) Publication Types (PTs), with the aim of improving biomedical literature indexing.

Contribution

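The paper applies pre-trained Transformer models to PT prediction from MEDLINE citation metadata and reports results indicating that they could improve tagging accuracy over the legacy NLP algorithms used in the current automated indexing process.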

Findings

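01 Transformer models show potential to significantly improve PT tagging accuracy.

02 Both monolithic multi-label classifiers and binary classifier ensembles were evaluated as ways to improve retrieval of biomedical literature.

03 The results point toward scalable, efficient biomedical indexing.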

Abstract

We investigated the feasibility of predicting Medical Subject Headings (MeSH) Publication Types (PTs) from MEDLINE citation metadata using pre-trained Transformer-based models BERT and DistilBERT. This study addresses limitations in the current automated indexing process, which relies on legacy NLP algorithms. We evaluated monolithic multi-label classifiers and binary classifier ensembles to enhance the retrieval of biomedical literature. Results demonstrate the potential of Transformer models to significantly improve PT tagging accuracy, paving the way for scalable, efficient biomedical indexing.
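The two classifier designs named in the abstract differ mainly in how a citation is turned into a set of PT labels. The sketch below is only an illustration of that difference and is not the authors' implementation: the function names, the 0.5 thresholds, the use of a per-label sigmoid, and the keyword-based toy scorers standing in for fine-tuned models are all assumptions made for this example.

```python
import math
from typing import Callable, Dict, List, Sequence


def sigmoid(x: float) -> float:
    """Map a raw logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def monolithic_predict(
    logits: Sequence[float],
    labels: Sequence[str],
    threshold: float = 0.5,  # assumed cutoff, not taken from the paper
) -> List[str]:
    """Monolithic multi-label design: ONE model emits one logit per PT.

    Each logit is squashed and thresholded independently, so a citation
    can receive zero, one, or several PTs.
    """
    return [pt for pt, z in zip(labels, logits) if sigmoid(z) >= threshold]


def ensemble_predict(
    text: str,
    classifiers: Dict[str, Callable[[str], float]],
    thresholds: Dict[str, float],
) -> List[str]:
    """Binary ensemble design: one yes/no classifier PER PT.

    Each classifier returns a probability for its own PT and can be
    given its own threshold (hypothetical values in this sketch).
    """
    return [
        pt
        for pt, score in classifiers.items()
        if score(text) >= thresholds.get(pt, 0.5)
    ]


# Toy stand-ins for fine-tuned binary models (hypothetical keyword scorers).
toy_classifiers: Dict[str, Callable[[str], float]] = {
    "Randomized Controlled Trial": lambda t: 0.9 if "randomized" in t.lower() else 0.1,
    "Review": lambda t: 0.8 if "review" in t.lower() else 0.2,
}
toy_thresholds = {"Randomized Controlled Trial": 0.5, "Review": 0.5}

citation = "A randomized trial of statin therapy in older adults."
print(monolithic_predict([2.0, -1.0], ["Randomized Controlled Trial", "Review"]))
print(ensemble_predict(citation, toy_classifiers, toy_thresholds))
```

One practical difference this makes visible: in the ensemble form each PT has its own model and threshold, so a single PT can be retrained or retuned without touching the others, whereas the monolithic form shares one model across all PTs.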

Taxonomy

Methods: Attention Dropout · WordPiece · Weight Decay · Linear Layer · Linear Warmup With Linear Decay · Adam · Dense Connections · BERT · Softmax