Robust Biomedical Publication Type and Study Design Classification with Knowledge-Guided Perturbations

Shufan Ming; Joe D. Menke; Neil R. Smalheiser; Halil Kilicoglu

arXiv:2605.11502 · cs.CL · May 13, 2026

TL;DR

This paper evaluates and improves the robustness of biomedical publication type classifiers by using controlled semantic perturbations and domain-adversarial training to reduce reliance on superficial cues.

Contribution

It introduces an evaluation framework with semantic perturbations and proposes robustness-oriented training strategies that enhance classifier reliability under distributional shifts.

Findings

1. Robustness can be improved without sacrificing in-domain accuracy.

2. Entity masking and domain-adversarial training reduce reliance on spurious features.

3. Refined training objectives enhance the model's focus on salient methodological cues.
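
To make the entity-masking idea in the findings concrete, here is a minimal sketch. It assumes entity mentions can be found with a simple lexicon lookup. The lexicon, the placeholder format, and the `mask_entities` helper are hypothetical illustrations, not the authors' implementation, which this summary does not describe.

```python
import re

# Hypothetical entity lexicon (an assumption for illustration); in practice,
# spans would more likely come from a biomedical NER tool or a knowledge base.
ENTITY_LEXICON = {
    "metformin": "CHEMICAL",
    "type 2 diabetes": "DISEASE",
    "aspirin": "CHEMICAL",
}

def mask_entities(text, lexicon=ENTITY_LEXICON):
    """Replace each lexicon term found in `text` with a typed placeholder."""
    # Try longer terms first so multi-word entities win over shorter ones.
    terms = sorted(lexicon, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(lambda m: f"[{lexicon[m.group(0).lower()]}]", text)

abstract = (
    "We conducted a randomized controlled trial of Metformin "
    "in adults with type 2 diabetes."
)
masked = mask_entities(abstract)
print(masked)
```

The masked text keeps the methodological phrase ("randomized controlled trial") while hiding the topical entities, so a classifier trained or evaluated on it cannot lean on drug or disease names as shortcuts.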

Abstract

Accurately and consistently indexing biomedical literature by publication type and study design is essential for supporting evidence synthesis and knowledge discovery. Prior work on automated publication type and study design indexing has primarily focused on expanding label coverage, enriching feature representations, and improving in-domain accuracy, with evaluation typically conducted on data drawn from the same distribution as training. Although pretrained biomedical language models achieve strong performance under these settings, models optimized for in-domain accuracy may rely on superficial lexical or dataset-specific cues, resulting in reduced robustness under distributional shift. In this study, we introduce an evaluation framework based on controlled semantic perturbations to assess the robustness of a publication type classifier and investigate robustness-oriented training…
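
One way to picture evaluation under controlled perturbations is to compare a classifier's predictions on original and perturbed versions of the same texts. The sketch below is an assumption-laden illustration. `toy_classifier` is a hypothetical keyword rule standing in for a trained model, the example texts are invented, and the metrics (accuracy on each set plus a prediction flip rate) are common choices rather than the paper's stated protocol.

```python
def toy_classifier(text):
    """Hypothetical stand-in for a trained model: a brittle keyword rule."""
    lowered = text.lower()
    if "randomized" in lowered:
        return "RCT"
    if "cohort" in lowered:
        return "Cohort Study"
    return "Other"

def robustness_report(classify, originals, perturbed, labels):
    """Compare accuracy on original vs. perturbed inputs and count label flips."""
    orig_preds = [classify(t) for t in originals]
    pert_preds = [classify(t) for t in perturbed]
    n = len(labels)
    return {
        "original_accuracy": sum(p == y for p, y in zip(orig_preds, labels)) / n,
        "perturbed_accuracy": sum(p == y for p, y in zip(pert_preds, labels)) / n,
        # Fraction of examples whose prediction changed after perturbation.
        "flip_rate": sum(a != b for a, b in zip(orig_preds, pert_preds)) / n,
    }

# Invented examples: the first perturbed text paraphrases the design cue,
# the second is left unchanged.
originals = [
    "A randomized trial of drug A versus placebo.",
    "A prospective cohort of 500 patients was followed for two years.",
]
perturbed = [
    "Participants were randomly assigned to drug A or placebo.",
    "A prospective cohort of 500 patients was followed for two years.",
]
labels = ["RCT", "Cohort Study"]

report = robustness_report(toy_classifier, originals, perturbed, labels)
print(report)
```

A gap between the original and perturbed accuracy, or a high flip rate on meaning-preserving rewrites, is the kind of signal that suggests a model depends on surface wording rather than on the underlying study design.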

Code & Models

Repositories

ScienceNLP-Lab/MultiTagger-v2/tree/main/ICHI
github
