Enhancing Clinical Text Classification via Fine-Tuned DRAGON Longformer Models

Mingchuan Yang; Ziyuan Huang

arXiv:2507.09470 · cs.CL · July 15, 2025

Open Access

TL;DR

This paper fine-tunes the DRAGON Longformer model with domain-specific adjustments for binary classification of medical case descriptions, raising accuracy from 72.0% to 85.2%, precision from 68.0% to 84.1%, and recall from 75.0% to 86.3%.

Contribution

The study adapts the pre-trained DRAGON Longformer to clinical text by increasing the maximum sequence length from 512 to 1024 tokens, lowering the learning rate from 1e-05 to 5e-06, extending training from 5 to 8 epochs, and incorporating specialized medical terminology.

Findings

01. Accuracy increased from 72.0% to 85.2%.

02. Precision improved from 68.0% to 84.1%.

03. Recall rose from 75.0% to 86.3%.
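
For readers who want to see how the four reported metrics relate to one another, here is a minimal Python sketch that derives accuracy, precision, recall, and F1 from binary predictions. It is not the authors' evaluation code; the names `BinaryMetrics` and `binary_metrics` are hypothetical, and the labels in the usage note are made-up illustrations rather than data from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BinaryMetrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def binary_metrics(y_true, y_pred, positive=1):
    """Compute standard binary classification metrics from two label sequences."""
    y_true, y_pred = list(y_true), list(y_pred)
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one example is required")

    # Tally the four cells of the confusion matrix.
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1

    accuracy = (tp + tn) / len(y_true)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return BinaryMetrics(accuracy, precision, recall, f1)
```

As a usage illustration with invented labels, `binary_metrics([1, 1, 1, 0, 0, 0, 1, 0], [1, 1, 0, 0, 0, 1, 1, 0])` counts three true positives, one false positive, one false negative, and three true negatives. Because F1 is a harmonic mean, it always lies between precision and recall.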

Abstract

This study explores the optimization of the DRAGON Longformer base model for clinical text classification, specifically targeting the binary classification of medical case descriptions. A dataset of 500 clinical cases containing structured medical observations was used, with 400 cases for training and 100 for validation. Enhancements to the pre-trained joeranbosma/dragon-longformer-base-mixed-domain model included hyperparameter tuning, domain-specific preprocessing, and architectural adjustments. Key modifications involved increasing sequence length from 512 to 1024 tokens, adjusting learning rates from 1e-05 to 5e-06, extending training epochs from 5 to 8, and incorporating specialized medical terminology. The optimized model achieved notable performance gains: accuracy improved from 72.0% to 85.2%, precision from 68.0% to 84.1%, recall from 75.0% to 86.3%, and F1-score from 71.0% to…
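
The hyperparameter changes and the 400/100 split described in the abstract can be sketched in Python as follows. This is an illustrative sketch, not the authors' training script: only the model identifier, the before/after values, and the split sizes come from the abstract. The names `FineTuneConfig`, `BASELINE`, `OPTIMIZED`, and `split_cases` are hypothetical, and the seeded random shuffle is an assumption, since the abstract does not say how cases were assigned to each subset.

```python
import random
from dataclasses import dataclass

# Pre-trained checkpoint named in the abstract.
MODEL_NAME = "joeranbosma/dragon-longformer-base-mixed-domain"


@dataclass(frozen=True)
class FineTuneConfig:
    max_seq_length: int   # maximum number of tokens per input
    learning_rate: float  # optimizer learning rate
    num_epochs: int       # number of training epochs


# Values stated in the abstract for the baseline and optimized runs.
BASELINE = FineTuneConfig(max_seq_length=512, learning_rate=1e-5, num_epochs=5)
OPTIMIZED = FineTuneConfig(max_seq_length=1024, learning_rate=5e-6, num_epochs=8)


def split_cases(cases, n_train=400, seed=0):
    """Split cases into training and validation subsets.

    Assumption: a seeded random shuffle. The abstract gives only the
    subset sizes (400 training, 100 validation), not the split procedure.
    """
    shuffled = list(cases)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]
```

In a real run, these fields would be passed to whatever tokenizer and trainer are used. Keeping them in one immutable object makes the baseline and optimized settings easy to compare side by side.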

Taxonomy

Topics: Artificial Intelligence in Healthcare · Machine Learning in Healthcare · AI in cancer detection