MedCL-Bench: Benchmarking stability-efficiency trade-offs and scaling in biomedical continual learning

Min Zeng; Shuang Zhou; Zaifu Zhan; Rui Zhang

arXiv:2603.16738 · cs.AI · March 18, 2026

Open Access

TL;DR

MedCL-Bench is a benchmark for evaluating stability-efficiency trade-offs in biomedical continual learning. It examines how task order, continual learning strategy, and task type affect performance.

Contribution

MedCL-Bench provides the first unified, task-diverse benchmark for continual learning in biomedical NLP, with standardized protocols and an analysis of eleven continual learning strategies across ten datasets.

Findings

01. Sequential fine-tuning causes catastrophic forgetting.

02. Parameter-isolation offers the best retention per GPU-hour.

03. Forgetting varies by task type, with multi-label classification most vulnerable.

Abstract

Medical language models must be updated as evidence and terminology evolve, yet sequential updating can trigger catastrophic forgetting. Although biomedical NLP has many static benchmarks, no unified, task-diverse benchmark exists for evaluating continual learning under standardized protocols, robustness to task order, and compute-aware reporting. We introduce MedCL-Bench, which streams ten biomedical NLP datasets spanning five task families and evaluates eleven continual learning strategies across eight task orders, reporting retention, transfer, and GPU-hour cost. Across backbones and task orders, direct sequential fine-tuning on incoming tasks induces catastrophic forgetting, causing update-induced performance regressions on prior tasks. Continual learning methods occupy distinct retention-compute frontiers: parameter-isolation provides the best retention per GPU-hour, replay offers…
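To make "retention" and "transfer" concrete, the sketch below computes three quantities often reported in continual learning work from a score matrix R, where R[i][j] is the score on task j after training on task i. The definitions used here (final average score, backward transfer, average forgetting) are the conventional ones from the continual learning literature and are an assumption; the abstract does not state which exact formulas MedCL-Bench uses. The function names and the toy matrix are hypothetical and are not results from the paper.

```python
from typing import List

Matrix = List[List[float]]  # R[i][j]: score on task j after training on task i


def final_average_score(R: Matrix) -> float:
    """Mean score over all tasks after the last task has been learned."""
    T = len(R)
    return sum(R[T - 1]) / T


def backward_transfer(R: Matrix) -> float:
    """Mean change on each earlier task between just after it was learned
    and the end of the stream. Negative values indicate forgetting."""
    T = len(R)
    return sum(R[T - 1][j] - R[j][j] for j in range(T - 1)) / (T - 1)


def average_forgetting(R: Matrix) -> float:
    """Mean gap between the best score ever reached on an earlier task
    (before the final update) and its score at the end of the stream."""
    T = len(R)
    gaps = []
    for j in range(T - 1):
        best_so_far = max(R[i][j] for i in range(j, T - 1))
        gaps.append(best_so_far - R[T - 1][j])
    return sum(gaps) / len(gaps)


# Toy, made-up scores for a three-task stream (illustrative only).
R_toy = [
    [0.90, 0.00, 0.00],
    [0.60, 0.80, 0.00],
    [0.50, 0.70, 0.85],
]

if __name__ == "__main__":
    print("final average:", final_average_score(R_toy))
    print("backward transfer:", backward_transfer(R_toy))
    print("average forgetting:", average_forgetting(R_toy))
```

A compute-aware comparison could then divide a retention measure by the GPU-hours a strategy consumed, and repeating the calculation over several task orders would show how sensitive each number is to ordering. Both steps are left out here because the abstract does not specify how the benchmark aggregates them.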

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Artificial Intelligence in Healthcare and Education