Data-Efficient Biomedical In-Context Learning: A Diversity-Enhanced Submodular Perspective

Jun Wang; Zaifu Zhan; Qixin Zhang; Mingquan Lin; Meijia Song; Rui Zhang

arXiv:2508.08140 · cs.CL · August 12, 2025

TL;DR

This paper introduces Dual-Div, a framework for biomedical in-context learning that selects demonstrations by balancing diversity and representativeness, improving performance on biomedical NLP tasks.
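One standard way to write such a balance as a submodular objective is sketched below. This is an illustrative assumption, not necessarily the paper's exact formulation: V is the corpus, S ⊆ V is the selected set with budget B, sim(i, j) ≥ 0 is a similarity score, P_1, ..., P_K is a clustering of V, and λ ≥ 0 weights diversity against representativeness.

```latex
\begin{aligned}
\mathcal{L}(S) &= \sum_{i \in V} \max_{j \in S} \mathrm{sim}(i,j)
  && \text{(representativeness: coverage of the corpus)} \\
\mathcal{R}(S) &= \sum_{k=1}^{K} \sqrt{\sum_{j \in S \cap P_k} r_j},
  \qquad r_j = \frac{1}{|V|} \sum_{i \in V} \mathrm{sim}(i,j)
  && \text{(diversity: diminishing returns per cluster)} \\
S^{\star} &\in \operatorname*{arg\,max}_{S \subseteq V,\ |S| \le B}
  \; \mathcal{L}(S) + \lambda\, \mathcal{R}(S)
\end{aligned}
```

With nonnegative similarities (and the max over an empty set taken as 0), both terms are monotone submodular, so greedy selection is guaranteed at least a (1 - 1/e) fraction of the optimum under the cardinality budget. The square root is what rewards spreading picks across clusters: a second item from an already-covered cluster adds less than a first item from a new one.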

Contribution

The paper proposes a two-stage retrieval and ranking method that emphasizes diversity when selecting demonstrations for in-context learning on biomedical NLP tasks.
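The first (retrieval) stage can be sketched as greedy submodular selection over a corpus. This is a minimal, hypothetical illustration rather than the authors' implementation: it assumes embeddings are plain float vectors, cluster labels are already available (for example, from k-means), and the objective is facility-location coverage for representativeness plus a per-cluster square-root reward for diversity. The names `cosine`, `select_candidates`, `budget`, and `lam` are assumptions introduced here.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 0.0 if nu == 0.0 or nv == 0.0 else dot / (nu * nv)


def select_candidates(
    embeddings: Sequence[Sequence[float]],
    clusters: Sequence[int],
    budget: int,
    lam: float = 1.0,
) -> List[int]:
    """Hypothetical sketch: greedily maximize coverage + lam * diversity.

    The objective is an assumed standard choice, not taken from the paper.
    """
    n = len(embeddings)
    # Map cosine from [-1, 1] to [0, 1] so both terms stay monotone submodular.
    sim = [[(1.0 + cosine(embeddings[i], embeddings[j])) / 2.0 for j in range(n)]
           for i in range(n)]
    # Per-item reward: average similarity of item j to the whole corpus.
    reward = [sum(sim[i][j] for i in range(n)) / n for j in range(n)]
    best_cover = [0.0] * n               # max similarity of each item to the selected set
    cluster_mass: Dict[int, float] = {}  # summed reward of selected items, per cluster
    selected: List[int] = []
    remaining = list(range(n))

    def marginal_gain(j: int) -> float:
        # Coverage gain: how much adding j raises each item's best similarity.
        coverage = sum(max(0.0, sim[i][j] - best_cover[i]) for i in range(n))
        # Diversity gain: concave (sqrt) increase within j's cluster.
        mass = cluster_mass.get(clusters[j], 0.0)
        diversity = math.sqrt(mass + reward[j]) - math.sqrt(mass)
        return coverage + lam * diversity

    while remaining and len(selected) < budget:
        # Highest marginal gain wins; ties go to the lower index for determinism.
        j = max(remaining, key=lambda idx: (marginal_gain(idx), -idx))
        selected.append(j)
        remaining.remove(j)
        for i in range(n):
            best_cover[i] = max(best_cover[i], sim[i][j])
        cluster_mass[clusters[j]] = cluster_mass.get(clusters[j], 0.0) + reward[j]
    return selected
```

Raising `lam` strengthens the diversity term relative to coverage. The loop recomputes every marginal gain each round, which costs O(budget · n²) similarity lookups; that is fine for a small pool, but a large corpus would call for lazy greedy evaluation (caching stale upper bounds on gains, which submodularity makes valid).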

Findings

01. Dual-Div outperforms baselines, with up to 5% higher macro-F1 scores.

02. Diversity in the initial retrieval stage matters more than optimization at the ranking stage.

03. Limiting each prompt to 3-5 demonstrations maximizes efficiency.

Abstract

Recent progress in large language models (LLMs) has leveraged their in-context learning (ICL) abilities to enable quick adaptation to unseen biomedical NLP tasks. By incorporating only a few input-output examples into prompts, LLMs can rapidly perform these new tasks. While the impact of these demonstrations on LLM performance has been extensively studied, most existing approaches prioritize representativeness over diversity when selecting examples from large corpora. To address this gap, we propose Dual-Div, a diversity-enhanced data-efficient framework for demonstration selection in biomedical ICL. Dual-Div employs a two-stage retrieval and ranking process: First, it identifies a limited set of candidate examples from a corpus by optimizing both representativeness and diversity (with optional annotation for unlabeled data). Second, it ranks these candidates against test queries to…
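The ranking criterion for the second stage is not spelled out here, so the sketch below swaps in an assumed, standard technique: maximal marginal relevance (MMR), which scores each candidate by its similarity to the test query minus its similarity to demonstrations already chosen. It then formats the top `k` as few-shot examples, with `k` defaulting to 3 in line with the 3-5 range noted in the findings. The names `mmr_rank`, `build_prompt`, `lam`, and the `Input:`/`Output:` prompt template are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 0.0 if nu == 0.0 or nv == 0.0 else dot / (nu * nv)


def mmr_score(query, candidates, j: int, chosen: List[int], lam: float) -> float:
    """lam * relevance to the query - (1 - lam) * redundancy with chosen items."""
    relevance = cosine(query, candidates[j])
    redundancy = max((cosine(candidates[j], candidates[c]) for c in chosen), default=0.0)
    return lam * relevance - (1.0 - lam) * redundancy


def mmr_rank(query: Sequence[float], candidates: Sequence[Sequence[float]],
             k: int = 3, lam: float = 0.5) -> List[int]:
    """Hypothetical stage-2 ranking: pick up to k candidate indices by MMR."""
    chosen: List[int] = []
    pool = list(range(len(candidates)))
    while pool and len(chosen) < k:
        # Ties go to the lower index so results are deterministic.
        best = max(pool, key=lambda j: (mmr_score(query, candidates, j, chosen, lam), -j))
        chosen.append(best)
        pool.remove(best)
    return chosen


def build_prompt(instruction: str, demos: Sequence[Tuple[str, str]], query_text: str) -> str:
    """Assemble a few-shot prompt with an assumed Input/Output template."""
    parts = [instruction]
    for inp, out in demos:
        parts.append(f"Input: {inp}\nOutput: {out}")
    parts.append(f"Input: {query_text}\nOutput:")
    return "\n\n".join(parts)
```

Setting `lam=1.0` reduces MMR to plain nearest-neighbor retrieval, which can return near-duplicate demonstrations; lower values trade some query relevance for variety among the selected examples.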

Taxonomy

Topics: Topic Modeling · Biomedical Text Mining and Ontologies · Machine Learning in Healthcare