Designing UNICORN: a Unified Benchmark for Imaging in Computational Pathology, Radiology, and Natural Language
Michelle Stegeman, Lena Philipp, Fennie van der Graaf, Marina D'Amato, Clément Grisi, Luc Builtjes, Joeran S. Bosma, Judith Lefkes, Rianne A. Weber, James A. Meakin, Thomas Koopman, Anne Mickan, Mathias Prokop, Ewoud J. Smit, Geert Litjens, Jeroen van der Laak

TL;DR
UNICORN is a comprehensive, standardized benchmark for evaluating medical foundation models across multiple modalities, tasks, and domains, enabling reproducible and comparable assessments of their generalization capabilities.
Contribution
It introduces a unified evaluation framework, a novel scoring metric, and a large, diverse dataset for benchmarking medical foundation models across various medical imaging and language tasks.
Findings
The benchmark includes data from over 2,400 patients across 8 countries.
Performance is summarized with a new UNICORN Score for cross-domain comparison.
Provides publicly available data, methods, and evaluation tools for reproducible research.
Abstract
Medical foundation models show promise for learning broadly generalizable features from large, diverse datasets. This could form the basis for reliable cross-modality generalization and for rapid adaptation to new task-specific goals using only a few task-specific examples. Yet evidence for this is limited by the lack of public, standardized, and reproducible evaluation frameworks: existing public benchmarks are often fragmented across task-, organ-, or modality-specific settings, which limits the assessment of cross-task generalization. We introduce UNICORN, a public benchmark designed to systematically evaluate medical foundation models under a unified protocol. To isolate representation quality, we built the benchmark on a novel two-step framework that decouples model inference from task-specific evaluation based on standardized few-shot adaptation. As a central design choice, we constructed…
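The abstract describes a two-step protocol: a frozen model first produces features for every case, and a separate, standardized few-shot step then adapts those features to each task. A minimal sketch of that decoupling follows, assuming a k-nearest-neighbour classifier over cosine similarity as the few-shot adapter. That adapter choice, the toy encoder, and the names `extract_embeddings`, `cosine`, and `few_shot_knn` are hypothetical illustrations, not UNICORN's published implementation.

```python
import math
from collections import Counter
from typing import Callable, List, Sequence

Vector = List[float]


def extract_embeddings(encoder: Callable[[object], Vector], inputs: Sequence[object]) -> List[Vector]:
    """Step 1 (hypothetical): run the frozen model once per case; no labels are used here."""
    return [encoder(x) for x in inputs]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def few_shot_knn(support_emb: List[Vector], support_labels: List[str],
                 query_emb: List[Vector], k: int = 1) -> List[str]:
    """Step 2 (assumed kNN adapter): label each query by majority vote
    over its k most similar labeled support cases."""
    preds = []
    for q in query_emb:
        # Rank support cases from most to least similar to this query.
        ranked = sorted(range(len(support_emb)),
                        key=lambda i: cosine(q, support_emb[i]),
                        reverse=True)
        votes = Counter(support_labels[i] for i in ranked[:k])
        preds.append(votes.most_common(1)[0][0])
    return preds


if __name__ == "__main__":
    # Toy stand-in for a frozen foundation model (hypothetical).
    encoder = lambda x: [float(x), 1.0]
    support = extract_embeddings(encoder, [-5, 5])
    query = extract_embeddings(encoder, [4, -3])
    print(few_shot_knn(support, ["benign", "malignant"], query, k=1))
```

Because step 1 never sees task labels, the same cached embeddings can be reused across every task, so differences in downstream scores reflect the representation rather than task-specific fine-tuning.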
