Rethinking Test-Time Scaling for Medical AI: Model and Task-Aware Strategies for LLMs and VLMs

Gyutaek Oh, Seoyeon Kim, Sangjoon Park, and Byung-Hoon Kim

arXiv:2506.13102·cs.CL·June 17, 2025

Open Access

TL;DR

This paper investigates test-time scaling strategies for large language models and vision-language models in medical AI. It analyzes their effectiveness and robustness, and offers practical guidelines for improving model reliability and interpretability.

Contribution

The paper offers a comprehensive evaluation of test-time scaling in medical AI, covering model-specific strategies and a robustness analysis under user-driven factors such as misleading information embedded in prompts.

Findings

01. Test-time scaling improves reasoning in medical models.

02. Effectiveness varies with model type and task complexity.

03. Strategies can be refined for better robustness and interpretability.

Abstract

Test-time scaling has recently emerged as a promising approach for enhancing the reasoning capabilities of large language models or vision-language models during inference. Although a variety of test-time scaling strategies have been proposed, and interest in their application to the medical domain is growing, many critical aspects remain underexplored, including their effectiveness for vision-language models and the identification of optimal strategies for different settings. In this paper, we conduct a comprehensive investigation of test-time scaling in the medical domain. We evaluate its impact on both large language models and vision-language models, considering factors such as model size, inherent model characteristics, and task complexity. Finally, we assess the robustness of these strategies under user-driven factors, such as misleading information embedded in prompts. Our…

Taxonomy

Topics: Scientific Computing and Data Management · Artificial Intelligence in Healthcare and Education · Machine Learning in Healthcare