RVLM: Recursive Vision-Language Models with Adaptive Depth

Nicanor Mayumu; Zeenath Khan; Melodena Stephens; Patrick Mukala; Farhad Oroumchian

arXiv:2603.24224 · cs.CV · March 26, 2026

Open Access

TL;DR

RVLM introduces an adaptive, iterative vision-language framework for medical AI that generates executable code for transparent reasoning and adjusts its depth based on task complexity, improving interpretability and efficiency.

Contribution

It proposes a unified framework combining iterative reasoning with adaptive depth control, enabling transparent, executable diagnostics in medical imaging without fine-tuning.

Findings

01 High consistency in salient findings detection

02 Effective cross-modal discrepancy identification

03 Structured report generation in chest X-ray analysis

Abstract

Medical AI systems face two fundamental limitations. First, conventional vision-language models (VLMs) perform single-pass inference, yielding black-box predictions that cannot be audited or explained in clinical terms. Second, iterative reasoning systems that expose intermediate steps rely on fixed iteration budgets, wasting compute on simple cases while providing insufficient depth for complex ones. We address both limitations with a unified framework. RVLM replaces single-pass inference with an iterative generate-execute loop: at each step, the model writes Python code, invokes vision sub-agents, manipulates images, and accumulates evidence. Every diagnostic claim is grounded in executable code, satisfying auditability requirements of clinical AI governance frameworks. RRouter makes iteration depth adaptive: a lightweight controller predicts the optimal budget from task-complexity…
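The generate-execute loop and the adaptive budget described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `Step`, `predict_budget`, `run_loop`, the `"complexity"` feature, the `DONE` sentinel, and the stub `describe` tool are all hypothetical, and the linear budget rule is only a placeholder, since the abstract is truncated before it says what the controller actually uses.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Step:
    code: str       # code proposed by the model at this step
    result: object  # value the executed code left in `result`


def predict_budget(features: Dict[str, float],
                   min_steps: int = 1, max_steps: int = 8) -> int:
    """Hypothetical stand-in for a depth controller.

    Maps an assumed complexity score in [0, 1] linearly onto a step budget.
    """
    score = min(1.0, max(0.0, features.get("complexity", 0.0)))
    return min_steps + round(score * (max_steps - min_steps))


def run_loop(generate: Callable[[List[Step]], str],
             tools: Dict[str, Callable],
             budget: int) -> List[Step]:
    """Run at most `budget` generate-execute steps and return the audit trace."""
    trace: List[Step] = []
    for _ in range(budget):
        code = generate(trace)
        if code.strip() == "DONE":  # assumed stop signal from the model
            break
        # Expose tools and previously gathered evidence to the generated code.
        namespace = dict(tools)
        namespace["evidence"] = [s.result for s in trace]
        # NOTE: exec on model-written code must be sandboxed in any real system.
        exec(code, namespace)
        trace.append(Step(code, namespace.get("result")))
    return trace


def stub_generate(trace: List[Step]) -> str:
    """Stub that plays the role of the model: two steps, then stop."""
    if len(trace) == 0:
        return "result = describe('image.png')"
    if len(trace) == 1:
        return "result = len(evidence[0])"
    return "DONE"


# Stub vision sub-agent; it only formats a string and opens no file.
tools = {"describe": lambda path: f"stub finding for {path}"}
budget = predict_budget({"complexity": 0.5})
trace = run_loop(stub_generate, tools, budget)
for step in trace:
    print(step.code, "->", step.result)
```

Because every entry in `trace` pairs the executed code with its result, the trace doubles as a step-by-step audit log, which is the property the abstract ties to auditability. The budget only caps the loop; the stub model can still stop early.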

Taxonomy

Topics: Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI) · Domain Adaptation and Few-Shot Learning