LiteMedCoT-VL: Parameter-Efficient Adaptation for Medical Visual Question Answering

Runze Ma; Shunbo Jia; Haonan Lyu; Guo Liu; Caizhi Liao

arXiv:2605.09384·cs.CV·May 12, 2026

TL;DR

LiteMedCoT-VL is a parameter-efficient pipeline that transfers multi-step (chain-of-thought) reasoning from a 235B teacher model to compact 2B student models for medical visual question answering, with all inference run without image captions by default.

Contribution

The paper introduces a LoRA-based fine-tuning pipeline that distills chain-of-thought reasoning from a 235B teacher model into 2B student models for medical VQA, using explanation-enriched training data.
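As a rough illustration of what "explanation-enriched" training data could look like, the sketch below assembles one supervised example whose target contains a teacher-written reasoning chain followed by the answer. The `VQAItem` schema, the prompt template, and the `build_sample` helper are all hypothetical, as is the multiple-choice layout; the paper's actual data format is not shown on this page.

```python
from dataclasses import dataclass


@dataclass
class VQAItem:
    """One multiple-choice medical VQA item (hypothetical schema)."""
    image_path: str
    question: str
    options: dict[str, str]   # option letter -> option text
    answer: str               # gold option letter
    teacher_rationale: str    # reasoning chain produced by the teacher model


def build_sample(item: VQAItem) -> dict[str, str]:
    """Turn an item into a prompt/target pair for student fine-tuning.

    The prompt contains only the question and options, so the student must
    rely on the image rather than on extra descriptive text. The template
    is an assumption made for illustration.
    """
    option_lines = "\n".join(f"{k}. {v}" for k, v in sorted(item.options.items()))
    prompt = (
        f"Question: {item.question}\n"
        f"Options:\n{option_lines}\n"
        "Think step by step."
    )
    # The target carries the reasoning first, then the final answer letter.
    target = f"Reasoning: {item.teacher_rationale}\nAnswer: {item.answer}"
    return {"image": item.image_path, "prompt": prompt, "target": target}
```

Training the student on `target` strings of this shape, rather than on the answer letter alone, is one way to pass on the reasoning process and not just the final label.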

Findings

01 Achieves 64.9% accuracy on the PMC-VQA benchmark, surpassing larger models.

02 Outperforms all published baselines in medical VQA.

03 Relies on image content rather than textual priors during inference.

Abstract

The reasoning gap between large and compact vision-language models (VLMs) limits the deployment of medical AI on portable clinical devices. Compact VLMs of 2–4B parameters can run on resource-constrained hardware but lack the multi-step reasoning capacity needed for interpretable clinical decision support. Existing knowledge distillation methods transfer answers without the reasoning process behind them. Medical visual question answering (VQA) serves as a testbed for this problem, as it requires models to integrate visual evidence with clinical knowledge through structured reasoning chains. We introduce LiteMedCoT-VL, a pipeline that transfers chain-of-thought reasoning from a 235B teacher model to 2B student models through LoRA-based fine-tuning on explanation-enriched training data. All inference is conducted without image captions by default, simulating the clinical scenario in…
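To make "parameter-efficient" concrete: LoRA freezes a dense weight matrix of shape d_out × d_in and trains only a low-rank update BA, where B has shape d_out × r and A has shape r × d_in. The sketch below counts trainable parameters for a single adapted layer. The layer dimensions and rank are illustrative values chosen for the example, not settings reported by the paper.

```python
def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters in the low-rank factors A (rank x d_in) and B (d_out x rank)."""
    return rank * (d_in + d_out)


def full_params(d_in: int, d_out: int) -> int:
    """Parameters in the dense weight matrix that LoRA leaves frozen."""
    return d_in * d_out


# Illustrative numbers only (not taken from the paper).
d_in, d_out, rank = 2048, 2048, 16
lora = lora_trainable_params(d_in, d_out, rank)
full = full_params(d_in, d_out)
print(f"LoRA: {lora} trainable vs {full} frozen ({lora / full:.2%})")
```

With these example values the adapter holds 1/64 as many parameters as the dense layer it modifies, which is why fine-tuning of this kind fits on modest hardware.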

Code & Models

Repositories

https://anonymous.4open.science/r/LiteMedCoT-VL
