How Good LLMs Are at Answering Bangla Medical Visual Questions? Dataset and Benchmarking

Rafid Ahmed; Intesar Tahmid; Mir Sazzat Hossain; Tasnimul Hossain Tomal; Md Fahim; Md Farhad Alam Bhuiyan

arXiv:2605.18111 · cs.CL · May 19, 2026

TL;DR

This paper introduces BanglaMedVQA, a new dataset for Bangla medical visual question answering, and evaluates current models, revealing significant performance gaps in low-resource language medical reasoning.

Contribution

Creation of the BanglaMedVQA dataset and a comprehensive benchmark of foundation models on Bangla medical visual questions.

Findings

01. Current models perform poorly on Bangla MedVQA, especially on complex diagnostic questions.

02. Top models like GPT-4.1 mini still fail to accurately answer specialized medical questions.

03. Open-source models like Gemma-3 sometimes outperform top models in general categories.

Abstract

Recent advancements in Large Language Models (LLMs) and Large Vision Language Models (LVLMs) have enabled general-purpose systems to demonstrate promising capabilities in complex reasoning tasks, including those in the medical domain. Medical Visual Question Answering (MedVQA) has particularly benefited from these developments. However, despite Bangla being one of the most widely spoken languages globally, there exists no established MedVQA benchmark for it. To address this gap, we introduce BanglaMedVQA, a dataset comprising clinically validated image-question-answer pairs, along with a comprehensive evaluation of current foundation models on this resource. Consistent with prior findings that report low performance of current models on English MedVQA benchmarks, our analysis reveals that Bangla performance is substantially lower, reflecting the challenges inherent to low-resource…
