MicroVQA++: High-Quality Microscopy Reasoning Dataset with Weakly Supervised Graphs for Multimodal Large Language Model

Manyu Li; Ruian He; Chenxi Ma; Weimin Tan; Bo Yan

arXiv:2511.11407·cs.CV·November 17, 2025

TL;DR

MicroVQA++ is a large-scale microscopy VQA dataset built in three stages: bootstrapping from expert-validated figure-caption pairs, graph-based filtering of inconsistent samples, and MLLM-generated multiple-choice questions with human screening. It is intended for training and evaluating microscopy reasoning in multimodal large language models.

Contribution

The paper introduces MicroVQA++, a microscopy VQA dataset derived from the BIOMEDICA archive, together with HiCQA-Graph, a heterogeneous-graph filtering method, and a three-stage data construction pipeline for model training.

Findings

01. The dataset improves microscopy reasoning performance of large models.

02. Graph-based filtering enhances data quality and consistency.

03. State-of-the-art results achieved on open-source MLLMs.

Abstract

Multimodal Large Language Models are increasingly applied to biomedical imaging, yet scientific reasoning for microscopy remains limited by the scarcity of large-scale, high-quality training data. We introduce MicroVQA++, a three-stage, large-scale and high-quality microscopy VQA corpus derived from the BIOMEDICA archive. Stage one bootstraps supervision from expert-validated figure-caption pairs sourced from peer-reviewed articles. Stage two applies HiCQA-Graph, a novel heterogeneous graph over images, captions, and QAs that fuses NLI-based textual entailment, CLIP-based vision-language alignment, and agent signals to identify and filter inconsistent samples. Stage three uses a MultiModal Large Language Model (MLLM) agent to generate multiple-choice questions (MCQ) followed by human screening. The resulting release comprises a large training split and a human-checked test split whose…
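To make the stage-two filtering idea concrete, here is a minimal sketch of fusing per-sample signals over a small typed graph. It is not the authors' HiCQA-Graph: the abstract does not specify how the graph is scored, so this example swaps in a plain weighted average of edge weights with a fixed threshold. The names `QASample`, `build_graph`, `qa_consistency`, and `filter_samples`, the equal weights, and the 0.5 threshold are all assumptions for illustration, and the three scores are assumed to be precomputed and scaled to [0, 1].

```python
from dataclasses import dataclass


@dataclass
class QASample:
    """Hypothetical record; all three scores are assumed to lie in [0, 1]."""
    qa_id: str
    image_id: str
    caption_id: str
    nli_entailment: float  # assumed: probability that the caption entails the QA
    clip_alignment: float  # assumed: image-text similarity rescaled to [0, 1]
    agent_score: float     # assumed: agent's consistency judgement


def build_graph(samples):
    """Return typed edges (src_type, src_id, qa_id, weight) into QA nodes."""
    edges = []
    for s in samples:
        edges.append(("caption", s.caption_id, s.qa_id, s.nli_entailment))
        edges.append(("image", s.image_id, s.qa_id, s.clip_alignment))
        edges.append(("agent", "agent", s.qa_id, s.agent_score))
    return edges


def qa_consistency(edges, weights):
    """Weighted mean of the incoming edge weights of each QA node."""
    totals = {}
    for src_type, _src_id, qa_id, weight in edges:
        totals[qa_id] = totals.get(qa_id, 0.0) + weights[src_type] * weight
    norm = sum(weights.values())
    return {qa_id: total / norm for qa_id, total in totals.items()}


def filter_samples(samples, weights=None, threshold=0.5):
    """Keep samples whose fused consistency score reaches the threshold."""
    if weights is None:
        # Assumption: equal trust in all three signal types.
        weights = {"caption": 1.0, "image": 1.0, "agent": 1.0}
    scores = qa_consistency(build_graph(samples), weights)
    return [s for s in samples if scores[s.qa_id] >= threshold]


samples = [
    QASample("q1", "img1", "cap1", 0.9, 0.8, 0.7),
    QASample("q2", "img2", "cap2", 0.1, 0.3, 0.2),
]
kept = filter_samples(samples)
print([s.qa_id for s in kept])
```

In this toy setup a sample is dropped only when the combined evidence is weak, so one low signal can be outvoted by the other two. A learned graph model could weigh these signals very differently.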

Code & Models

Datasets

ieellee/MicroVQA_PlusPlus
dataset · 23 downloads

Taxonomy

Topics: Multimodal Machine Learning Applications · Cell Image Analysis Techniques · Domain Adaptation and Few-Shot Learning