MicroVQA++: High-Quality Microscopy Reasoning Dataset with Weakly Supervised Graphs for Multimodal Large Language Model
Manyu Li, Ruian He, Chenxi Ma, Weimin Tan, Bo Yan

TL;DR
MicroVQA++ is a large, high-quality microscopy VQA dataset built in three stages: bootstrapping from expert-validated figure-caption pairs, graph-based filtering of inconsistent samples, and MLLM-generated multiple-choice questions with human screening. It is intended to strengthen microscopy reasoning in multimodal large language models.
Contribution
The paper introduces MicroVQA++, a microscopy VQA dataset derived from the BIOMEDICA archive, together with HiCQA-Graph, a new graph-based method for filtering inconsistent samples, and a three-stage data construction process aimed at better model training.
Findings
Training on the dataset improves the microscopy reasoning performance of MLLMs.
Graph-based filtering improves data quality and consistency.
Open-source MLLMs trained on the dataset achieve state-of-the-art results.
Abstract
Multimodal Large Language Models (MLLMs) are increasingly applied to biomedical imaging, yet scientific reasoning for microscopy remains limited by the scarcity of large-scale, high-quality training data. We introduce MicroVQA++, a three-stage, large-scale and high-quality microscopy VQA corpus derived from the BIOMEDICA archive. Stage one bootstraps supervision from expert-validated figure-caption pairs sourced from peer-reviewed articles. Stage two applies HiCQA-Graph, a novel heterogeneous graph over images, captions, and QAs that fuses NLI-based textual entailment, CLIP-based vision-language alignment, and agent signals to identify and filter inconsistent samples. Stage three uses an MLLM agent to generate multiple-choice questions (MCQs) followed by human screening. The resulting release comprises a large training split and a human-checked test split whose…
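Illustrative sketch
The abstract does not say how HiCQA-Graph combines its three signals, so the following Python is only a minimal stand-in for the general idea of stage two, not the authors' method. It assumes each sample already carries three precomputed scores in [0, 1] (an NLI entailment probability, a CLIP-style image-caption similarity, and an agent confidence), stores them as typed edges between image, caption, and QA nodes, and replaces whatever the graph model does with a plain weighted sum. The names Sample, WEIGHTS, build_edges, score_qas, and filter_consistent, the weights, and the threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    image_id: str
    caption_id: str
    qa_id: str
    nli_entailment: float   # assumed: P(caption entails QA), in [0, 1]
    clip_similarity: float  # assumed: image-caption alignment, rescaled to [0, 1]
    agent_score: float      # assumed: agent confidence in the QA, in [0, 1]


# Hypothetical fusion weights; the paper's actual fusion is not given in the abstract.
WEIGHTS = {"nli": 0.4, "clip": 0.3, "agent": 0.3}


def build_edges(samples):
    """Build typed, weighted edges of a small heterogeneous graph."""
    edges = {}
    for s in samples:
        img = ("image", s.image_id)
        cap = ("caption", s.caption_id)
        qa = ("qa", s.qa_id)
        edges[("clip", img, cap)] = s.clip_similarity  # image <-> caption
        edges[("nli", cap, qa)] = s.nli_entailment     # caption -> QA
        edges[("agent", img, qa)] = s.agent_score      # image -> QA
    return edges


def score_qas(samples, weights=WEIGHTS):
    """Score each QA node as a weighted sum of the edges on its image-caption-QA triangle."""
    edges = build_edges(samples)
    scores = {}
    for s in samples:
        img = ("image", s.image_id)
        cap = ("caption", s.caption_id)
        qa = ("qa", s.qa_id)
        scores[s.qa_id] = (
            weights["clip"] * edges[("clip", img, cap)]
            + weights["nli"] * edges[("nli", cap, qa)]
            + weights["agent"] * edges[("agent", img, qa)]
        )
    return scores


def filter_consistent(samples, weights=WEIGHTS, threshold=0.6):
    """Keep only the samples whose fused consistency score reaches the threshold."""
    scores = score_qas(samples, weights)
    return [s for s in samples if scores[s.qa_id] >= threshold]
```

Calling filter_consistent on a list of Sample objects returns the subset that passes the threshold. A learned graph model would presumably propagate information between neighboring nodes instead of scoring each triangle independently, which this sketch does not attempt.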
Taxonomy
Topics: Multimodal Machine Learning Applications · Cell Image Analysis Techniques · Domain Adaptation and Few-Shot Learning
