MicroWorld: Empowering Multimodal Large Language Models to Bridge the Microscopic Domain Gap with Multimodal Attribute Graph

Manyu Li; Ruian He; Chenxi Ma; Weimin Tan; Bo Yan

arXiv:2605.10120 · cs.CV · May 12, 2026

TL;DR

MicroWorld improves the reasoning of multimodal large language models on microscopy tasks by drawing on a large, structured knowledge graph at inference time, yielding benchmark gains without domain-specific fine-tuning.

Contribution

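MicroWorld introduces a framework that builds a large-scale biomedical knowledge graph, a multimodal attributed property graph (MAPG) of roughly 111K nodes and 346K typed edges mined from scientific image-caption corpora, and uses it to augment MLLM reasoning without fine-tuning.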
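To make the idea of an attributed property graph concrete, the following is a minimal sketch, not the authors' implementation. It assumes (subject, relation, object) triplets have already been extracted from captions and hard-codes a few. The class and function names, the lowercase name normalization, and the relation labels (`part_of`, `located_in`, `depicts`) are all hypothetical; they are not the paper's eight relation categories.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    kind: str  # node type used in this sketch: "entity" or "image" (hypothetical)
    attrs: dict = field(default_factory=dict)  # free-form node attributes


class PropertyGraph:
    """Minimal attributed property graph: nodes carry attributes, edges carry a relation type."""

    def __init__(self):
        self.nodes = {}                       # node_id -> Node
        self.out_edges = defaultdict(list)    # src node_id -> [(relation, dst node_id)]

    def add_node(self, node_id, kind, **attrs):
        node = self.nodes.get(node_id)
        if node is None:
            node = Node(node_id, kind, dict(attrs))
            self.nodes[node_id] = node
        else:
            node.attrs.update(attrs)  # merge attributes into an existing node
        return node

    def add_edge(self, src, relation, dst):
        edge = (relation, dst)
        if edge not in self.out_edges[src]:   # skip duplicate typed edges
            self.out_edges[src].append(edge)

    def num_edges(self):
        return sum(len(edges) for edges in self.out_edges.values())


def build_graph(triplets, image_mentions):
    """triplets: (head, relation, tail) strings from some entity/relation extractor.
    image_mentions: image_id -> entity names found in that image's caption (hypothetical input)."""
    g = PropertyGraph()
    for head, rel, tail in triplets:
        h, t = head.lower(), tail.lower()  # naive normalization so surface variants merge
        g.add_node(h, "entity", name=head)
        g.add_node(t, "entity", name=tail)
        g.add_edge(h, rel, t)
    for image_id, entities in image_mentions.items():
        g.add_node(image_id, "image")
        for ent in entities:
            e = ent.lower()
            g.add_node(e, "entity", name=ent)
            g.add_edge(image_id, "depicts", e)  # link the image node to its caption entities
    return g
```

Keeping images and entities as nodes in a single graph lets a later retrieval step move from an image to the entities its caption mentions, and from there along typed relations. A production system would more likely use a graph database or a library such as NetworkX than this dictionary-based structure.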

Findings

01. 37.5% improvement on the MicroVQA benchmark

02. 6.0% performance gain on MicroBench

03. State-of-the-art results on the evaluated microscopy benchmarks

Abstract

Multimodal large language models (MLLMs) show remarkable potential for scientific reasoning, yet their performance in specialized domains such as microscopy remains limited by the scarcity of domain-specific training data and the difficulty of encoding fine-grained expert knowledge into model parameters. To bridge this gap, we introduce MicroWorld, a framework that constructs a multimodal attributed property graph (MAPG) from large-scale scientific image-caption corpora and leverages it to augment MLLM reasoning at inference time without any domain-specific fine-tuning. MicroWorld extracts biomedical entities and relations via scispaCy or LLM-based triplet mining, aligns images and entities in a shared embedding space using Qwen3-VL-Embedding, and assembles a knowledge graph comprising approximately 111K nodes and 346K typed edges spanning eight relation categories. At inference time, a…
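The abstract is cut off before it describes the inference-time procedure, so the following is only a generic sketch of how a graph whose nodes share an embedding space with images could be used to augment a prompt. It ranks nodes by cosine similarity to a query embedding, expands one hop along typed edges, and serializes the resulting facts as text context. The 3-dimensional vectors are toy stand-ins for embeddings that a model such as Qwen3-VL-Embedding would produce. The function names, the top-k plus one-hop strategy, and the prompt template are assumptions, not the paper's method.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def retrieve_context(query_emb, node_embs, edges, k=2):
    """Keep the k nodes most similar to the query, then collect their outgoing typed edges."""
    ranked = sorted(node_embs, key=lambda n: cosine(query_emb, node_embs[n]), reverse=True)
    seeds = ranked[:k]
    facts = [f"{src} --{rel}--> {dst}" for src in seeds for rel, dst in edges.get(src, [])]
    return seeds, facts


def build_prompt(question, facts):
    """Prepend retrieved graph facts to the question (hypothetical template)."""
    context = "\n".join(f"- {fact}" for fact in facts)
    return f"Relevant knowledge:\n{context}\n\nQuestion: {question}\nAnswer:"


if __name__ == "__main__":
    # Toy node embeddings and typed edges; real ones would come from the graph and embedding model.
    node_embs = {
        "mitochondria": [0.9, 0.1, 0.0],
        "nucleus": [0.1, 0.9, 0.0],
        "cristae": [0.8, 0.2, 0.1],
    }
    edges = {"mitochondria": [("has_part", "cristae")], "nucleus": [("contains", "chromatin")]}
    image_emb = [1.0, 0.0, 0.0]  # stand-in for the embedding of the query microscopy image
    seeds, facts = retrieve_context(image_emb, node_embs, edges, k=2)
    print(seeds)  # ['mitochondria', 'cristae']
    print(build_prompt("Which organelle is shown?", facts))
```

Because the model's weights are never updated, all domain knowledge reaches the MLLM through the text context. This is consistent with the "without fine-tuning" claim, although the paper's actual retrieval and prompting details may differ.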

Code & Models

Repositories

ieellee/MicroWorld (GitHub)
