MicroVerse: A Preliminary Exploration Toward a Micro-World Simulation

Rongsheng Wang; Minghao Wu; Hongru Zhou; Zhihan Yu; Zhenyang Cai; Junying Chen; Benyou Wang

arXiv:2603.00585·cs.AI·March 3, 2026

Open Access · 3 Reviews

TL;DR

MicroVerse introduces a new approach to microscale simulation by developing a specialized benchmark and dataset, demonstrating the limitations of current models and presenting a tailored video generation model for biological microscale phenomena.

Contribution

This work pioneers the concept of Micro-World Simulation, creating a benchmark and dataset, and developing MicroVerse, a model for accurate microscale biological video generation.

Findings

01

Current SOTA models violate physical laws at microscale

02

MicroVerse accurately reproduces complex microscale mechanisms

03

MicroWorldBench provides systematic evaluation criteria

Abstract

Recent advances in video generation have opened new avenues for macroscopic simulation of complex dynamic systems, but their application to microscopic phenomena remains largely unexplored. Microscale simulation holds great promise for biomedical applications such as drug discovery, organ-on-chip systems, and disease mechanism studies, while also showing potential in education and interactive visualization. In this work, we introduce MicroWorldBench, a multi-level rubric-based benchmark for microscale simulation tasks. MicroWorldBench enables systematic, rubric-based evaluation through 459 unique expert-annotated criteria spanning multiple microscale simulation tasks (e.g., organ-level processes, cellular dynamics, and subcellular molecular interactions) and evaluation dimensions (e.g., scientific fidelity, visual quality, instruction following). MicroWorldBench reveals that current SOTA…
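The rubric-based evaluation described in the abstract can be illustrated with a minimal sketch. It assumes each criterion receives a binary pass/fail judgment and that a dimension's score is the weighted fraction of satisfied criteria. The abstract does not state how scores are aggregated, so the `Criterion` schema, the `score_by_dimension` helper, the weights, and the example criteria are all hypothetical and are not taken from MicroWorldBench. Only the three dimension names come from the abstract.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    """One rubric criterion (hypothetical schema, not from the paper)."""
    text: str
    dimension: str        # e.g. "scientific_fidelity"
    weight: float = 1.0   # assumed positive; equal weights by default


def score_by_dimension(criteria, judgments):
    """Return the weighted fraction of satisfied criteria per dimension.

    judgments[i] is True when the generated video satisfies criteria[i].
    Binary judgments are an assumption of this sketch.
    """
    if len(criteria) != len(judgments):
        raise ValueError("need exactly one judgment per criterion")
    earned = defaultdict(float)
    total = defaultdict(float)
    for criterion, passed in zip(criteria, judgments):
        total[criterion.dimension] += criterion.weight
        if passed:
            earned[criterion.dimension] += criterion.weight
    return {dim: earned[dim] / total[dim] for dim in total}


# Illustrative criteria for a cell-division prompt (invented for this example).
rubric = [
    Criterion("Chromosomes align at the cell equator before separating",
              "scientific_fidelity"),
    Criterion("Two daughter cells are visible at the end",
              "scientific_fidelity"),
    Criterion("Frames are free of flicker and artifacts",
              "visual_quality"),
    Criterion("The video shows the process named in the prompt",
              "instruction_following"),
]
judgments = [True, False, True, True]

scores = score_by_dimension(rubric, judgments)
print(scores)
```

Reporting one score per dimension, rather than a single total, keeps a visually polished but scientifically wrong video from scoring well overall, which is the failure mode the benchmark is meant to expose.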

Peer Reviews

Decision·ICLR 2026 Poster

Reviewer 01 · Rating 0 · Confidence 5

Strengths

This work creates a curated microscale simulation dataset (MicroSim-10K) with expert verification, addressing the lack of domain-specific training data. It also presents a fine-tuned model (MicroVerse) that improves fidelity and consistency on the proposed benchmark, showing the feasibility of domain adaptation.

Weaknesses

Based on Figure 1, my primary concern is that the simulations themselves do not appear realistic (they look as though they were made for educational purposes). In some cases, the model-generated outputs look more realistic than the ground-truth simulations (e.g., the SORA cell-division example). This raises questions about whether the benchmark’s “expert-verified” reference simulations accurately reflect real **biological phenomena** and whether the evaluation metric truly measures **

Reviewer 02 · Rating 6 · Confidence 3

Strengths

- The paper opens an underexplored and impactful direction: using video generation models for microscale (biomedical) simulation, which connects generative AI with scientific and educational applications.
- MicroWorldBench and MicroSim-10K are well-constructed and carefully validated, providing a foundation for future research in scientific video generation.
- The rubric-based evaluation (Scientific Fidelity, Visual Quality, Instruction Following) combined with expert refinement offers a tra

Weaknesses

- MicroVerse is essentially a fine-tuned version of Wan2.1 without introducing new architectural or physical modeling components. The improvement in scientific fidelity (+0.8 overall) is marginal.
- While the benchmark and dataset are significant contributions, the proposed model does not clearly outperform commercial systems or strong open-source baselines, suggesting that the key contribution lies in data curation rather than modeling innovation.
- The scoring process relies heavily on GPT-5

Reviewer 03 · Rating 6 · Confidence 4

Strengths

- Novel application domain. Applying video generation to microscale simulation is unexplored and potentially impactful for education and biomedical research.
- Comprehensive benchmark design. The rubric-based evaluation with expert-curated criteria across three hierarchical levels (organ/cellular/subcellular) is well-motivated and systematic.
- Dataset construction effort. Building MicroSim-10K with multiple filtering stages and expert verification demonstrates thoroughness in data curation.

Weaknesses

- Single-domain dataset that may not represent the full distribution and may have data contamination issues. Both the training set (MicroSim-10K) and test set (MicroWorldBench) are built entirely from YouTube videos, which may not reflect the full range of scientific simulation requirements. The authors have not reported how they deduplicate to ensure the test set is fully separate from the training set. Additionally, there is a concern that private models (like Veo3) may have already been train

Taxonomy

Topics: Computer Graphics and Visualization Techniques · Cell Image Analysis Techniques · Human Motion and Animation