FysicsWorld: A Unified Full-Modality Benchmark for Any-to-Any Understanding, Generation, and Reasoning

Yue Jiang; Dingkang Yang; Minghao Han; Jinghang Han; Zizhi Chen; Yizhou Liu; Mingcheng Li; Peng Zhai; Lihua Zhang

arXiv:2512.12756 · cs.CV · December 16, 2025

TL;DR

FysicsWorld is a comprehensive benchmark supporting bidirectional full-modality tasks across image, video, audio, and text, enabling evaluation of understanding, generation, and reasoning in multimodal models.

Contribution

It introduces the first unified full-modality benchmark with diverse tasks, a novel data construction framework, and an extensive evaluation of state-of-the-art models.

Findings

1. Current models show significant performance gaps across modalities.
2. The benchmark reveals limitations in understanding, generation, and reasoning capabilities.
3. FysicsWorld provides a foundation for developing more integrated multimodal architectures.

Abstract

Despite rapid progress in multimodal large language models (MLLMs) and emerging omni-modal architectures, current benchmarks remain limited in scope and integration, suffering from incomplete modality coverage, restricted interaction to text-centric outputs, and weak interdependence and complementarity among modalities. To bridge these gaps, we introduce FysicsWorld, the first unified full-modality benchmark that supports bidirectional input-output across image, video, audio, and text, enabling comprehensive any-to-any evaluation across understanding, generation, and reasoning. FysicsWorld encompasses 16 primary tasks and 3,268 curated samples, aggregated from over 40 high-quality sources and covering a rich set of open-domain categories with diverse question types. We also propose the Cross-Modal Complementarity Screening (CMCS) strategy integrated in a systematic data construction…

Code & Models

Datasets

Fysics-AI/FysicsWorld
dataset · 321 downloads

Taxonomy

Topics: Multimodal Machine Learning Applications · Speech and Dialogue Systems · Topic Modeling