OmniWeaving: Towards Unified Video Generation with Free-form Composition and Reasoning

Kaihang Pan; Qi Tian; Jianwei Zhang; Weijie Kong; Jiangfeng Xiong; Yanxin Long; Shixue Zhang; Haiyi Qiu; Tan Wang; Zheqi Lv; Yue Wu; Liefeng Bo; Siliang Tang; Zhao Zhong

arXiv:2603.24458 · cs.CV · April 3, 2026

TL;DR

OmniWeaving is a unified video generation model that leverages large-scale multimodal pretraining to enable complex, reasoning-informed video creation from diverse inputs, outperforming existing open-source models.

Contribution

The paper introduces OmniWeaving, a novel unified video generation framework with multimodal reasoning, and presents IntelligentVBench, a new benchmark for evaluating such models.

Findings

01. OmniWeaving achieves state-of-the-art performance among open-source unified models.

02. The model effectively binds text, images, and videos for complex video synthesis.

03. Extensive experiments validate the model's superior capabilities.

Abstract

While proprietary systems such as Seedance-2.0 have achieved remarkable success in omni-capable video generation, open-source alternatives significantly lag behind. Most academic models remain heavily fragmented, and the few existing efforts toward unified video generation still struggle to seamlessly integrate diverse tasks within a single framework. To bridge this gap, we propose OmniWeaving, an omni-level video generation model featuring powerful multimodal composition and reasoning-informed capabilities. By leveraging a massive-scale pretraining dataset that encompasses diverse compositional and reasoning-augmented scenarios, OmniWeaving learns to temporally bind interleaved text, multi-image, and video inputs while acting as an intelligent agent to infer complex user intentions for sophisticated video creation. Furthermore, we introduce IntelligentVBench, the first comprehensive…
