SenseNova-U1: Unifying Multimodal Understanding and Generation with NEO-unify Architecture
Haiwen Diao, Penghao Wu, Hanming Deng, Jiahao Wang, Shihao Bai, Silei Wu, Weichen Fan, Wenjie Ye, Wenwen Tong, Xiangyu Fan, Yan Li, Yubo Wang, Zhijie Cao, Zhiqian Lin, Zhitao Yang, Zhongang Cai, Yuwei Niu, Yue Zhu, Bo Liu, Chengguang Lv, Haojia Yu, Haozhe Xie, Hongli Wang

TL;DR
SenseNova-U1 is a unified multimodal model that integrates understanding and generation in a single architecture. The authors report that it rivals top-tier understanding-only VLMs and supports native multimodal reasoning and action.
Contribution
The paper presents SenseNova-U1, a native unified multimodal model built on the NEO-unify architecture, which handles understanding and generation in a single model rather than in separate or cascaded systems.
Findings
Rivals top-tier understanding-only vision-language models in multiple tasks.
Excels in complex image synthesis and infographic generation.
Extends to vision-language-action and world-model scenarios.
Abstract
Recent large vision-language models (VLMs) remain fundamentally constrained by a persistent dichotomy: understanding and generation are treated as distinct problems, leading to fragmented architectures, cascaded pipelines, and misaligned representation spaces. We argue that this divide is not merely an engineering artifact, but a structural limitation that hinders the emergence of native multimodal intelligence. Hence, we introduce SenseNova-U1, a native unified multimodal paradigm built upon NEO-unify, in which understanding and generation evolve as synergistic views of a single underlying process. We launch two native unified variants, SenseNova-U1-8B-MoT and SenseNova-U1-A3B-MoT, built on dense (8B) and mixture-of-experts (30B-A3B) understanding baselines, respectively. Designed from first principles, they rival top-tier understanding-only VLMs across text understanding,…
Code & Models
- 🤗 sensenova/SenseNova-U1-8B-MoT · model · 19k dl · ♡ 274
- 🤗 sensenova/SenseNova-U1-A3B-MoT · model · 216 dl · ♡ 14
- 🤗 sensenova/SenseNova-U1-8B-MoT-SFT · model · 1.9k dl · ♡ 51
- 🤗 sensenova/SenseNova-U1-A3B-MoT-SFT · model · 154 dl · ♡ 8
- 🤗 Jiahui347014527/SenseNova-U1-A3B-MoT · model · 11 dl
