Emerging Properties in Unified Multimodal Pretraining
Chaorui Deng, Deyao Zhu, Kunchang Li, Chenhui Gou, Feng Li, Zeyu Wang, Shu Zhong, Weihao Yu, Xiaonan Nie, Ziang Song, Guang Shi, Haoqi Fan

TL;DR
BAGEL is an open-source, unified multimodal model pretrained on diverse interleaved data. It exhibits emerging complex reasoning abilities and outperforms existing open models on both understanding and generation tasks.
Contribution
Introduction of BAGEL, a decoder-only multimodal model supporting both understanding and generation, together with a detailed description of its pretraining methods and an open-source release.
Findings
- BAGEL exhibits advanced multimodal reasoning abilities.
- It significantly outperforms open-source models on standard benchmarks.
- It demonstrates capabilities such as image manipulation and future frame prediction.
Abstract
Unifying multimodal understanding and generation has shown impressive capabilities in cutting-edge proprietary systems. In this work, we introduce BAGEL, an open-source foundational model that natively supports multimodal understanding and generation. BAGEL is a unified, decoder-only model pretrained on trillions of tokens curated from large-scale interleaved text, image, video, and web data. When scaled with such diverse multimodal interleaved data, BAGEL exhibits emerging capabilities in complex multimodal reasoning. As a result, it significantly outperforms open-source unified models in both multimodal generation and understanding across standard benchmarks, while exhibiting advanced multimodal reasoning abilities such as free-form image manipulation, future frame prediction, 3D manipulation, and world navigation. In the hope of facilitating further opportunities for multimodal…
Code & Models
- 🤗 ByteDance-Seed/BAGEL-7B-MoT (model · 8.9k downloads · 1185 likes)
- 🤗 Azily/Macro-Bagel (model · 15 downloads · 1 like)
- 🤗 Gapeleon/bytedance_BAGEL-7B-MoT-INT8 (model · 31 downloads · 24 likes)
- 🤗 zanchat-ai/fast-bagel (model · 1 download)
- 🤗 JiaxinGe/Diffusers-BAGEL (model · 4 downloads · 6 likes)
- 🤗 joshmiao/bagel_mvot (model · 1 download)
- 🤗 Zillis/ByteDance_Seed_BAGEL_7B_MoT (model)
- 🤗 Ziruibest/SafeUMM (model · 2 downloads · 1 like)
- 🤗 exptest2/s1_onlyt2i_14kema (model · 2 downloads)
- 🤗 HappyCorpse/Bagel_harmful_medical (model)
