Open-MAGVIT2: An Open-Source Project Toward Democratizing Auto-regressive Visual Generation
Zhuoyan Luo, Fengyuan Shi, Yixiao Ge, Yujiu Yang, Limin Wang, Ying Shan

TL;DR
Open-MAGVIT2 is an open-source project that replicates Google's large-vocabulary tokenizer, achieving state-of-the-art image reconstruction and enabling scalable auto-regressive visual generation with improved quality.
Contribution
It introduces an open-source large-vocabulary tokenizer, explores its application in scalable auto-regressive models, and proposes novel token factorization techniques for better image generation.
Findings
Achieves state-of-the-art reconstruction on ImageNet and UCF.
Outperforms Cosmos in zero-shot benchmarks.
Provides scalable auto-regressive models from 300M to 1.5B parameters.
Abstract
The Open-MAGVIT2 project produces an open-source replication of Google's MAGVIT-v2 tokenizer, a tokenizer with a super-large codebook (i.e., $2^{18}$ codes), and achieves state-of-the-art reconstruction performance on ImageNet and UCF benchmarks. We also provide a tokenizer pre-trained on large-scale data, significantly outperforming Cosmos on zero-shot benchmarks (1.93 vs. 0.78 rFID on ImageNet original resolution). Furthermore, we explore its application in plain auto-regressive models to validate scalability properties, producing a family of auto-regressive image generation models ranging from 300M to 1.5B parameters. To assist auto-regressive models in predicting with a super-large vocabulary, we factorize it into two sub-vocabularies of different sizes by asymmetric token factorization, and further introduce "next sub-token prediction" to enhance sub-token interaction for better…
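The asymmetric token factorization mentioned in the abstract can be illustrated with a small sketch. It assumes a token id from the 2^18-entry codebook (262144, matching the released tokenizer names) is split bitwise into a small and a large sub-token. The 6-bit/12-bit split and the function names `split_token` and `merge_tokens` are hypothetical choices for illustration; the abstract only says the two sub-vocabularies have different sizes.

```python
# Illustrative sketch only: the 6/12-bit split is an assumption, not a value
# stated in the abstract above.
TOTAL_BITS = 18   # 2**18 = 262144 codes in the full vocabulary
LOW_BITS = 12     # hypothetical size of the larger sub-vocabulary (2**12)


def split_token(token_id: int, low_bits: int = LOW_BITS,
                total_bits: int = TOTAL_BITS) -> tuple[int, int]:
    """Split one token id into (high, low) sub-token ids."""
    if not 0 <= token_id < (1 << total_bits):
        raise ValueError(f"token_id must be in [0, 2**{total_bits})")
    high = token_id >> low_bits               # index into the small sub-vocabulary
    low = token_id & ((1 << low_bits) - 1)    # index into the large sub-vocabulary
    return high, low


def merge_tokens(high: int, low: int, low_bits: int = LOW_BITS) -> int:
    """Recombine two sub-token ids into the original token id."""
    return (high << low_bits) | low


if __name__ == "__main__":
    high, low = split_token(262143)
    print(high, low, merge_tokens(high, low))
```

Under this assumed split, the model predicts over 64 and 4096 classes instead of a single 262144-way softmax, and the mapping back to the full token id is lossless.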
Code & Models
- 🤗 TencentARC/Open-MAGVIT2 (model) · ♡ 14
- 🤗 TencentARC/Open-MAGVIT2-Tokenizer-128-resolution (model) · 9 dl · ♡ 1
- 🤗 TencentARC/Open-MAGVIT2-Tokenizer-256-resolution (model) · 8 dl · ♡ 1
- 🤗 TencentARC/Open-MAGVIT2-AR-B-256-resolution (model)
- 🤗 TencentARC/Open-MAGVIT2-AR-L-256-resolution (model) · 1 dl
- 🤗 TencentARC/Open-MAGVIT2-AR-XL-256-resolution (model) · ♡ 1
- 🤗 TencentARC/Open-MAGVIT2-Tokenizer-262144-Pretrain (model) · 2 dl · ♡ 1
- 🤗 TencentARC/Open-MAGVIT2-Tokenizer-16384-Pretrain (model) · 4 dl · ♡ 2
- 🤗 TencentARC/Open-MAGVIT2-Tokenizer-262144-Video (model) · 1 dl · ♡ 1
- 🤗 GrayShine/WeTok (model) · ♡ 2
Taxonomy
Topics: Participatory Visual Research Methods
