Cubic Discrete Diffusion: Discrete Visual Generation on High-Dimensional Representation Tokens

Yuqing Wang; Chuofan Ma; Zhijie Lin; Yao Teng; Lijun Yu; Shuai Wang; Jiaming Han; Jiashi Feng; Yi Jiang; Xihui Liu

arXiv:2603.19232·cs.CV·March 20, 2026

TL;DR

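CubiD is a discrete diffusion model that generates high-dimensional visual representation tokens (768-1024 dimensions) rather than the low-dimensional latent tokens (typically 8-32 dimensions) used by current discrete methods. This keeps the semantic richness needed for understanding while supporting generation in unified multimodal architectures. It reports state-of-the-art discrete generation results on ImageNet-256.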

Contribution

This work presents the first discrete generation model for high-dimensional representations. Any dimension at any spatial position can be masked and predicted from partial observations, and the number of generation steps is fixed regardless of the feature dimensionality.
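The masking idea can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `MASK` sentinel, the number of quantization levels, and the cosine unmasking schedule are all assumptions made for illustration. It shows only two things the summary states: masks are drawn over individual entries of the height x width x dimension cube, and the number of steps is chosen independently of the feature dimension.

```python
import math
import random

MASK = None  # hypothetical sentinel marking a masked entry


def cubic_mask(tokens, mask_ratio, rng):
    """Mask a random subset of entries anywhere in an h x w x d token cube.

    Entries are chosen over the flattened cube, so any dimension at any
    spatial position may be masked (assumed uniform sampling).
    """
    h, w, d = len(tokens), len(tokens[0]), len(tokens[0][0])
    total = h * w * d
    n_mask = round(mask_ratio * total)
    chosen = rng.sample(range(total), n_mask)
    masked = [[cell[:] for cell in row] for row in tokens]  # deep copy
    for flat in chosen:
        i, rest = divmod(flat, w * d)
        j, k = divmod(rest, d)
        masked[i][j][k] = MASK
    return masked, n_mask


def unmask_counts(total, steps):
    """Split `total` masked entries over a fixed number of steps.

    Uses a cosine schedule (an assumption, not taken from the paper).
    """
    remaining = [
        math.floor(total * math.cos(0.5 * math.pi * s / steps))
        for s in range(1, steps + 1)
    ]
    remaining[-1] = 0  # everything is revealed by the final step
    previous = [total] + remaining[:-1]
    return [p - r for p, r in zip(previous, remaining)]


rng = random.Random(0)
h, w, d, levels = 4, 4, 768, 8  # illustrative sizes only
tokens = [[[rng.randrange(levels) for _ in range(d)] for _ in range(w)]
          for _ in range(h)]
masked, n_mask = cubic_mask(tokens, 0.5, rng)
counts = unmask_counts(n_mask, steps=8)
```

In this sketch `steps` is a free parameter: changing `d` from 768 to 16 changes how many entries are revealed per step, not how many steps are taken.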

Findings

01. Achieves state-of-the-art discrete generation on ImageNet-256.

02. Demonstrates effective scaling from 900M to 3.7B parameters.

03. Preserves original representation capabilities for understanding and generation.

Abstract

Visual generation with discrete tokens has gained significant attention as it enables a unified token prediction paradigm shared with language models, promising seamless multimodal architectures. However, current discrete generation methods remain limited to low-dimensional latent tokens (typically 8-32 dims), sacrificing the semantic richness essential for understanding. While high-dimensional pretrained representations (768-1024 dims) could bridge this gap, their discrete generation poses fundamental challenges. In this paper, we present Cubic Discrete Diffusion (CubiD), the first discrete generation model for high-dimensional representations. CubiD performs fine-grained masking throughout the high-dimensional discrete representation -- any dimension at any position can be masked and predicted from partial observations. This enables the model to learn rich correlations both within and…

Taxonomy

Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning