NativeTok: Native Visual Tokenization for Improved Image Generation

Bin Wu; Mengqi Huang; Weinan Jia; Zhendong Mao

arXiv:2601.22837 · cs.CV · February 2, 2026

Open Access

TL;DR

NativeTok introduces a novel visual tokenization method that enforces causal dependencies, leading to improved image generation coherence by embedding relational constraints directly into token sequences.

Contribution

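The paper proposes native visual tokenization, which enforces causal dependencies during tokenization, and introduces NativeTok, a framework that combines a Meta Image Transformer (MIT) with a Mixture of Causal Expert Transformer (MoCET) for efficient, constrained image tokenization and generation.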

Findings

01 Enhanced image reconstruction quality.

02 Better coherence in generated images.

03 Efficient training with Hierarchical Native Training.

Abstract

VQ-based image generation typically follows a two-stage pipeline: a tokenizer encodes images into discrete tokens, and a generative model learns their dependencies for reconstruction. However, improved tokenization in the first stage does not necessarily enhance the second-stage generation, as existing methods fail to constrain token dependencies. This mismatch forces the generative model to learn from unordered distributions, leading to bias and weak coherence. To address this, we propose native visual tokenization, which enforces causal dependencies during tokenization. Building on this idea, we introduce NativeTok, a framework that achieves efficient reconstruction while embedding relational constraints within token sequences. NativeTok consists of: (1) a Meta Image Transformer (MIT) for latent image modeling, and (2) a Mixture of Causal Expert Transformer (MoCET), where each…
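To make the two-stage pipeline described in the abstract concrete, the following is a minimal sketch of generic vector quantization followed by the causal (left-to-right) factorization that an autoregressive second stage would learn. It is not NativeTok's MIT or MoCET; the function names, the toy codebook, and the 2-D latents are all hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(latents: Sequence[Sequence[float]],
             codebook: Sequence[Sequence[float]]) -> List[int]:
    """Stage 1 (generic VQ): map each latent vector to the index of its
    nearest codebook entry, producing a sequence of discrete tokens."""
    tokens = []
    for z in latents:
        nearest = min(range(len(codebook)),
                      key=lambda k: squared_distance(z, codebook[k]))
        tokens.append(nearest)
    return tokens


def causal_contexts(tokens: Sequence[int]) -> List[Tuple[List[int], int]]:
    """Stage 2 view: the (prefix, next-token) pairs an autoregressive
    generator is trained on, i.e. p(t_i | t_1, ..., t_{i-1})."""
    return [(list(tokens[:i]), tokens[i]) for i in range(len(tokens))]


# Hypothetical toy data: three codebook entries and three latent vectors.
codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
latents = [[0.1, -0.1], [0.9, 1.2], [0.2, 0.8]]

tokens = quantize(latents, codebook)
pairs = causal_contexts(tokens)
print(tokens)
print(pairs)
```

In this sketch the tokenizer assigns each token independently of the others, so nothing in stage 1 guarantees that a token actually depends on its prefix. That is the kind of mismatch the abstract attributes to existing methods; native visual tokenization is described as enforcing the causal dependency during tokenization itself.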

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Computer Graphics and Visualization Techniques · Cell Image Analysis Techniques