TL;DR
Generative Refinement Networks (GRN) introduce near-lossless hierarchical binary quantization and a global refinement mechanism, enabling efficient, high-quality visual synthesis across image reconstruction, class-conditional image generation, text-to-image, and text-to-video tasks.
Contribution
GRN combines a near-lossless hierarchical binary quantization with a global refinement process and entropy-guided sampling, advancing autoregressive models for visual generation.
Findings
Achieved 0.56 rFID in image reconstruction on ImageNet.
Set a new record of 1.81 gFID in class-conditional image generation.
Demonstrated superior performance in text-to-image and text-to-video tasks.
Abstract
While diffusion models dominate the field of visual generation, they are computationally inefficient, applying the same computational effort regardless of complexity. In contrast, autoregressive (AR) models are inherently complexity-aware, as evidenced by their variable likelihoods, but are often hindered by lossy discrete tokenization and error accumulation. In this work, we introduce Generative Refinement Networks (GRN), a next-generation visual synthesis paradigm to address these issues. At its core, GRN addresses the discrete tokenization bottleneck through a theoretically near-lossless Hierarchical Binary Quantization (HBQ), achieving a reconstruction quality comparable to continuous counterparts. Built upon HBQ's latent space, GRN fundamentally upgrades AR generation with a global refinement mechanism that progressively perfects and corrects artworks -- like a human…
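The abstract does not spell out how HBQ works, so the following is only an illustrative sketch of why a hierarchical binary code can be "theoretically near-lossless". It assumes a residual sign-coding scheme in which each level stores one bit (the sign of the remaining residual) and halves the step size. The function names hbq_encode and hbq_decode, the scalar input range [-1, 1], and the halving schedule are all assumptions made for illustration, not the authors' implementation.

```python
def hbq_encode(x, levels):
    """Encode a scalar x in [-1, 1] as `levels` binary codes (+1 or -1).

    Assumed scheme (not taken from the paper): at level k the code is the
    sign of the remaining residual, and it contributes
    sign * 2**-(k + 1) to the reconstruction.
    """
    bits = []
    residual = x
    for k in range(levels):
        step = 2.0 ** -(k + 1)          # step size halves at every level
        bit = 1 if residual >= 0 else -1
        bits.append(bit)
        residual -= bit * step          # pass what is left to the next level
    return bits


def hbq_decode(bits):
    """Reconstruct the scalar by summing the signed, halving steps."""
    return sum(bit * 2.0 ** -(k + 1) for k, bit in enumerate(bits))


if __name__ == "__main__":
    x = 0.3
    for levels in (1, 2, 4, 8, 16):
        x_hat = hbq_decode(hbq_encode(x, levels))
        # Under this scheme the error is bounded by 2**-levels.
        print(levels, x_hat, abs(x - x_hat))
```

Under these assumptions the reconstruction error is at most 2^-levels, so each extra binary level halves the worst-case error. That is one way a discrete code can approach the fidelity of a continuous latent.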
