Image Generation with a Sphere Encoder
Kaiyu Yue, Menglin Jia, Ji Hou, Tom Goldstein

TL;DR
The paper presents the Sphere Encoder, an image generation framework that maps images onto a spherical latent space and produces high-quality images in one or a few steps, with quality competitive with many-step diffusion models.
Contribution
Introduces the Sphere Encoder, a novel spherical latent space approach for fast image generation with competitive quality and reduced inference cost.
Findings
Generates an image in a single forward pass, and competes with many-step diffusion models using fewer than five steps.
Performs comparably to state-of-the-art diffusion models.
Supports conditional image generation and quality enhancement through looping.
Abstract
We introduce the Sphere Encoder, an efficient generative framework capable of producing images in a single forward pass and competing with many-step diffusion models using fewer than five steps. Our approach works by learning an encoder that maps natural images uniformly onto a spherical latent space, and a decoder that maps random latent vectors back to the image space. Trained solely through image reconstruction losses, the model generates an image by simply decoding a random point on the sphere. Our architecture naturally supports conditional generation, and looping the encoder/decoder a few times can further enhance image quality. Across several datasets, the sphere encoder approach yields performance competitive with state of the art diffusions, but with a small fraction of the inference cost. Project page is available at https://sphere-encoder.github.io.
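The sampling procedure described in the abstract (decode a random point on the sphere, then optionally loop through the encoder and decoder) can be illustrated with a minimal sketch. This is not the authors' implementation. The names `sample_on_sphere`, `project_to_sphere`, and `generate` are hypothetical, as are the unit radius, the re-projection of the encoder output onto the sphere, and the idea that `encoder` and `decoder` are plain callables on lists of floats. The one standard fact it relies on is that normalizing an isotropic Gaussian vector yields a uniformly distributed direction.

```python
import math
import random


def sample_on_sphere(dim, radius=1.0, rng=random):
    """Draw a point uniformly from the sphere of the given radius in R^dim."""
    # Normalizing an isotropic Gaussian vector gives a uniform direction.
    g = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in g))
    return [radius * x / norm for x in g]


def project_to_sphere(v, radius=1.0):
    """Rescale a nonzero vector so it lies on the sphere (an assumption here)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [radius * x / norm for x in v]


def generate(encoder, decoder, dim, loops=0, rng=random):
    """Decode a random spherical latent, then optionally re-encode and decode.

    `encoder` and `decoder` are hypothetical stand-ins for trained networks.
    """
    z = sample_on_sphere(dim, rng=rng)
    image = decoder(z)
    for _ in range(loops):
        # One loop: map the current image back to the sphere and decode again.
        z = project_to_sphere(encoder(image))
        image = decoder(z)
    return image
```

With `loops=0` this corresponds to single-pass generation. Each additional loop costs one encoder call and one decoder call, so a few loops still amount to only a handful of network evaluations.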
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Face recognition and analysis · 3D Shape Modeling and Analysis
