Efficient Image Synthesis with Sphere Latent Encoder
Tung Do, Thuan Hoang Nguyen, Hao Li

TL;DR
This paper introduces a decoupled approach to image synthesis in a spherical latent space. Separating encoding from denoising improves both efficiency and image quality over Sphere Encoder.
Contribution
The paper proposes a framework that pairs a fixed pretrained image encoder with a separate latent denoising model trained entirely in a spherical latent space, improving efficiency and scalability.
Findings
Outperforms Sphere Encoder in quality and speed on multiple datasets
Eliminates repeated pixel-space operations during training and inference
Achieves results competitive with both few-step and multi-step baselines
Abstract
Few-step image generation has seen rapid progress, with consistency and meanflow-based methods significantly reducing the number of sampling steps. Despite their low inference cost, these approaches often suffer from training instability and limited scalability. Sphere Encoder is a recent alternative that produces high-quality images in only a few steps; however, it requires repeated transitions between the pixel space and latent space during inference while jointly optimizing reconstruction and generation within a single architecture. This design leads to computational inefficiency and objective conflict between reconstruction and generation. To address these limitations, we decouple the framework into a fixed pretrained image encoder and a separate latent denoising model trained entirely in a spherical latent space. Our approach eliminates repeated pixel-space operations during…
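To make the idea of denoising "entirely in a spherical latent space" concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the unit radius, the Gaussian perturbation, the re-projection after each step, and the names `project_to_sphere`, `perturb_on_sphere`, `few_step_sample`, and `toy_denoiser` are all illustrative assumptions. It only shows that a few-step sampling loop can stay in latent space, with no pixel-space round trip between steps.

```python
import numpy as np


def project_to_sphere(z, radius=1.0, eps=1e-12):
    """Rescale each row of z so its L2 norm equals `radius` (assumed radius)."""
    norms = np.linalg.norm(z, axis=-1, keepdims=True)
    return radius * z / np.maximum(norms, eps)


def perturb_on_sphere(z, sigma, rng):
    """Add Gaussian noise to latents, then project back onto the sphere."""
    noise = rng.standard_normal(z.shape)
    return project_to_sphere(z + sigma * noise)


def few_step_sample(denoiser, dim, steps, rng, batch=4):
    """Run `steps` denoiser calls purely in latent space.

    `denoiser` is a hypothetical stand-in for a trained latent denoising model.
    """
    z = project_to_sphere(rng.standard_normal((batch, dim)))
    for _ in range(steps):
        # Each step maps latents to latents; no decoder or encoder is called here.
        z = project_to_sphere(denoiser(z))
    return z


rng = np.random.default_rng(0)
toy_denoiser = lambda z: z  # placeholder: identity instead of a trained network
latents = few_step_sample(toy_denoiser, dim=16, steps=4, rng=rng)
noisy = perturb_on_sphere(latents, sigma=0.5, rng=rng)
```

In a setup like this, a decoder would be applied once to the final latents, which is the kind of saving the abstract attributes to removing repeated pixel-space operations.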
