Efficient Image Synthesis with Sphere Latent Encoder

Tung Do; Thuan Hoang Nguyen; Hao Li

arXiv:2605.15592 · cs.CV · May 18, 2026

TL;DR

This paper introduces a decoupled approach to image synthesis in a spherical latent space. By separating encoding from denoising, it improves both efficiency and quality over Sphere Encoder.

Contribution

It proposes a framework that pairs a fixed pretrained image encoder with a separate latent denoising model trained entirely in a spherical latent space, improving efficiency and scalability.
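The decoupled training setup can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. The frozen encoder is a fixed random linear map. Latents are L2-normalized onto a hypersphere of radius sqrt(d). Corruption is Gaussian noise followed by re-projection, and the loss is cosine distance. None of these choices come from the paper, and neither do the names `to_sphere`, `frozen_encoder`, `corrupt`, and `denoiser_loss`. The point it illustrates is that images are encoded once, so the denoiser's training loop never touches pixel space.

```python
import numpy as np


def to_sphere(z, radius=None):
    """Project each row of z onto a hypersphere (radius sqrt(d) is an assumption)."""
    d = z.shape[-1]
    r = np.sqrt(d) if radius is None else radius
    norms = np.linalg.norm(z, axis=-1, keepdims=True)
    return r * z / np.maximum(norms, 1e-12)


def frozen_encoder(images, weights):
    """Hypothetical stand-in for a fixed pretrained encoder: a frozen linear map."""
    return images.reshape(images.shape[0], -1) @ weights


def corrupt(z_clean, sigma, rng):
    """Assumed corruption: add Gaussian noise, then re-project onto the sphere."""
    noise = rng.standard_normal(z_clean.shape)
    return to_sphere(z_clean + sigma * noise)


def denoiser_loss(denoise_fn, z_clean, sigma, rng):
    """Cosine distance between the projected prediction and the clean latent."""
    d = z_clean.shape[-1]
    pred = to_sphere(denoise_fn(corrupt(z_clean, sigma, rng)))
    cos = np.sum(pred * z_clean, axis=-1) / d  # both vectors have norm sqrt(d)
    return float(np.mean(1.0 - cos))


rng = np.random.default_rng(0)
images = rng.standard_normal((4, 3, 8, 8))      # toy batch in pixel space
weights = rng.standard_normal((3 * 8 * 8, 16))  # frozen: never updated
z = to_sphere(frozen_encoder(images, weights))  # encode once, reuse for every training step

# Identity "denoiser" as a placeholder for the trainable latent model (hypothetical).
loss_clean = denoiser_loss(lambda v: v, z, sigma=0.0, rng=rng)
loss_noisy = denoiser_loss(lambda v: v, z, sigma=5.0, rng=rng)
```

With no noise the identity placeholder recovers the clean latent exactly, so the loss is zero. Heavy noise pushes the re-projected latent away from the clean one, so the loss becomes positive. A real latent denoiser would be trained to drive that second value down.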

Findings

1. Outperforms Sphere Encoder in quality and speed on multiple datasets
2. Eliminates repeated pixel-space operations during training and inference
3. Achieves results competitive with few-step and multi-step baselines

Abstract

Few-step image generation has seen rapid progress, with consistency and meanflow-based methods significantly reducing the number of sampling steps. Despite their low inference cost, these approaches often suffer from training instability and limited scalability. Sphere Encoder is a recent alternative that produces high-quality images in only a few steps; however, it requires repeated transitions between the pixel space and latent space during inference while jointly optimizing reconstruction and generation within a single architecture. This design leads to computational inefficiency and objective conflict between reconstruction and generation. To address these limitations, we decouple the framework into a fixed pretrained image encoder and a separate latent denoising model trained entirely in a spherical latent space. Our approach eliminates repeated pixel-space operations during…
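The inference cost difference described above can be illustrated by counting pixel-space calls in two simplified sampling loops. This is a hypothetical sketch, not Sphere Encoder's or the authors' actual sampler. `CallCounter`, `sample_coupled`, `sample_decoupled`, the flatten/unflatten "encoder" and "decoder", and the identity denoiser are all illustrative stand-ins. The coupled loop mimics only the call pattern the abstract attributes to Sphere Encoder: a round trip through pixel space at every step.

```python
import numpy as np


class CallCounter:
    """Hypothetical encoder/decoder stand-ins that count pixel-space calls."""

    def __init__(self):
        self.encode_calls = 0
        self.decode_calls = 0

    def encode(self, x):
        self.encode_calls += 1
        return x.reshape(x.shape[0], -1)  # stand-in: flatten pixels to a latent

    def decode(self, z, shape):
        self.decode_calls += 1
        return z.reshape(shape)  # stand-in: unflatten a latent to pixels


def to_sphere(z):
    """Project each row onto a hypersphere of radius sqrt(d) (assumed convention)."""
    d = z.shape[-1]
    return np.sqrt(d) * z / np.maximum(np.linalg.norm(z, axis=-1, keepdims=True), 1e-12)


def sample_coupled(model, z, steps, shape):
    # Coupled pattern: decode to pixels and re-encode at every step.
    for _ in range(steps):
        x = model.decode(z, shape)
        z = to_sphere(model.encode(x))
    return model.decode(z, shape)


def sample_decoupled(model, denoise_fn, z, steps, shape):
    # Decoupled pattern: every step stays in the spherical latent space.
    for _ in range(steps):
        z = to_sphere(denoise_fn(z))
    return model.decode(z, shape)  # a single pixel-space call at the end


shape = (2, 3, 4, 4)
rng = np.random.default_rng(0)
z0 = to_sphere(rng.standard_normal((2, 3 * 4 * 4)))

coupled = CallCounter()
x_coupled = sample_coupled(coupled, z0, steps=4, shape=shape)  # 4 encodes, 5 decodes

decoupled = CallCounter()
x_decoupled = sample_decoupled(decoupled, lambda v: v, z0, steps=4, shape=shape)  # 0 encodes, 1 decode
```

In this toy setting the coupled loop grows its pixel-space calls linearly with the number of steps, while the decoupled loop always makes exactly one decoder call regardless of step count.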
