Aligning Latent Geometry for Spherical Flow Matching in Image Generation
Tuna Han Salih Meral, Kaan Oktay, Hidir Yesiltepe, Adil Kaan Akan, Pinar Yanardag

TL;DR
This paper projects VAE latents and Gaussian noise onto a fixed-radius sphere and replaces linear interpolation with spherical linear interpolation, so that flow-matching paths stay on the sphere at every timestep. It reports improved class-conditional ImageNet results.
Contribution
It decomposes each latent token into radial and angular components, then combines fixed-radius projection with spherical linear interpolation so that interpolation paths remain on the sphere.
Findings
Consistently improves class-conditional ImageNet-256 FID scores.
Maintains diffusion architecture without auxiliary encoders.
Uses spherical linear interpolation for better latent path consistency.
Abstract
Latent flow matching for image generation usually transports Gaussian noise to variational autoencoder latents along linear paths. Both endpoints, however, concentrate in thin spherical shells, and a Euclidean chord leaves those shells even when preprocessing aligns their radii. By decomposing each latent token into radial and angular components, we show through component-swap probes that decoded perceptual and semantic content is carried predominantly by direction, with radius contributing much less. We therefore project data latents onto a fixed token radius, use the radial projection of Gaussian noise as the spherical prior, finetune the decoder with the encoder frozen, and replace linear interpolation with spherical linear interpolation. The resulting geodesic paths stay on the sphere at every timestep, and their velocity targets are purely angular by construction. Under matched…
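The geometric claim in the abstract can be checked numerically. The following is a minimal NumPy sketch, not the authors' implementation: the function names, the token dimension, the radius `sqrt(dim)`, and the convention that t = 0 is noise and t = 1 is data are all assumptions made for illustration. It projects tokens onto a fixed radius, interpolates with spherical linear interpolation, and compares the result with the Euclidean chord.

```python
import numpy as np

def to_sphere(x, radius):
    """Rescale each token (last axis) to a fixed norm, keeping its direction."""
    norm = np.linalg.norm(x, axis=-1, keepdims=True)
    return radius * x / np.maximum(norm, 1e-12)

def _angle(x0, x1):
    """Unit directions of two same-norm tokens and the angle between them."""
    u0, u1 = to_sphere(x0, 1.0), to_sphere(x1, 1.0)
    cos = np.clip(np.sum(u0 * u1, axis=-1, keepdims=True), -1.0, 1.0)
    return u0, u1, np.arccos(cos)

def slerp(x0, x1, t):
    """Spherical linear interpolation per token.

    Assumes x0 and x1 share the same norm and are neither parallel
    nor antipodal (sin(theta) must be nonzero).
    """
    radius = np.linalg.norm(x0, axis=-1, keepdims=True)
    u0, u1, theta = _angle(x0, x1)
    w0 = np.sin((1.0 - t) * theta) / np.sin(theta)
    w1 = np.sin(t * theta) / np.sin(theta)
    return radius * (w0 * u0 + w1 * u1)

def slerp_velocity(x0, x1, t):
    """Time derivative of slerp; it is tangent to the sphere."""
    radius = np.linalg.norm(x0, axis=-1, keepdims=True)
    u0, u1, theta = _angle(x0, x1)
    d0 = -theta * np.cos((1.0 - t) * theta) / np.sin(theta)
    d1 = theta * np.cos(t * theta) / np.sin(theta)
    return radius * (d0 * u0 + d1 * u1)

rng = np.random.default_rng(0)
dim = 16                   # hypothetical token dimension
radius = np.sqrt(dim)      # hypothetical fixed radius

# Stand-ins for projected Gaussian noise and projected data latents.
noise = to_sphere(rng.standard_normal((4, dim)), radius)
data = to_sphere(rng.standard_normal((4, dim)) + 1.0, radius)

t = 0.5
on_path = slerp(noise, data, t)
chord = (1.0 - t) * noise + t * data
velocity = slerp_velocity(noise, data, t)

slerp_norms = np.linalg.norm(on_path, axis=-1)       # stays at the radius
chord_norms = np.linalg.norm(chord, axis=-1)         # falls inside the sphere
radial_part = np.sum(velocity * on_path, axis=-1)    # zero up to rounding
```

In this sketch the slerp midpoint keeps the fixed norm, while the chord midpoint has norm `radius * cos(theta / 2)` and therefore falls inside the shell. The velocity has no component along the position vector, which is one way to read "purely angular by construction".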
