Hyperspherical Autoencoder for High-Fidelity Image Reconstruction and Generation
Hun Chang, Byunghee Cha, Jong Chul Ye

TL;DR
The paper introduces the Hyperspherical Autoencoder (HAE), a framework that improves image reconstruction fidelity and generation quality by combining hyperspherical latent representations with a manifold-aware diffusion model.
Contribution
It proposes a Directional Feature Alignment objective and Hierarchical Convolutional Patch Embedding to enhance detail preservation, along with a Riemannian Flow Matching method for efficient training on hyperspherical latent spaces.
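The summary does not spell out the paper's Riemannian Flow Matching formulation. As a rough illustration only, the sketch below shows the standard geodesic construction commonly used for flow matching on a unit hypersphere: the interpolant between two points follows the great circle (spherical linear interpolation), and the regression target is its time derivative, which lies in the tangent space of the sphere. The function names `normalize` and `geodesic_point_and_velocity` are hypothetical, and the pairing of `x0` (noise) with `x1` (data latent) is an assumption, not something taken from the paper.

```python
import math

def normalize(v):
    """Project a vector onto the unit hypersphere."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def geodesic_point_and_velocity(x0, x1, t):
    """Point and velocity at time t on the great-circle path from x0 to x1.

    Assumes x0 and x1 are unit vectors. Antipodal pairs (angle close to pi)
    have no unique geodesic and are not handled in this sketch.
    """
    dot = max(-1.0, min(1.0, sum(a * b for a, b in zip(x0, x1))))
    theta = math.acos(dot)  # geodesic distance between x0 and x1
    if theta < 1e-6:
        # Nearly identical endpoints: the path is (almost) stationary.
        return list(x0), [0.0] * len(x0)
    s = math.sin(theta)
    # Spherical linear interpolation weights.
    w0 = math.sin((1.0 - t) * theta) / s
    w1 = math.sin(t * theta) / s
    x_t = [w0 * a + w1 * b for a, b in zip(x0, x1)]
    # Time derivatives of the weights give the target velocity.
    dw0 = -theta * math.cos((1.0 - t) * theta) / s
    dw1 = theta * math.cos(t * theta) / s
    u_t = [dw0 * a + dw1 * b for a, b in zip(x0, x1)]
    return x_t, u_t

# Illustrative endpoints: x0 plays the role of noise, x1 of a data latent.
x0 = normalize([1.0, 0.0, 0.0])
x1 = normalize([0.0, 1.0, 0.0])
x_t, u_t = geodesic_point_and_velocity(x0, x1, 0.5)

norm_x_t = math.sqrt(sum(x * x for x in x_t))     # stays on the sphere
tangency = sum(a * b for a, b in zip(x_t, u_t))   # velocity is tangent
speed = math.sqrt(sum(u * u for u in u_t))        # constant, equals theta
```

In a training loop, a network would be regressed onto `u_t` at the point `x_t`. Because the path never leaves the sphere, the model is not asked to learn velocities that point off the manifold.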
Findings
Achieved a generation FID (gFID) of 1.96, a reconstruction FID (rFID) of 0.78, and a PSNR of 25.2 dB.
Demonstrated efficient convergence of the manifold-aware Diffusion Transformer.
Enhanced local structure preservation and semantic consistency in reconstructions.
Abstract
Recent studies have explored using pretrained Vision Foundation Models (VFMs) such as DINO for generative autoencoders, showing strong generative performance. Unfortunately, existing approaches often suffer from limited reconstruction fidelity due to the loss of high-frequency details. In this work, we present the Hyperspherical Autoencoder (HAE), a framework that bridges semantic representation and pixel-level reconstruction. Our key insight is that while semantic information in contrastive representations is primarily directional, enforcing strict magnitude matching hinders the preservation of fine-grained details. To address this, we introduce a Directional Feature Alignment objective that enforces semantic consistency while allowing flexible feature magnitudes for detail retention, alongside a Hierarchical Convolutional Patch Embedding module to enhance…
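The contrast between directional alignment and strict magnitude matching can be made concrete with a small sketch. It assumes the directional objective behaves like a cosine-similarity loss, which is a common choice but is not confirmed by the abstract. The function names are hypothetical and the vectors are toy values, not features from DINO or any other model.

```python
import math

def cosine_alignment_loss(student, teacher, eps=1e-8):
    """1 - cos(student, teacher): depends on direction only, not on norm."""
    dot = sum(s * t for s, t in zip(student, teacher))
    norm_s = math.sqrt(sum(s * s for s in student))
    norm_t = math.sqrt(sum(t * t for t in teacher))
    return 1.0 - dot / (norm_s * norm_t + eps)

def mse_loss(student, teacher):
    """Mean squared error: penalizes both direction and magnitude mismatch."""
    return sum((s - t) ** 2 for s, t in zip(student, teacher)) / len(student)

teacher = [0.6, 0.8, 0.0]   # toy unit-norm reference feature
student = [1.8, 2.4, 0.0]   # same direction, three times the magnitude

directional = cosine_alignment_loss(student, teacher)  # close to zero
strict = mse_loss(student, teacher)                    # clearly non-zero
```

Under the cosine loss the encoder is free to vary the feature norm, for example to carry detail, without being penalized, whereas the squared-error loss pulls the norm back toward the reference.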
