Hyperspherical Autoencoder for High-Fidelity Image Reconstruction and Generation

Hun Chang; Byunghee Cha; Jong Chul Ye

arXiv:2601.22904 · cs.CV · May 12, 2026

TL;DR

The paper introduces the Hyperspherical Autoencoder (HAE), a framework for high-fidelity image reconstruction and generation that combines hyperspherical latent representations with a manifold-aware diffusion model.

Contribution

It proposes a Directional Feature Alignment objective and Hierarchical Convolutional Patch Embedding to enhance detail preservation, along with a Riemannian Flow Matching method for efficient training on hyperspherical latent spaces.
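To make the idea of flow matching on a hypersphere concrete, here is a minimal NumPy sketch of the great-circle (geodesic) interpolant and its tangent velocity, which is a common choice of regression target when flow matching is done on a unit sphere. This is an illustration under assumptions, not the paper's implementation: the function names `normalize` and `geodesic_point_and_velocity` are hypothetical, and the paper's exact path, time sampling, and loss are not given in this summary.

```python
import numpy as np

def normalize(x, eps=1e-8):
    """Project vectors onto the unit hypersphere (hypothetical helper)."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def geodesic_point_and_velocity(x0, x1, t, eps=1e-6):
    """Point and velocity on the great circle from x0 (t=0) to x1 (t=1).

    x0, x1: arrays of shape (batch, dim) with unit-norm rows.
    t: scalar or array broadcastable to (batch, 1), values in [0, 1].
    Assumes x0 and x1 are neither identical nor antipodal.
    """
    cos_theta = np.sum(x0 * x1, axis=-1, keepdims=True)
    cos_theta = np.clip(cos_theta, -1.0 + eps, 1.0 - eps)
    theta = np.arccos(cos_theta)          # geodesic distance between x0 and x1
    sin_theta = np.sin(theta)
    a = (1.0 - t) * theta
    b = t * theta
    # Spherical linear interpolation: stays on the unit sphere for all t.
    x_t = (np.sin(a) * x0 + np.sin(b) * x1) / sin_theta
    # Time derivative of x_t: tangent to the sphere at x_t, with speed theta.
    u_t = theta * (np.cos(b) * x1 - np.cos(a) * x0) / sin_theta
    return x_t, u_t

# Example: a noise sample and a (stand-in) latent, both on the sphere.
rng = np.random.default_rng(0)
noise = normalize(rng.standard_normal((2, 8)))
latent = normalize(rng.standard_normal((2, 8)))
x_t, u_t = geodesic_point_and_velocity(noise, latent, t=0.5)
```

In a training loop, a network would typically be asked to predict `u_t` from `x_t` and `t`, with its output projected onto the tangent space at `x_t`; whether HAE does exactly this is not stated here.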

Findings

1. Achieved a gFID of 1.96, rFID of 0.78, and PSNR of 25.2 dB.

2. Demonstrated efficient convergence of the manifold-aware Diffusion Transformer.

3. Enhanced local structure preservation and semantic consistency in reconstructions.

Abstract

Recent studies have explored using pretrained Vision Foundation Models (VFMs) such as DINO for generative autoencoders, showing strong generative performance. Unfortunately, existing approaches often suffer from limited reconstruction fidelity due to the loss of high-frequency details. In this work, we present the Hyperspherical Autoencoder (HAE), a framework that bridges semantic representation and pixel-level reconstruction. Our key insight is that while semantic information in contrastive representations is primarily directional, enforcing strict magnitude matching hinders the preservation of fine-grained details. To address this, we introduce a Directional Feature Alignment objective that enforces semantic consistency while allowing flexible feature magnitudes for detail retention, alongside a Hierarchical Convolutional Patch Embedding module to enhance…
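The contrast between matching direction and matching magnitude can be illustrated with a small NumPy sketch. It assumes a cosine-distance form for the directional objective, which is one natural reading of the abstract; the paper's actual Directional Feature Alignment loss is not specified in this excerpt, and both function names below are hypothetical.

```python
import numpy as np

def directional_alignment_loss(features, target, eps=1e-8):
    """Mean cosine distance between feature rows; ignores their norms.

    Assumed form for illustration only, not the paper's exact objective.
    """
    f = features / (np.linalg.norm(features, axis=-1, keepdims=True) + eps)
    g = target / (np.linalg.norm(target, axis=-1, keepdims=True) + eps)
    return float(np.mean(1.0 - np.sum(f * g, axis=-1)))

def mse_alignment_loss(features, target):
    """Mean squared error; penalizes both direction and magnitude mismatch."""
    return float(np.mean((features - target) ** 2))

# Stand-in for VFM token features: 8 tokens with 32 channels each.
rng = np.random.default_rng(0)
vfm_features = rng.standard_normal((8, 32))

# Encoder features with the same direction but three times the magnitude.
encoder_features = 3.0 * vfm_features

directional = directional_alignment_loss(encoder_features, vfm_features)
strict = mse_alignment_loss(encoder_features, vfm_features)
```

Here `directional` is essentially zero because the two feature sets point the same way, while `strict` is large because the norms differ. Under the directional loss the encoder is free to use feature magnitude for other purposes, which is the flexibility the abstract attributes to detail retention.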
