HEART: Hyperspherical Embedding Alignment via Kent-Representation Traversal in Diffusion Models
Arani Roy, Shristi Das Biswas, Kaushik Roy

TL;DR
HEART introduces a geometry-aware approach using hyperspherical embeddings and Kent distributions to improve control over diffusion model outputs without fine-tuning.
Contribution
This work reveals the hyperspherical structure of text embeddings and proposes a Kent-distribution-based method for more precise, geometry-respecting image editing in diffusion models.
Findings
Enables consistent subject replacement and attribute control
Requires no fine-tuning, inversion, or optimization
Generalizes across different diffusion architectures
Abstract
Text-to-image diffusion models can generate visually stunning images, yet controlling what appears and how it appears remains surprisingly difficult, especially when operating solely within the constraints of the text-conditioning space. For example, changing a subject or adjusting an attribute often leads to unintended side effects, such as altered backgrounds or distorted details. This is because most existing text-based control methods treat the embedding space as Euclidean and apply simple linear transformations, which do not reflect how semantic concepts are actually organized. In this work, we take a step back and ask: what is the true geometry of these embeddings? We find that text encoder representations lie on a hypersphere, where concepts are not linear directions but structured, anisotropic distributions better captured by Kent distributions. Building on this insight, we…
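To make the geometric claim in the abstract concrete, the following is a minimal illustrative sketch, not the authors' implementation. It shows two things under stated assumptions: (1) linearly mixing two unit-norm vectors leaves the hypersphere while spherical interpolation stays on it, and (2) the classical Kent density on the 2-sphere is anisotropic around its mean direction. The vectors here are random stand-ins for text embeddings, all function names are hypothetical, and the way the paper extends Kent distributions to high-dimensional embeddings is not stated in this excerpt.

```python
import numpy as np

def normalize(v):
    """Project a vector onto the unit hypersphere."""
    return v / np.linalg.norm(v)

def slerp(a, b, t):
    """Spherical interpolation between unit vectors a and b."""
    omega = np.arccos(np.clip(np.dot(a, b), -1.0, 1.0))
    if np.isclose(omega, 0.0):
        return a
    return (np.sin((1 - t) * omega) * a + np.sin(t * omega) * b) / np.sin(omega)

def kent_log_density_unnorm(x, gamma1, gamma2, gamma3, kappa, beta):
    """Unnormalized log-density of the classical Kent distribution on S^2.

    gamma1 is the mean direction, gamma2/gamma3 the major/minor axes
    (an orthonormal frame); kappa is concentration, beta is ovalness.
    """
    return kappa * (gamma1 @ x) + beta * ((gamma2 @ x) ** 2 - (gamma3 @ x) ** 2)

# (1) Random stand-ins for two embeddings (assumption: not real encoder outputs).
rng = np.random.default_rng(0)
a = normalize(rng.normal(size=8))
b = normalize(rng.normal(size=8))

lerp_norm = np.linalg.norm(0.5 * a + 0.5 * b)   # linear midpoint falls inside the sphere
slerp_norm = np.linalg.norm(slerp(a, b, 0.5))   # spherical midpoint stays on it

# (2) Anisotropy: tilt the same angle away from the mean along each axis.
g1, g2, g3 = np.eye(3)
theta = 0.3
toward_major = np.cos(theta) * g1 + np.sin(theta) * g2
toward_minor = np.cos(theta) * g1 + np.sin(theta) * g3
ld_major = kent_log_density_unnorm(toward_major, g1, g2, g3, kappa=10.0, beta=2.0)
ld_minor = kent_log_density_unnorm(toward_minor, g1, g2, g3, kappa=10.0, beta=2.0)

print(f"lerp norm: {lerp_norm:.4f}, slerp norm: {slerp_norm:.4f}")
print(f"log-density toward major axis: {ld_major:.4f}, toward minor axis: {ld_minor:.4f}")
```

With beta set to 0 the two tilted points would score identically, which is the isotropic (von Mises-Fisher) special case; a nonzero beta is what lets the distribution stretch along one axis more than the other.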
