TL;DR
This paper demonstrates that projecting object-centric scene representations into hyperbolic space reveals latent hierarchical structures not visible in Euclidean space, with implications for improved scene understanding.
Contribution
It introduces a post-hoc hyperbolic projection pipeline for slot attention embeddings, uncovering hierarchical scene structures and analyzing the effects of hyperbolic curvature.
Findings
Hyperbolic projection reveals scene-object hierarchy in slot representations.
Coarse slots occupy greater hyperbolic depth than fine slots.
The choice of curvature affects both hierarchical separation and retrieval performance.
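To make the notion of depth concrete, here is a minimal sketch that assumes "hyperbolic depth" means geodesic distance from the origin of the Lorentz hyperboloid with curvature -c. The summary does not state the paper's exact definition, so treat this as one plausible reading. The function names, the helper `point_at_radius`, and the radii are hypothetical and chosen only for illustration; they are not results from the paper.

```python
import math


def hyperbolic_depth(x, c=1.0):
    """Geodesic distance from the hyperboloid origin to point x (curvature -c).

    Assumes x = [x0, x1, ..., xn] satisfies -x0^2 + sum(xi^2) = -1/c.
    """
    sqrt_c = math.sqrt(c)
    # Clamp guards against arguments slightly below 1 caused by rounding.
    return math.acosh(max(1.0, sqrt_c * x[0])) / sqrt_c


def point_at_radius(r, c=1.0):
    """Hypothetical helper: a hyperboloid point at geodesic radius r."""
    sqrt_c = math.sqrt(c)
    return [math.cosh(sqrt_c * r) / sqrt_c, math.sinh(sqrt_c * r) / sqrt_c, 0.0]


# Made-up radii, purely to illustrate comparing two slots by depth.
coarse = point_at_radius(2.0)
fine = point_at_radius(0.5)
print(hyperbolic_depth(coarse) > hyperbolic_depth(fine))
```

Under this definition, the depth depends only on the time-like coordinate x0, so slots at different levels can be ranked with a single scalar per slot, and the comparison can be repeated at several values of c.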
Abstract
Slot attention has emerged as a powerful framework for unsupervised object-centric learning, decomposing visual scenes into a small set of compact vector representations called slots, each capturing a distinct region or object. However, these slots are learned in Euclidean space, which provides no geometric inductive bias for the hierarchical relationships that naturally structure visual scenes. In this work, we propose a simple post-hoc pipeline to project Euclidean slot embeddings onto the Lorentz hyperboloid of hyperbolic space, without modifying the underlying training pipeline. We construct five-level visual hierarchies directly from slot attention masks and analyse whether hyperbolic geometry reveals latent hierarchical structure that remains invisible in Euclidean space. Integrating our pipeline with SPOT (images), VideoSAUR (video), and SlotContrast (video), we find that…
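One common way to place a Euclidean vector on the Lorentz hyperboloid is the exponential map at the origin. The sketch below assumes that choice, along with a curvature of -c with c > 0. The abstract does not say which map, scaling, or normalisation the authors use, so the function names and the example slot vector here are hypothetical.

```python
import math


def exp_map_origin(v, c=1.0):
    """Map Euclidean vector v onto the Lorentz hyperboloid of curvature -c.

    v is treated as a tangent vector at the hyperboloid origin
    (1/sqrt(c), 0, ..., 0). Returns [x0, x1, ..., xn].
    """
    sqrt_c = math.sqrt(c)
    norm = math.sqrt(sum(x * x for x in v))
    if norm < 1e-12:
        # The zero vector maps to the origin itself.
        return [1.0 / sqrt_c] + [0.0] * len(v)
    time = math.cosh(sqrt_c * norm) / sqrt_c
    scale = math.sinh(sqrt_c * norm) / (sqrt_c * norm)
    return [time] + [scale * x for x in v]


def lorentz_inner(x, y):
    """Lorentzian inner product: -x0*y0 + sum of products of space parts."""
    return -x[0] * y[0] + sum(a * b for a, b in zip(x[1:], y[1:]))


# Hypothetical 3-dimensional slot embedding, for illustration only.
slot = [0.3, -0.4, 0.1]
point = exp_map_origin(slot, c=1.0)

# Every projected point should satisfy <x, x>_L = -1/c.
print(abs(lorentz_inner(point, point) + 1.0) < 1e-9)
```

Because the map is applied to already-trained embeddings, a sketch like this needs no gradient computation and leaves the slot attention model untouched, which matches the post-hoc framing in the abstract.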
