
TL;DR
This paper proposes a topological framework for visual representation learning. It argues that visual understanding requires both a semantic language and a model architecture whose structure can capture how visual observations are organized.
Contribution
It introduces a topological perspective on visual understanding, linking semantic invariance to the structure of the observation space and to requirements on model architecture.
Findings
Visual observation space has a fiber bundle structure with nuisance and semantic components.
Semantic invariance necessitates non-smooth, discriminative targets like labels or multimodal alignment.
Model architecture must support topology change through expand and snap processes.
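The quotient idea behind these findings can be illustrated with a toy sketch. It is not the paper's construction: the observation space (binary strings of length 4), the nuisance group (cyclic shifts), and the helper names `orbit` and `quotient_map` are all assumptions chosen for illustration. Orbits here have different sizes, so this is not a genuine fiber bundle. It only shows how many observations collapse onto a few discrete states under a many-to-one map, which no one-to-one relabeling of X could achieve.

```python
from itertools import product


def orbit(x):
    """Return all cyclic shifts of the tuple x.

    In this toy setting the orbit plays the role of a fiber:
    observations that differ only by nuisance variation.
    """
    return {x[i:] + x[:i] for i in range(len(x))}


def quotient_map(x):
    """Send x to a canonical representative of its orbit.

    Every element of an orbit maps to the same representative,
    so the image of this map is a stand-in for X/G.
    """
    return min(orbit(x))


# Hypothetical observation space X: all binary strings of length 4.
X = list(product((0, 1), repeat=4))

# The base space: one representative per orbit.
base = {quotient_map(x) for x in X}

print(len(X), "observations collapse to", len(base), "states")
```

The map is many-to-one by design: shifted copies of the same pattern all land on one representative, so it cannot be inverted, and the distinction between points on the same orbit is discarded rather than smoothly deformed.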
Abstract
We study visual representation learning from a structural and topological perspective. We begin from a single hypothesis: that visual understanding presupposes a semantic language for vision, in which many perceptual observations correspond to a small number of discrete semantic states. Together with widely assumed premises on transferability and abstraction in representation learning, this hypothesis implies that the visual observation space must be organized in a fiber-bundle-like structure, where nuisance variation populates fibers and semantics correspond to a quotient base space. From this structure we derive two theoretical consequences. First, the semantic quotient X/G is not a submanifold of X and cannot be obtained through smooth deformation alone; semantic invariance requires a non-homeomorphic, discriminative target, for example, supervision via labels, cross-instance…
Taxonomy
Topics: Topological and Geometric Data Analysis · Child and Animal Learning Development · Face Recognition and Perception
