Gaussian-Constrained LeJEPA Representations for Unsupervised Scene Discovery and Pose Consistency
Mohsen Mostafa

TL;DR
This paper explores Gaussian-constrained image representations inspired by LeJEPA to improve unsupervised scene discovery and pose estimation in complex, ambiguous real-world image collections, demonstrating empirical benefits in clustering and robustness.
Contribution
It introduces a LeJEPA-inspired approach with Gaussian constraints on embeddings, empirically enhancing scene separation and pose accuracy in challenging conditions.
Findings
Gaussian-constrained embeddings improve scene clustering.
Pose estimation becomes more robust in ambiguous scenarios.
Empirical validation on the IMC2025 dataset shows practical benefits.
Abstract
Unsupervised 3D scene reconstruction from unstructured image collections remains a fundamental challenge in computer vision, particularly when images originate from multiple unrelated scenes and contain significant visual ambiguity. The Image Matching Challenge 2025 (IMC2025) highlights these difficulties by requiring both scene discovery and camera pose estimation under real-world conditions, including outliers and mixed content. This paper investigates the application of Gaussian-constrained representations inspired by LeJEPA (Joint Embedding Predictive Architecture) to address these challenges. We present three progressively refined pipelines, culminating in a LeJEPA-inspired approach that enforces isotropic Gaussian constraints on learned image embeddings. Rather than introducing new theoretical guarantees, our work empirically evaluates how these constraints influence clustering…
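To make the idea of an isotropic Gaussian constraint on embeddings concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the paper's pipeline and not LeJEPA's own regularizer: it only penalizes the distance of a batch's first two moments from those of a standard normal (zero mean, identity covariance). The function name `gaussian_moment_penalty` and the toy data are hypothetical.

```python
import random


def gaussian_moment_penalty(embeddings):
    """Hypothetical helper: squared distance of the batch mean from zero
    plus squared Frobenius distance of the batch covariance from identity.
    Matching two moments is a simplification; it does not test full Gaussianity."""
    n = len(embeddings)
    d = len(embeddings[0])
    # Per-dimension mean of the batch.
    mean = [sum(row[j] for row in embeddings) / n for j in range(d)]
    centered = [[row[j] - mean[j] for j in range(d)] for row in embeddings]
    # Biased sample covariance (divides by n).
    cov = [
        [sum(r[i] * r[j] for r in centered) / n for j in range(d)]
        for i in range(d)
    ]
    mean_term = sum(m * m for m in mean)
    cov_term = sum(
        (cov[i][j] - (1.0 if i == j else 0.0)) ** 2
        for i in range(d)
        for j in range(d)
    )
    return mean_term + cov_term


# Toy data (assumed for illustration): one near-isotropic batch and one
# batch whose first dimension is stretched and shifted.
rng = random.Random(0)
isotropic = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(2000)]
stretched = [[3.0 * row[0] + 1.0] + row[1:] for row in isotropic]

print(gaussian_moment_penalty(isotropic))  # small
print(gaussian_moment_penalty(stretched))  # much larger
```

In a training setup, a term of this kind would typically be added to the main loss with a weighting coefficient, so that embeddings are pushed toward an isotropic shape while still being trained on the primary objective.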
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications · Human Pose and Action Recognition
