Gaussian-Constrained LeJEPA Representations for Unsupervised Scene Discovery and Pose Consistency

Mohsen Mostafa

arXiv:2602.07016·cs.CV·February 10, 2026

Open Access

TL;DR

This paper explores Gaussian-constrained image representations inspired by LeJEPA to improve unsupervised scene discovery and pose estimation in complex, ambiguous real-world image collections, demonstrating empirical benefits in clustering and robustness.

Contribution

The paper introduces a LeJEPA-inspired approach that places Gaussian constraints on image embeddings, and reports empirical gains in scene separation and pose accuracy under challenging conditions.

Findings

01 Gaussian-constrained embeddings improve scene clustering.

02 Pose estimation is more robust in ambiguous scenarios.

03 Empirical validation on the IMC2025 dataset shows practical benefits.

Abstract

Unsupervised 3D scene reconstruction from unstructured image collections remains a fundamental challenge in computer vision, particularly when images originate from multiple unrelated scenes and contain significant visual ambiguity. The Image Matching Challenge 2025 (IMC2025) highlights these difficulties by requiring both scene discovery and camera pose estimation under real-world conditions, including outliers and mixed content. This paper investigates the application of Gaussian-constrained representations inspired by LeJEPA (Joint Embedding Predictive Architecture) to address these challenges. We present three progressively refined pipelines, culminating in a LeJEPA-inspired approach that enforces isotropic Gaussian constraints on learned image embeddings. Rather than introducing new theoretical guarantees, our work empirically evaluates how these constraints influence clustering…
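
To make "isotropic Gaussian constraints on learned image embeddings" concrete, here is a minimal sketch of one way such a constraint can be scored. It is not the paper's objective, which this excerpt does not specify. It swaps in a simple sliced moment-matching penalty: embeddings are projected onto random unit directions, and each 1D projection is penalized for deviating from zero mean and unit variance. The function names, the number of slices, and the toy data are all assumptions made for illustration.

```python
import math
import random


def random_unit_vector(dim, rng):
    """Draw a direction uniformly from the unit sphere in `dim` dimensions."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def sliced_gaussian_penalty(embeddings, num_slices=64, seed=0):
    """Hypothetical penalty: mean over random directions of
    (projected mean)^2 + (projected variance - 1)^2.

    Zero mean and unit variance along every direction is a necessary
    (not sufficient) condition for the embeddings to follow N(0, I).
    """
    rng = random.Random(seed)
    n = len(embeddings)
    dim = len(embeddings[0])
    total = 0.0
    for _ in range(num_slices):
        u = random_unit_vector(dim, rng)
        # Project every embedding onto the direction u.
        proj = [sum(e_i * u_i for e_i, u_i in zip(e, u)) for e in embeddings]
        mean = sum(proj) / n
        var = sum((p - mean) ** 2 for p in proj) / n
        total += mean ** 2 + (var - 1.0) ** 2
    return total / num_slices


if __name__ == "__main__":
    rng = random.Random(1)
    # Toy data (assumption): 2000 embeddings of dimension 8.
    gaussian = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(2000)]
    # Same points shrunk toward the origin, mimicking partial collapse.
    collapsed = [[0.1 * x for x in row] for row in gaussian]
    print("near-Gaussian penalty:", sliced_gaussian_penalty(gaussian))
    print("collapsed penalty:   ", sliced_gaussian_penalty(collapsed))
```

On the toy data, the near-Gaussian embeddings should score close to zero while the shrunken copy scores much higher, which is the behavior a regularizer of this kind relies on. A training pipeline would need a differentiable version inside an autodiff framework; this pure-Python form only illustrates the quantity being measured.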

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications · Human Pose and Action Recognition