E-RayZer: Self-supervised 3D Reconstruction as Spatial Visual Pre-training

Qitao Zhao; Hao Tan; Qianqian Wang; Sai Bi; Kai Zhang; Kalyan Sunkavalli; Shubham Tulsiani; Hanwen Jiang

arXiv:2512.10950 · cs.CV · March 31, 2026

TL;DR

E-RayZer introduces a self-supervised 3D vision model that learns geometrically grounded representations directly from unlabeled multi-view images, outperforming prior methods and existing pre-trained models on 3D tasks.

Contribution

E-RayZer is the first model to perform self-supervised 3D reconstruction directly in 3D space with explicit geometry, yielding 3D-aware representations learned from unlabeled images.

Findings

01. E-RayZer outperforms RayZer on pose estimation.
02. It matches or surpasses supervised models like VGGT.
03. Its representations outperform leading visual pre-training models on 3D tasks.

Abstract

Self-supervised pre-training has driven rapid progress in foundation models for language, 2D images, and video, yet remains largely unexplored for learning 3D-aware representations from multi-view images. In this paper, we present E-RayZer, a self-supervised 3D vision model that learns geometrically grounded representations directly from unlabeled images. Unlike prior self-supervised methods such as RayZer, which infer 3D indirectly through latent-space view synthesis, E-RayZer operates directly in 3D space, performing self-supervised 3D reconstruction with Explicit geometry. This formulation eliminates shortcut solutions and yields representations that are 3D-aware. To ensure convergence and scalability, we introduce a fine-grained learning curriculum that organizes training from easy to hard samples and harmonizes heterogeneous data sources without any supervision. Experiments show…
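The easy-to-hard curriculum mentioned in the abstract can be illustrated with a small sketch. The abstract does not say how sample difficulty is measured or how the schedule is defined, so everything here is an assumption for illustration only: the `Sample` class, the `difficulty` score, the `curriculum_pool` function, and the `start_fraction` parameter are hypothetical names, not taken from the paper or its repository.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Sample:
    name: str
    difficulty: float  # hypothetical score; lower means easier


def curriculum_pool(
    samples: Sequence[Sample],
    progress: float,
    start_fraction: float = 0.25,
) -> List[Sample]:
    """Return the easiest samples allowed at a given training progress.

    progress is clamped to [0, 1]. At 0 only the easiest `start_fraction`
    of the data is used; at 1 the whole dataset is used. This linear
    schedule is an assumption, not the paper's schedule.
    """
    if not samples:
        return []
    progress = min(max(progress, 0.0), 1.0)
    fraction = start_fraction + (1.0 - start_fraction) * progress
    count = max(1, math.ceil(fraction * len(samples)))
    # Rank samples from easy to hard and keep the first `count`.
    ordered = sorted(samples, key=lambda s: s.difficulty)
    return ordered[:count]


# Made-up samples with made-up difficulty scores.
samples = [
    Sample("wide_baseline", 0.9),
    Sample("near_duplicate", 0.1),
    Sample("moderate_motion", 0.5),
    Sample("small_motion", 0.3),
]

early = [s.name for s in curriculum_pool(samples, progress=0.0)]
late = [s.name for s in curriculum_pool(samples, progress=1.0)]
print(early)
print(late)
```

Ranking samples rather than thresholding raw scores keeps the pool size predictable when scores from different datasets are on different scales, which is one plausible way to mix heterogeneous sources; whether E-RayZer does this is not stated in the abstract.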

Code & Models

Repositories

qitaozhao/E-RayZer (GitHub)
