3D sans 3D Scans: Scalable Pre-training from Video-Generated Point Clouds

Ryousuke Yamada; Kohsuke Ide; Yoshihiro Fukuhara; Hirokatsu Kataoka; Gilles Puy; Andrei Bursuc; Yuki M. Asano

arXiv:2512.23042 · cs.CV · March 27, 2026

TL;DR

This paper introduces a self-supervised learning framework that trains on 3D point clouds generated from unlabeled videos, and reports indoor segmentation results that outperform previous methods without using any real 3D scans.

Contribution

The work presents LAM3C, a novel self-supervised method utilizing video-generated point clouds and a noise-regularized loss for scalable 3D representation learning from unlabeled videos.
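
The abstract describes the noise-regularized loss as enforcing local geometric smoothness, and the method name refers to a Laplacian. The exact formulation is not given on this page, so the following is only a minimal sketch of one common way to express such a term: a graph-Laplacian style penalty that averages squared feature differences over k-nearest-neighbour edges. The function names `knn_indices` and `laplacian_smoothness`, the brute-force neighbour search, and the default `k` are illustrative assumptions, not the authors' implementation.

```python
def knn_indices(points, k):
    """Brute-force k nearest neighbours of each point (the point itself excluded)."""
    out = []
    for i, p in enumerate(points):
        # Squared Euclidean distance to every other point, paired with its index.
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(p, q)), j)
            for j, q in enumerate(points)
            if j != i
        )
        out.append([j for _, j in dists[:k]])
    return out


def laplacian_smoothness(points, features, k=2):
    """Mean squared feature difference over kNN edges (hypothetical loss term).

    A low value means that spatially close points carry similar features.
    """
    neighbours = knn_indices(points, k)
    total, count = 0.0, 0
    for i, js in enumerate(neighbours):
        for j in js:
            total += sum((a - b) ** 2 for a, b in zip(features[i], features[j]))
            count += 1
    return total / count


# Toy scene: four 3D points with 2-dimensional per-point features.
points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (5.0, 5.0, 5.0)]
smooth = [[1.0, 2.0]] * 4
rough = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [3.0, 3.0]]
penalty_smooth = laplacian_smoothness(points, smooth)
penalty_rough = laplacian_smoothness(points, rough)
```

In this toy example the constant feature field incurs no penalty, while the varying one does. A real pipeline would use a spatial index rather than the quadratic neighbour search shown here.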

Findings

01 Outperforms previous methods on indoor segmentation tasks

02 Uses only video-generated point clouds, without real 3D scans

03 Introduces the RoomTours dataset with 49,219 scenes

Abstract

Despite recent progress in 3D self-supervised learning, collecting large-scale 3D scene scans remains expensive and labor-intensive. In this work, we investigate whether 3D representations can be learned from unlabeled videos recorded without any real 3D sensors. We present Laplacian-Aware Multi-level 3D Clustering with Sinkhorn-Knopp (LAM3C), a self-supervised framework that learns from video-generated point clouds reconstructed from unlabeled videos. We first introduce RoomTours, a video-generated point cloud dataset constructed by collecting room-walkthrough videos from the web (e.g., real-estate tours) and generating 49,219 scenes using an off-the-shelf feed-forward reconstruction model. We also propose a noise-regularized loss that stabilizes representation learning by enforcing local geometric smoothness and ensuring feature stability under noisy point clouds. Remarkably, without…
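
The abstract names Sinkhorn-Knopp as part of LAM3C's clustering but does not describe how it is applied. In clustering-based self-supervised learning, Sinkhorn-Knopp normalization is commonly used to turn point-to-prototype scores into soft assignments that use all clusters equally. The sketch below shows that generic procedure in plain Python, assuming an N x K score matrix with values in a small range. The function name `sinkhorn_knopp` and the parameters `epsilon` and `n_iters` are illustrative and are not taken from the paper.

```python
import math


def sinkhorn_knopp(scores, epsilon=0.05, n_iters=3):
    """Convert an N x K score matrix into balanced soft cluster assignments.

    Generic Sinkhorn-Knopp sketch; not the authors' implementation.
    Assumes scores span a small range so exp() does not underflow to zero.
    """
    n, k = len(scores), len(scores[0])
    # Subtract the global maximum before exponentiating for numerical stability.
    top = max(max(row) for row in scores)
    q = [[math.exp((s - top) / epsilon) for s in row] for row in scores]
    total = sum(sum(row) for row in q)
    q = [[v / total for v in row] for row in q]
    for _ in range(n_iters):
        # Column step: rescale so every cluster holds total mass 1/K.
        col = [sum(q[i][j] for i in range(n)) for j in range(k)]
        q = [[q[i][j] / (col[j] * k) for j in range(k)] for i in range(n)]
        # Row step: rescale so every point holds total mass 1/N.
        row = [sum(r) for r in q]
        q = [[q[i][j] / (row[i] * n) for j in range(k)] for i in range(n)]
    # Multiply by N so each row becomes a probability distribution over clusters.
    return [[v * n for v in r] for r in q]


# Four points that all score higher on cluster 0 than on cluster 1.
scores = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.6, 0.4]]
assignments = sinkhorn_knopp(scores, epsilon=0.5, n_iters=20)
```

Although every point in the toy input prefers cluster 0, the alternating normalization pushes the assignments toward equal total mass per cluster, which is the usual reason for using it: it discourages the collapse of all points into a single cluster.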

Code & Models

Models

Hugging Face: aist-cvrt/lam3c-roomtours

Taxonomy

Topics: 3D Shape Modeling and Analysis · Robotics and Sensor-Based Localization · Human Pose and Action Recognition