Self-Improving 4D Perception via Self-Distillation

Nan Huang; Pengcheng Yu; Weijia Zeng; James M. Rehg; Angjoo Kanazawa; Haiwen Feng; Qianqian Wang

arXiv:2604.08532 · cs.CV · April 10, 2026

TL;DR

SelfEvo is a self-distillation framework that improves pretrained multi-view 4D perception models using unlabeled videos, with gains of up to 36.5% in video depth estimation and up to 20.1% in camera estimation, without external annotations.

Contribution

SelfEvo introduces a self-distillation scheme that exploits spatiotemporal context asymmetry, enabling 4D perception models to improve themselves without labeled data.

Findings

1. Up to 36.5% improvement in video depth estimation.
2. Up to 20.1% improvement in camera estimation.
3. Consistent performance gains across diverse datasets and models.

Abstract

Large-scale multi-view reconstruction models have made remarkable progress, but most existing approaches still rely on fully supervised training with ground-truth 3D/4D annotations. Such annotations are expensive and particularly scarce for dynamic scenes, limiting scalability. We propose SelfEvo, a self-improving framework that continually improves pretrained multi-view reconstruction models using unlabeled videos. SelfEvo introduces a self-distillation scheme using spatiotemporal context asymmetry, enabling self-improvement for learning-based 4D perception without external annotations. We systematically study design choices that make self-improvement effective, including loss signals, forms of asymmetry, and other training strategies. Across eight benchmarks spanning diverse datasets and domains, SelfEvo consistently improves pretrained baselines and generalizes across base models…
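
To make the idea of context asymmetry concrete, here is a minimal toy sketch, not the authors' implementation. It assumes one possible form of asymmetry (the teacher pass sees every frame of a clip, the student pass sees a random subset) and one possible loss signal (mean L1 on per-frame depth); the abstract says the paper studies several loss signals and forms of asymmetry but does not specify them here. The function names `make_asymmetric_views` and `distillation_loss` are hypothetical, and random arrays stand in for real model outputs.

```python
import numpy as np


def make_asymmetric_views(num_frames, num_student_frames, rng):
    """Assumed asymmetry: teacher sees all frames, student a sorted random subset."""
    teacher_idx = np.arange(num_frames)
    student_idx = np.sort(
        rng.choice(num_frames, size=num_student_frames, replace=False)
    )
    return teacher_idx, student_idx


def distillation_loss(teacher_depth, student_depth, student_idx):
    """Mean L1 between student output and teacher pseudo-labels on shared frames.

    teacher_depth: (T, H, W) predictions from the full-context pass.
    student_depth: (S, H, W) predictions from the reduced-context pass.
    The L1 form is an assumption made for illustration.
    """
    # Teacher predictions on the frames the student also saw act as
    # pseudo-labels; in a real training loop they would be held fixed.
    targets = teacher_depth[student_idx]
    return float(np.mean(np.abs(student_depth - targets)))


rng = np.random.default_rng(0)
teacher_idx, student_idx = make_asymmetric_views(
    num_frames=8, num_student_frames=3, rng=rng
)

# Stand-ins for model outputs: the "student" is the teacher offset by 0.1.
teacher_depth = rng.uniform(1.0, 5.0, size=(8, 4, 4))
student_depth = teacher_depth[student_idx] + 0.1

loss = distillation_loss(teacher_depth, student_depth, student_idx)
print(student_idx, round(loss, 3))
```

In an actual setup both passes would come from the pretrained reconstruction model, and the loss would be backpropagated through the student pass only; the sketch shows just how the two views and the pseudo-label target relate.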
