RenderWorld: World Model with Self-Supervised 3D Label

Ziyang Yan; Wenzhen Dong; Yihua Shao; Yuhang Lu; Liu Haiyang; Jingwen Liu; Haozhe Wang; Zhe Wang; Yan Wang; Fabio Remondino; Yuexin Ma

arXiv:2409.11356·cs.CV·February 14, 2025·2 cites

TL;DR

RenderWorld is a vision-only autonomous driving framework that uses self-supervised 3D labels, Gaussian Splatting, and world modeling to improve segmentation, forecasting, and planning without relying on LiDAR.

Contribution

It introduces a novel self-supervised 3D labeling method and a Gaussian Splatting-based scene representation for end-to-end autonomous driving.

Findings

01 Achieves state-of-the-art 4D occupancy forecasting

02 Improves segmentation accuracy and reduces GPU memory usage

03 Demonstrates robust autonomous driving performance

Abstract

Vision-only end-to-end autonomous driving is not only more cost-effective than LiDAR-vision fusion but also more reliable than traditional methods. To achieve an economical and robust purely visual autonomous driving system, we propose RenderWorld, a vision-only end-to-end autonomous driving framework that generates 3D occupancy labels using a self-supervised Gaussian-based Img2Occ module, encodes the labels with AM-VAE, and uses a world model for forecasting and planning. RenderWorld employs Gaussian Splatting to represent 3D scenes and render 2D images, which greatly improves segmentation accuracy and reduces GPU memory consumption compared with NeRF-based methods. By applying AM-VAE to encode air and non-air separately, RenderWorld achieves more fine-grained scene element representation, leading to state-of-the-art performance in both 4D occupancy forecasting and motion…
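The abstract says that air and non-air voxels are encoded separately. The sketch below illustrates only the data split that such a scheme implies, not AM-VAE itself or the authors' code. The label convention (`AIR = 0`), the nested-list grid layout, and the function names `split_air_non_air` and `merge` are all assumptions made for illustration.

```python
# Assumption: class id 0 marks an empty ("air") voxel; any other id is a
# semantic class. The paper page does not state the actual convention.
AIR = 0

def split_air_non_air(occ):
    """Split a 3D occupancy grid (nested lists of class ids) into two parts.

    Returns (air_mask, non_air):
      air_mask - grid of booleans, True where the voxel is air
      non_air  - dict mapping (x, y, z) to the class id of occupied voxels
    """
    air_mask = []
    non_air = {}
    for x, plane in enumerate(occ):
        mask_plane = []
        for y, row in enumerate(plane):
            mask_row = []
            for z, label in enumerate(row):
                is_air = label == AIR
                mask_row.append(is_air)
                if not is_air:
                    non_air[(x, y, z)] = label
            mask_plane.append(mask_row)
        air_mask.append(mask_plane)
    return air_mask, non_air

def merge(air_mask, non_air):
    """Rebuild the full grid from the two parts produced above."""
    return [[[AIR if is_air else non_air[(x, y, z)]
              for z, is_air in enumerate(row)]
             for y, row in enumerate(plane)]
            for x, plane in enumerate(air_mask)]

# Tiny 2x2x2 grid with two occupied voxels (class ids 3 and 7 are made up).
occ = [[[0, 0], [3, 0]],
       [[0, 7], [0, 0]]]
air_mask, non_air = split_air_non_air(occ)
```

In a real pipeline each part would feed its own encoder. Here the two parts are simply recombined with `merge(air_mask, non_air)` to show that the split loses no information.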

Taxonomy

Topics: 3D Modeling in Geospatial Applications