RenderWorld: World Model with Self-Supervised 3D Label
Ziyang Yan, Wenzhen Dong, Yihua Shao, Yuhang Lu, Liu Haiyang, Jingwen Liu, Haozhe Wang, Zhe Wang, Yan Wang, Fabio Remondino, Yuexin Ma

TL;DR
RenderWorld is a vision-only autonomous driving framework that uses self-supervised 3D labels, Gaussian Splatting, and world modeling to improve segmentation, forecasting, and planning without relying on LiDAR.
Contribution
It introduces a novel self-supervised 3D labeling method and a Gaussian Splatting-based scene representation for end-to-end autonomous driving.
Findings
Achieves state-of-the-art 4D occupancy forecasting
Improves segmentation accuracy and reduces GPU memory usage
Demonstrates robust autonomous driving performance
Abstract
Vision-only end-to-end autonomous driving is not only more cost-effective than LiDAR-vision fusion but also more reliable than traditional methods. To achieve an economical and robust purely visual autonomous driving system, we propose RenderWorld, a vision-only end-to-end autonomous driving framework that generates 3D occupancy labels using a self-supervised Gaussian-based Img2Occ module, encodes the labels with AM-VAE, and uses a world model for forecasting and planning. RenderWorld employs Gaussian Splatting to represent 3D scenes and render 2D images, which greatly improves segmentation accuracy and reduces GPU memory consumption compared with NeRF-based methods. By applying AM-VAE to encode air and non-air separately, RenderWorld achieves more fine-grained scene element representation, leading to state-of-the-art performance in both 4D occupancy forecasting and motion…
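The idea of treating air and non-air voxels separately can be illustrated with a minimal sketch. This is not the AM-VAE itself, only a plain-Python illustration of splitting a semantic occupancy grid into an air mask and a list of occupied voxels before any encoding step. The label id `AIR = 0`, the nested-list grid layout, and the function names `split_air_non_air` and `merge` are all assumptions made for the example and do not come from the paper.

```python
from typing import List, Tuple

# Hypothetical label id for empty space; the abstract does not specify one.
AIR = 0

Grid = List[List[List[int]]]          # occ[x][y][z] -> semantic label
Voxel = Tuple[int, int, int, int]     # (x, y, z, label)


def split_air_non_air(occ: Grid, air_label: int = AIR):
    """Split a semantic occupancy grid into an air mask and occupied voxels."""
    # Boolean mask with the same shape as the grid: True where the voxel is air.
    air_mask = [[[v == air_label for v in row] for row in plane] for plane in occ]
    # Sparse list of the non-air voxels together with their labels.
    occupied: List[Voxel] = [
        (x, y, z, v)
        for x, plane in enumerate(occ)
        for y, row in enumerate(plane)
        for z, v in enumerate(row)
        if v != air_label
    ]
    return air_mask, occupied


def merge(air_mask, occupied: List[Voxel], air_label: int = AIR) -> Grid:
    """Rebuild the dense grid; the mask is used only to recover the shape."""
    grid = [[[air_label for _ in row] for row in plane] for plane in air_mask]
    for x, y, z, v in occupied:
        grid[x][y][z] = v
    return grid


if __name__ == "__main__":
    # Tiny 2x2x2 grid with two occupied voxels (labels 3 and 5 are arbitrary).
    occ = [[[0, 3], [0, 0]], [[5, 0], [0, 0]]]
    mask, voxels = split_air_non_air(occ)
    print(voxels)
    print(merge(mask, voxels) == occ)
```

In a driving scene most voxels are empty, so keeping the two groups apart lets each one be represented in a form that suits it; how RenderWorld actually encodes each group is described in the paper, not here.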
Taxonomy
Topics: 3D Modeling in Geospatial Applications
