Visual Implicit Geometry Transformer for Autonomous Driving
Arsenii Shirokov, Mikhail Kuznetsov, Danila Stepochkin, Egor Evdokimov, Daniil Glazkov, Nikolay Patakin, Anton Konushin, Dmitry Senushkin

TL;DR
ViGT is a scalable, calibration-free transformer model that estimates continuous 3D occupancy fields from surround-view cameras for autonomous driving, achieving state-of-the-art pointmap estimation and strong generalization across five large-scale datasets.
Contribution
Introduces ViGT, a novel self-supervised, calibration-free geometric model for autonomous driving that generalizes across diverse sensor configurations and datasets.
Findings
Achieves state-of-the-art pointmap estimation performance.
Performs comparably to supervised methods on Occ3D-nuScenes.
Demonstrates strong generalization across five large-scale datasets.
Abstract
We introduce the Visual Implicit Geometry Transformer (ViGT), a geometric model for autonomous driving that estimates continuous 3D occupancy fields from surround-view camera rigs. ViGT represents a step towards foundational geometric models for autonomous driving, prioritizing scalability, architectural simplicity, and generalization across diverse sensor configurations. Our approach achieves this through a calibration-free architecture, enabling a single model to adapt to different sensor setups. Unlike general-purpose geometric foundation models that focus on pixel-aligned predictions, ViGT estimates a continuous 3D occupancy field in bird's-eye view (BEV), addressing domain-specific requirements. ViGT naturally infers geometry from multiple camera views into a single metric coordinate frame, providing a common representation for multiple geometric tasks. Unlike most existing…
Taxonomy
Topics: Advanced Vision and Imaging · Robotics and Sensor-Based Localization · Advanced Neural Network Applications
