OCTraN: 3D Occupancy Convolutional Transformer Network in Unstructured Traffic Scenarios
Aditya Nalgunda Ganesh, Dhruval Pobbathi Badrinath, Harshith, Mohan Kumar, Priya SS, Surabhi Narayan

TL;DR
OCTraN is a transformer-based model that converts 2D image features into 3D occupancy maps for autonomous navigation, using self-supervised training to avoid expensive LiDAR data and improve depth accuracy in unstructured traffic scenarios.
Contribution
The paper introduces OCTraN, a novel transformer architecture with iterative-attention for 3D occupancy mapping from monocular images, and a self-supervised training pipeline that eliminates the need for LiDAR ground truth.
Findings
Effective 3D occupancy mapping in unstructured traffic scenes.
Self-supervised training achieves comparable accuracy without LiDAR.
Improved depth estimation accuracy over traditional monocular methods.
Abstract
Modern approaches for vision-centric environment perception for autonomous navigation make extensive use of self-supervised monocular depth estimation algorithms that output disparity maps. However, when this disparity map is projected onto 3D space, the errors in disparity are magnified, resulting in a depth estimation error that increases quadratically as the distance from the camera increases. Though Light Detection and Ranging (LiDAR) can solve this issue, it is expensive and not feasible for many applications. To address the challenge of accurate ranging with low-cost sensors, we propose OCTraN, a transformer architecture that uses iterative attention to convert 2D image features into 3D occupancy features and makes use of convolution and transpose convolution to efficiently operate on spatial information. We also develop a self-supervised training pipeline to generalize the model…
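The quadratic growth of depth error mentioned in the abstract can be illustrated with a short sketch. It assumes the usual pinhole stereo relation Z = f * B / d (depth Z, focal length f in pixels, baseline B in metres, disparity d in pixels), which to first order gives |dZ| ≈ Z² / (f * B) * |dd|. The function names and the camera constants below are illustrative assumptions, not values taken from the paper.

```python
# Illustrative sketch only: camera constants are assumed, not from the paper.
FOCAL_PX = 720.0         # assumed focal length in pixels
BASELINE_M = 0.54        # assumed stereo baseline in metres
DISPARITY_ERR_PX = 0.5   # assumed constant disparity error in pixels


def depth_from_disparity(disparity_px, focal_px=FOCAL_PX, baseline_m=BASELINE_M):
    """Pinhole stereo relation: Z = f * B / d."""
    return focal_px * baseline_m / disparity_px


def depth_error(depth_m, disparity_err_px=DISPARITY_ERR_PX,
                focal_px=FOCAL_PX, baseline_m=BASELINE_M):
    """First-order depth error: |dZ| ~= Z^2 / (f * B) * |dd|."""
    return depth_m ** 2 / (focal_px * baseline_m) * disparity_err_px


# Same disparity error, increasing distance from the camera.
errors = {z: depth_error(z) for z in (10.0, 20.0, 40.0)}

if __name__ == "__main__":
    for z, err in errors.items():
        print(f"depth {z:5.1f} m -> approx. depth error {err:.3f} m")
```

With a fixed disparity error, doubling the distance quadruples the approximate depth error, which is the magnification effect the abstract describes.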
Taxonomy
Topics: Robotics and Sensor-Based Localization · Advanced Vision and Imaging · Infrastructure Maintenance and Monitoring
Methods: Convolution
