OCTraN: 3D Occupancy Convolutional Transformer Network in Unstructured Traffic Scenarios

Aditya Nalgunda Ganesh; Dhruval Pobbathi Badrinath; Harshith; Mohan Kumar; Priya SS; Surabhi Narayan

arXiv:2307.10934 · cs.CV · July 21, 2023 · 1 citation

TL;DR

OCTraN is a transformer-based model that converts 2D image features into 3D occupancy maps for autonomous navigation, using self-supervised training to avoid expensive LiDAR data and improve depth accuracy in unstructured traffic scenarios.
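To make "3D occupancy map" concrete, here is a minimal sketch that builds one the explicit way. It back-projects a dense depth map through an assumed pinhole camera and marks every voxel that receives at least one point. This is the depth-then-lift style of pipeline that the abstract contrasts OCTraN with. It is not OCTraN's learned 2D-to-3D mapping. The intrinsics (fx, fy, cx, cy), grid origin, voxel size, and the helper names `backproject` and `occupancy_grid` are all hypothetical.

```python
import math


def backproject(depth, fx, fy, cx, cy):
    """Lift a dense depth map (rows of depths in metres) to camera-frame 3D points.

    Pinhole model (assumed): X = (u - cx) * z / fx, Y = (v - cy) * z / fy, Z = z.
    """
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z > 0:  # skip pixels with missing or invalid depth
                points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points


def occupancy_grid(points, origin, voxel_size, dims):
    """Return a dims[0] x dims[1] x dims[2] grid with 1 where any point lands, else 0."""
    grid = [[[0] * dims[2] for _ in range(dims[1])] for _ in range(dims[0])]
    for p in points:
        idx = [math.floor((p[i] - origin[i]) / voxel_size) for i in range(3)]
        if all(0 <= idx[i] < dims[i] for i in range(3)):  # drop points outside the grid
            grid[idx[0]][idx[1]][idx[2]] = 1
    return grid


# Hypothetical 2 x 2 depth map, every pixel 2 m away, toy intrinsics.
pts = backproject([[2.0, 2.0], [2.0, 2.0]], fx=1.0, fy=1.0, cx=0.5, cy=0.5)
grid = occupancy_grid(pts, origin=(-2.0, -2.0, 0.0), voxel_size=1.0, dims=(4, 4, 4))
occupied = sum(cell for plane in grid for column in plane for cell in column)
print(occupied)  # 4 voxels occupied
```

Any error in the depth map moves points into the wrong voxels. For that reason, the quality of such a grid is bounded by the quality of the depth estimate it was built from.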

Contribution

The paper introduces OCTraN, a novel transformer architecture with iterative-attention for 3D occupancy mapping from monocular images, and a self-supervised training pipeline that eliminates the need for LiDAR ground truth.
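The page does not spell out the attention mechanism, so the following is only one plausible reading of "iterative attention." In this reading, a fixed set of 3D voxel queries cross-attends to flattened 2D image tokens several times, with a residual update after each pass. It is written in plain Python for readability. It deliberately omits the learned query/key/value projections, normalization, and feed-forward layers that a real transformer would have. The names `iterative_attention` and `num_iters` and the token layout are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query attends over all key/value tokens."""
    scale = math.sqrt(len(keys[0]))
    out = []
    for q in queries:
        weights = softmax([sum(a * b for a, b in zip(q, k)) / scale for k in keys])
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


def iterative_attention(voxel_queries, image_tokens, num_iters=3):
    """Refine 3D voxel queries by repeatedly attending to 2D image tokens.

    Hypothetical simplification: keys and values are the raw image tokens
    (no learned projections), and each pass adds the attended features back
    to the queries as a residual update.
    """
    q = [list(v) for v in voxel_queries]
    for _ in range(num_iters):
        attended = cross_attention(q, image_tokens, image_tokens)
        q = [[a + b for a, b in zip(qi, ai)] for qi, ai in zip(q, attended)]
    return q


# Toy example: 4 voxel queries, 3 image tokens, feature size 2.
voxels = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
tokens = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
refined = iterative_attention(voxels, tokens, num_iters=2)
print(len(refined), len(refined[0]))  # 4 2
```

The number of 3D queries is fixed independently of the image resolution. This is one reason query-based designs are a common way to move from a 2D feature map to a 3D volume.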

Findings

01. Effective 3D occupancy mapping in unstructured traffic scenes.
02. Self-supervised training achieves comparable accuracy without LiDAR.
03. Improved depth estimation accuracy over traditional monocular methods.

Abstract

Modern approaches to vision-centric environment perception for autonomous navigation make extensive use of self-supervised monocular depth estimation algorithms that output disparity maps. However, when such a disparity map is projected into 3D space, errors in disparity are magnified, producing a depth estimation error that grows quadratically with distance from the camera. Light Detection and Ranging (LiDAR) can solve this issue, but it is expensive and not feasible for many applications. To address the challenge of accurate ranging with low-cost sensors, we propose OCTraN, a transformer architecture that uses iterative attention to convert 2D image features into 3D occupancy features and uses convolution and transpose convolution to operate efficiently on spatial information. We also develop a self-supervised training pipeline to generalize the model…
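The quadratic growth of depth error follows from the inverse relation between depth and disparity. Assume a rectified stereo-style model z = f·B/d, with focal length f in pixels, baseline B in metres, and disparity d in pixels. Differentiating gives |Δz| ≈ (f·B/d²)|Δd| = (z²/(f·B))|Δd|. A fixed disparity error therefore produces a depth error proportional to z². A monocular network's disparity is typically an inverse depth up to an unknown scale, so the same 1/d relation applies. The sketch below evaluates this first-order estimate. The camera parameters and the 0.5 px disparity error are illustrative assumptions, not numbers from the paper.

```python
def depth_from_disparity(d_px, focal_px, baseline_m):
    """Rectified stereo relation z = f * B / d (assumed camera model)."""
    return focal_px * baseline_m / d_px


def depth_error(z_m, disparity_err_px, focal_px, baseline_m):
    """First-order propagation of a disparity error: |dz| ~= z**2 / (f * B) * |dd|."""
    return z_m ** 2 / (focal_px * baseline_m) * disparity_err_px


# Illustrative camera: f = 1000 px, B = 0.5 m, constant 0.5 px disparity error.
for z in (10.0, 20.0, 40.0):
    print(f"z = {z:4.0f} m -> depth error ~ {depth_error(z, 0.5, 1000.0, 0.5):.1f} m")
# z = 10 m -> ~0.1 m, z = 20 m -> ~0.4 m, z = 40 m -> ~1.6 m

# Sanity check against the exact relation at 10 m (d = f * B / z = 50 px).
exact = depth_from_disparity(50.0 - 0.5, 1000.0, 0.5) - 10.0
print(f"exact error at 10 m: {exact:.3f} m")  # ~0.101 m
```

Doubling the distance quadruples the error. This is why a sub-pixel disparity error that is harmless nearby becomes a metre-scale ranging error farther out.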

Taxonomy

Topics: Robotics and Sensor-Based Localization · Advanced Vision and Imaging · Infrastructure Maintenance and Monitoring

Methods: Convolution