A Dual-Cycled Cross-View Transformer Network for Unified Road Layout Estimation and 3D Object Detection in the Bird's-Eye-View
Curie Kim, Ue-Hwan Kim

TL;DR
This paper introduces a transformer-based dual-cycle model that unifies road layout estimation and 3D object detection in bird's-eye-view, effectively handling class imbalance and multi-class learning for autonomous driving.
Contribution
It proposes a novel unified model inspired by transformers and CycleGAN, incorporating focal and dual cycle losses to improve multi-task learning under class imbalance.
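This summary does not define the dual cycle loss itself. As background only, here is a minimal sketch of the standard CycleGAN cycle-consistency term that the model is said to draw on: ||F(G(x)) - x||_1 + ||G(F(y)) - y||_1. The mappings G and F are hypothetical stand-ins. G might map front-view features to BEV features and F might map them back, but that pairing is an assumption. This is not the paper's loss.

```python
import numpy as np

def l1(a, b):
    # Mean absolute error between two arrays of equal shape.
    return float(np.abs(a - b).mean())

def cyclegan_cycle_loss(x, y, G, F):
    """CycleGAN-style cycle consistency (not the paper's dual cycle loss).

    G: hypothetical mapping from domain X to domain Y (e.g. front view -> BEV).
    F: hypothetical mapping from domain Y back to domain X.
    """
    forward_cycle = l1(F(G(x)), x)   # x -> G -> F should return to x
    backward_cycle = l1(G(F(y)), y)  # y -> F -> G should return to y
    return forward_cycle + backward_cycle

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.normal(size=(4, 4))
    G = lambda v: W @ v                  # toy linear "view transform"
    F = lambda v: np.linalg.solve(W, v)  # its exact inverse
    x, y = rng.normal(size=4), rng.normal(size=4)
    print(cyclegan_cycle_loss(x, y, G, F))  # near zero, since F inverts G
```

The loss stays small only when the two mappings undo each other. That constraint is what encourages a view transform to preserve information across the cycle.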
Findings
Achieves state-of-the-art performance in road layout estimation.
Attains top results in 3D object detection.
Demonstrates robustness across various learning scenarios.
Abstract
The bird's-eye-view (BEV) representation allows robust learning of multiple tasks for autonomous driving, including road layout estimation and 3D object detection. However, contemporary methods for unified road layout estimation and 3D object detection rarely address the class imbalance of the training dataset or use multi-class learning to reduce the total number of networks required. To overcome these limitations, we propose a unified model for road layout estimation and 3D object detection inspired by the transformer architecture and the CycleGAN learning framework. The proposed model counters the performance degradation caused by class imbalance in the dataset by using the focal loss and the proposed dual cycle loss. Moreover, we set up extensive learning scenarios to study the effect of multi-class learning for road layout estimation in various situations. To verify the…
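The abstract relies on the focal loss to counter class imbalance. Below is a minimal NumPy sketch of the standard multi-class focal loss, FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t), applied per cell of a BEV grid. The (C, H, W) logit layout, the default gamma = 2 and the optional per-class alpha weights are illustrative assumptions. The paper's exact settings are not given here.

```python
import numpy as np

def focal_loss(logits, targets, gamma=2.0, alpha=None, eps=1e-8):
    """Multi-class focal loss averaged over all BEV cells.

    logits:  (C, H, W) raw class scores (assumed layout).
    targets: (H, W) integer class ids in [0, C).
    alpha:   optional length-C per-class weights (assumption).
    """
    # Numerically stable softmax over the class axis.
    shifted = logits - logits.max(axis=0, keepdims=True)
    exp = np.exp(shifted)
    probs = exp / exp.sum(axis=0, keepdims=True)
    # Probability assigned to the true class at every cell.
    rows, cols = np.indices(targets.shape)
    p_t = probs[targets, rows, cols]
    # (1 - p_t)^gamma shrinks the loss of easy, well-classified cells,
    # so rare classes contribute relatively more to the gradient.
    loss = -((1.0 - p_t) ** gamma) * np.log(p_t + eps)
    if alpha is not None:
        loss = loss * np.asarray(alpha)[targets]
    return float(loss.mean())

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    logits = rng.normal(size=(3, 4, 4))
    targets = rng.integers(0, 3, size=(4, 4))
    # gamma = 0 recovers plain cross-entropy; gamma > 0 is never larger.
    print(focal_loss(logits, targets, gamma=2.0), focal_loss(logits, targets, gamma=0.0))
```

Setting gamma = 0 reduces the expression to ordinary cross-entropy. Raising gamma concentrates training on hard cells, which in a BEV layout are often the sparse vehicle or road-boundary classes.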
Taxonomy
Topics: Advanced Neural Network Applications · Autonomous Vehicle Technology and Safety · Video Surveillance and Tracking Methods
Methods: Residual Connection · Tanh Activation · Batch Normalization · PatchGAN · Focal Loss · GAN Least Squares Loss · Residual Block · Convolution
