A Dual-Cycled Cross-View Transformer Network for Unified Road Layout Estimation and 3D Object Detection in the Bird's-Eye-View

Curie Kim; Ue-Hwan Kim

arXiv:2209.08844 · cs.CV · September 20, 2022 · 1 citation

TL;DR

This paper introduces a transformer-based dual-cycle model that unifies road layout estimation and 3D object detection in bird's-eye-view, effectively handling class imbalance and multi-class learning for autonomous driving.

Contribution

It proposes a novel unified model inspired by transformers and CycleGAN, incorporating focal and dual cycle losses to improve multi-task learning under class imbalance.

Findings

01. Achieves state-of-the-art performance in road layout estimation.
02. Attains top results in 3D object detection.
03. Demonstrates robustness across various learning scenarios.

Abstract

The bird's-eye-view (BEV) representation allows robust learning of multiple tasks for autonomous driving including road layout estimation and 3D object detection. However, contemporary methods for unified road layout estimation and 3D object detection rarely handle the class imbalance of the training dataset and multi-class learning to reduce the total number of networks required. To overcome these limitations, we propose a unified model for road layout estimation and 3D object detection inspired by the transformer architecture and the CycleGAN learning framework. The proposed model deals with the performance degradation due to the class imbalance of the dataset utilizing the focal loss and the proposed dual cycle loss. Moreover, we set up extensive learning scenarios to study the effect of multi-class learning for road layout estimation in various situations. To verify the…
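The abstract names the focal loss as one of the two tools used against class imbalance. As a rough illustration, the sketch below implements the standard binary focal loss, FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t), in plain Python. It is not taken from the authors' repository: the function names, the per-pixel framing, and the defaults `gamma=2.0` and `alpha=0.25` are assumptions chosen for illustration. The dual cycle loss is not sketched here because the abstract does not define it.

```python
import math


def binary_focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Focal loss for a single prediction (hypothetical helper).

    p: predicted probability of the positive class, in [0, 1]
    y: ground-truth label, 0 or 1
    gamma: focusing parameter; larger values down-weight easy examples more
    alpha: weight for the positive class (1 - alpha for the negative class)
    """
    eps = 1e-7
    p = min(max(p, eps), 1.0 - eps)  # clamp to keep log() finite
    p_t = p if y == 1 else 1.0 - p  # probability assigned to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def mean_focal_loss(probs, labels, gamma=2.0, alpha=0.25):
    """Average focal loss over flattened predictions, e.g. BEV grid cells."""
    losses = [binary_focal_loss(p, y, gamma, alpha) for p, y in zip(probs, labels)]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # A confidently correct cell contributes far less than a badly missed one.
    print(binary_focal_loss(0.9, 1))
    print(binary_focal_loss(0.1, 1))
```

With `gamma=0` the modulating factor disappears and the expression reduces to alpha-weighted cross-entropy, which is why the focal loss is usually described as cross-entropy that concentrates on hard, often minority-class, examples.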

Code & Models

Repositories

AutoCompSysLab/DCTNet
PyTorch · Official

Taxonomy

Topics: Advanced Neural Network Applications · Autonomous Vehicle Technology and Safety · Video Surveillance and Tracking Methods

Methods: Residual Connection · Tanh Activation · Batch Normalization · PatchGAN · Focal Loss · GAN Least Squares Loss · Residual Block · Convolution