TL;DR
This paper introduces a method to extract a structured bird's-eye-view representation of road networks and traffic agents from a single onboard camera image, enhancing scene understanding for autonomous navigation.
Contribution
The work presents a novel approach to derive a directed graph of the road network and detect dynamic objects in BEV from a single monocular image. The task is challenging because onboard cameras are customarily mounted horizontally, whereas the traffic scene is defined on the ground plane.
Findings
Achieves superior performance over baselines
Effectively detects road graph and objects in BEV
Demonstrates robustness through ablation studies
Abstract
Autonomous navigation requires structured representation of the road network and instance-wise identification of the other traffic agents. Since the traffic scene is defined on the ground plane, this corresponds to scene understanding in the bird's-eye-view (BEV). However, the onboard cameras of autonomous cars are customarily mounted horizontally for a better view of the surroundings, making this task very challenging. In this work, we study the problem of extracting a directed graph representing the local road network in BEV coordinates, from a single onboard camera image. Moreover, we show that the method can be extended to detect dynamic objects on the BEV plane. The semantics, locations, and orientations of the detected objects together with the road graph facilitate a comprehensive understanding of the scene. Such understanding becomes fundamental for the downstream tasks, such as…
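To make the output described in the abstract more concrete, the following is a minimal sketch of one way a directed road graph and detected objects in BEV coordinates could be stored and queried. It is not the paper's implementation: the names `Centerline`, `BevObject`, `RoadGraph`, the polyline representation, and the metre-based coordinate convention are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Centerline:
    """Hypothetical lane centerline: (x, y) points in BEV metres, ordered along the driving direction."""
    points: list

    def length(self):
        # Sum of straight-line distances between consecutive points.
        return sum(math.dist(a, b) for a, b in zip(self.points, self.points[1:]))


@dataclass
class BevObject:
    """Hypothetical detected object: semantic label, BEV location, and orientation (radians)."""
    label: str
    x: float
    y: float
    yaw: float


@dataclass
class RoadGraph:
    """Directed graph: nodes are centerlines, an edge src -> dst means traffic flows from src into dst."""
    centerlines: dict = field(default_factory=dict)  # id -> Centerline
    successors: dict = field(default_factory=dict)   # id -> set of successor ids

    def add_centerline(self, cid, points):
        self.centerlines[cid] = Centerline(points)
        self.successors.setdefault(cid, set())

    def connect(self, src, dst):
        if src not in self.centerlines or dst not in self.centerlines:
            raise KeyError("both centerlines must be added before connecting them")
        self.successors[src].add(dst)

    def reachable(self, start):
        # Depth-first traversal that follows edge direction only.
        seen, stack = set(), [start]
        while stack:
            current = stack.pop()
            for nxt in self.successors[current]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    def nearest_centerline(self, obj):
        # Simplification: distance to the closest polyline vertex, not to the segments between vertices.
        def distance(cid):
            return min(math.dist((obj.x, obj.y), p) for p in self.centerlines[cid].points)
        return min(self.centerlines, key=distance)


# Toy scene: one lane that forks into a straight continuation and a right branch.
graph = RoadGraph()
graph.add_centerline("a", [(0.0, 0.0), (0.0, 10.0)])
graph.add_centerline("b", [(0.0, 10.0), (0.0, 20.0)])
graph.add_centerline("c", [(0.0, 10.0), (5.0, 15.0)])
graph.connect("a", "b")
graph.connect("a", "c")

car = BevObject(label="car", x=0.5, y=18.0, yaw=math.pi / 2)
print(graph.reachable("a"))
print(graph.nearest_centerline(car))
```

Because the edges are directed, `reachable("a")` includes both branches while `reachable("b")` is empty, which is the kind of connectivity information an undirected lane mask would not carry. Associating each object with a centerline, as `nearest_centerline` does in simplified form, is one plausible way the road graph and object detections could be combined for downstream use.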
