Learning to Look around Objects for Top-View Representations of Outdoor Scenes
Samuel Schulter, Menghua Zhai, Nathan Jacobs, Manmohan Chandraker

TL;DR
This paper introduces a convolutional neural network that estimates an occlusion-reasoned semantic top-view layout from a single perspective RGB image of an outdoor road scene, covering both visible and occluded areas without requiring extensive annotations.
Contribution
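The method directly predicts the semantics and depths of occluded scene regions rather than hallucinating RGB values, which enables a better transformation into the top view. The initial top-view estimate is then refined with priors about typical road layouts learned from simulated or, if available, map data, without needing costly annotations.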
Findings
Effective occlusion reasoning improves top-view scene understanding.
Model achieves accurate semantic and depth predictions in occluded regions.
Approach outperforms baseline methods on KITTI and Cityscapes datasets.
Abstract
Given a single RGB image of a complex outdoor road scene in the perspective view, we address the novel problem of estimating an occlusion-reasoned semantic scene layout in the top-view. This challenging problem not only requires an accurate understanding of both the 3D geometry and the semantics of the visible scene, but also of occluded areas. We propose a convolutional neural network that learns to predict occluded portions of the scene layout by looking around foreground objects like cars or pedestrians. But instead of hallucinating RGB values, we show that directly predicting the semantics and depths in the occluded areas enables a better transformation into the top-view. We further show that this initial top-view representation can be significantly enhanced by learning priors and rules about typical road layouts from simulated or, if available, map data. Crucially, training our…
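The geometric step mentioned in the abstract, turning per-pixel semantics and depths into a top-view map, can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a pinhole camera with hypothetical intrinsics `fx` and `cx`, a depth map that stores distance along the optical axis, and an illustrative grid extent and cell size. In the paper, the depth and label maps would come from the network, including its predictions for areas hidden behind foreground objects.

```python
import numpy as np

def to_top_view(depth, labels, fx, cx,
                x_range=(-10.0, 10.0), z_range=(0.0, 40.0),
                cell=0.5, unknown=-1):
    """Scatter per-pixel class labels into a top-view grid (rows = depth, cols = lateral).

    All parameter names and default values are assumptions for illustration.
    """
    h, w = depth.shape
    u = np.tile(np.arange(w), (h, 1))       # pixel column index for every pixel
    z = depth                               # distance along the optical axis (meters)
    x = (u - cx) * z / fx                   # pinhole back-projection to lateral offset
    n_cols = int(round((x_range[1] - x_range[0]) / cell))
    n_rows = int(round((z_range[1] - z_range[0]) / cell))
    grid = np.full((n_rows, n_cols), unknown, dtype=int)

    valid = np.isfinite(z) & (z > 0)        # drop missing or non-positive depths
    xv, zv, lv = x[valid], z[valid], labels[valid]
    col = np.floor((xv - x_range[0]) / cell).astype(int)
    row = np.floor((zv - z_range[0]) / cell).astype(int)
    inside = (col >= 0) & (col < n_cols) & (row >= 0) & (row < n_rows)

    # The height coordinate is dropped: the top view only keeps (x, z).
    # If several pixels fall into one cell, which label wins is arbitrary here.
    grid[row[inside], col[inside]] = lv[inside]
    return grid
```

Cells that no pixel projects to stay at `unknown`. Regions behind foreground objects end up in that state when only visible pixels are projected, which is why the abstract emphasizes predicting semantics and depths for occluded areas before the transformation.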
