TL;DR
This paper presents an automated method for detecting objects and determining their precise GPS locations from street view images using neural networks and a novel triangulation approach, enhancing urban mapping accuracy.
Contribution
It introduces a new pipeline that combines monocular depth estimation with triangulation, performed by a custom Markov Random Field, to geolocate objects accurately from street imagery.
Findings
Achieves high object recall rates
Achieves GPS accuracy within 2 meters
Works for multiple object classes, such as traffic lights and telegraph poles
Abstract
Many applications, such as autonomous navigation, urban planning and asset monitoring, rely on the availability of accurate information about objects and their geolocations. In this paper we propose to automatically detect and compute the GPS coordinates of recurring stationary objects of interest using street view imagery. Our processing pipeline relies on two fully convolutional neural networks: the first segments objects in the images while the second estimates their distance from the camera. To geolocate all the detected objects coherently, we propose a novel custom Markov Random Field model to perform object triangulation. The novelty of the resulting pipeline is the combined use of monocular depth estimation and triangulation to enable automatic mapping of complex scenes with multiple visually similar objects of interest. We validate experimentally the effectiveness of our approach…
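The geometry behind combining a depth estimate with triangulation can be illustrated with a minimal sketch. This is not the paper's Markov Random Field model: it only shows, under simplifying assumptions (a spherical Earth, a flat local east/north frame in meters, known camera positions and compass bearings), how a single view plus an estimated distance gives one position estimate and how two view rays give another. The function names `offset_position` and `intersect_rays` are hypothetical and chosen for illustration.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius (spherical approximation, an assumption)


def offset_position(lat_deg, lon_deg, bearing_deg, distance_m):
    """Estimate an object's lat/lon from one camera position, the compass
    bearing to the object (0 = north, 90 = east) and an estimated distance.
    Uses a small-distance flat-Earth approximation."""
    bearing = math.radians(bearing_deg)
    north_m = distance_m * math.cos(bearing)
    east_m = distance_m * math.sin(bearing)
    dlat = math.degrees(north_m / EARTH_RADIUS_M)
    dlon = math.degrees(east_m / (EARTH_RADIUS_M * math.cos(math.radians(lat_deg))))
    return lat_deg + dlat, lon_deg + dlon


def intersect_rays(p1, bearing1_deg, p2, bearing2_deg):
    """Intersect two view rays in a local (east, north) frame in meters.
    Returns the intersection point, or None if the rays are parallel or the
    intersection lies behind either camera."""
    b1, b2 = math.radians(bearing1_deg), math.radians(bearing2_deg)
    d1 = (math.sin(b1), math.cos(b1))  # unit direction of ray 1 (east, north)
    d2 = (math.sin(b2), math.cos(b2))  # unit direction of ray 2 (east, north)
    denom = d1[0] * d2[1] - d1[1] * d2[0]
    if abs(denom) < 1e-9:
        return None  # parallel rays never meet
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    # Solve p1 + t1 * d1 == p2 + t2 * d2 for the ray parameters t1 and t2.
    t1 = (dx * d2[1] - dy * d2[0]) / denom
    t2 = (dx * d1[1] - dy * d1[0]) / denom
    if t1 < 0 or t2 < 0:
        return None  # intersection is behind one of the cameras
    return (p1[0] + t1 * d1[0], p1[1] + t1 * d1[1])


# Two cameras 10 m apart, both looking at the same object.
point = intersect_rays((0.0, 0.0), 45.0, (10.0, 0.0), 315.0)
```

With several visually similar objects in a scene, many pairs of rays intersect at positions where no object exists. The abstract attributes the coherent resolution of this ambiguity to the custom Markov Random Field together with the monocular depth estimates; the sketch above covers only the underlying geometry.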
