Where is this? Video geolocation based on neural network features
Salvador Medina, Zhuyun Dai, Yingkai Gao

TL;DR
This paper introduces a video geolocation method that retrieves visually similar Google Street View images using neural network features and aggregates the frame-wise results by voting, reaching 90% precision within 150 meters in a city area.
Contribution
It presents a family of voting-based aggregation methods that combine deep learning features (NetVLAD) with traditional image similarity (SIFT) to improve video geolocation accuracy.
Findings
Achieved 90% precision within 150 meters
Developed a new Pittsburgh Downtown video dataset
Demonstrated effectiveness of combined NetVLAD and SIFT features
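As an illustration of how a figure such as "90% precision within 150 meters" could be computed, here is a minimal sketch. It assumes the metric is the fraction of videos whose predicted location falls within a distance threshold of the ground truth; the paper's exact evaluation protocol is not stated on this page. The function names `haversine_m` and `precision_within` are hypothetical.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def precision_within(preds, truths, threshold_m=150.0):
    """Fraction of predictions within threshold_m of the ground truth.

    Assumed definition of the metric, not taken from the paper.
    preds and truths are equal-length sequences of (lat, lon) tuples.
    """
    hits = sum(
        haversine_m(p[0], p[1], t[0], t[1]) <= threshold_m
        for p, t in zip(preds, truths)
    )
    return hits / len(preds)
```

Calling `precision_within(preds, truths, 150.0)` on per-video predictions and ground-truth coordinates returns a value between 0 and 1.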
Abstract
In this work we propose a method that geolocates videos within a delimited, widespread area based solely on the frames' visual content. Our method tackles video geolocation through traditional image retrieval techniques, using Google Street View as the reference. To achieve this, we represent images with the deep learning features obtained from NetVLAD, since the similarity between these feature vectors can be measured by their L2 distance (the L2 norm of their difference). We propose a family of voting-based methods that aggregate frame-wise geolocation results to boost the video geolocation result. The best aggregation found in our experiments considers both NetVLAD and SIFT similarity, as well as the geolocation density of the most similar results. To test our method, we gathered a new video dataset from the Pittsburgh Downtown area to benefit and stimulate more work in this area. Our…
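The retrieve-then-vote idea described in the abstract can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' aggregation method: it assumes descriptors (for example, NetVLAD vectors) have already been computed, votes over a fixed latitude/longitude grid, and weights each vote by inverse L2 distance. It omits the SIFT similarity and density terms mentioned in the abstract. The names `retrieve_top_k`, `geolocate_video`, and the `cell_deg` parameter are hypothetical.

```python
import numpy as np


def retrieve_top_k(frame_desc, ref_descs, k=5):
    """Return indices and L2 distances of the k nearest reference descriptors."""
    dists = np.linalg.norm(ref_descs - frame_desc, axis=1)
    order = np.argsort(dists)[:k]
    return order, dists[order]


def geolocate_video(frame_descs, ref_descs, ref_coords, k=5, cell_deg=0.001):
    """Aggregate frame-wise retrievals by voting over a lat/lon grid.

    frame_descs: (F, D) descriptors of the video frames.
    ref_descs:   (R, D) descriptors of geotagged reference images.
    ref_coords:  (R, 2) latitude/longitude of each reference image.
    cell_deg:    grid cell size in degrees (an illustrative choice).
    """
    votes = {}  # grid cell -> [accumulated weight, list of member coordinates]
    for desc in frame_descs:
        idx, dists = retrieve_top_k(desc, ref_descs, k)
        for i, d in zip(idx, dists):
            cell = tuple(np.floor(ref_coords[i] / cell_deg).astype(int))
            entry = votes.setdefault(cell, [0.0, []])
            entry[0] += 1.0 / (1.0 + d)  # closer matches cast stronger votes
            entry[1].append(ref_coords[i])
    # The winning cell is the one with the largest accumulated weight.
    best = max(votes.values(), key=lambda e: e[0])
    return np.mean(best[1], axis=0)
```

The function returns a single (latitude, longitude) estimate for the whole video, taken as the mean location of the reference images that voted for the winning grid cell.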
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Robotics and Sensor-Based Localization · Multimodal Machine Learning Applications
