Place Deduplication with Embeddings
Carl Yang, Do Huy Hoang, Tomas Mikolov, Jiawei Han

TL;DR
This paper addresses the problem of deduplicating places in large place graphs assembled from multiple sources, and develops an embedding-based pipeline that outperforms existing methods.
Contribution
It formulates the place deduplication problem, explores related tasks, and proposes a systematic, data-driven embedding pipeline with novel techniques that outperforms current state-of-the-art solutions.
Findings
The proposed method achieves higher accuracy than existing approaches.
The embedding pipeline effectively captures place similarities.
The improvements are demonstrated on Facebook's anonymous place graph.
Abstract
Thanks to advancing mobile location services, people nowadays can post about places to share their visiting experiences on the go. A large place graph not only helps users explore interesting destinations, but also provides opportunities for understanding and modeling the real world. To improve the coverage and flexibility of the place graph, many platforms import place data from multiple sources, which unfortunately leads to the emergence of numerous duplicated places that severely hinder subsequent location-related services. In this work, we take the anonymous place graph from Facebook as an example to systematically study the problem of place deduplication: We carefully formulate the problem, study its connections to various related tasks that lead to several promising basic models, and arrive at a systematic two-step data-driven pipeline based on place embedding with multiple novel…
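The abstract mentions a two-step, embedding-based pipeline but does not spell out the steps. The following is a minimal Python sketch of one generic way such a pipeline could be organized. It is an illustrative assumption, not the authors' method. Step 1 generates candidate pairs by geographic blocking (places within a hypothetical `radius_km` of each other), and step 2 accepts a candidate pair when the cosine similarity of the two place vectors clears a hypothetical `threshold`. Character-trigram counts of the place name stand in for real place embeddings. The place records, function names (`name_embedding`, `candidate_pairs`, `deduplicate`) and parameter values are all hypothetical.

```python
import math
from collections import Counter


def name_embedding(name):
    """Hypothetical stand-in for a place embedding: bag of character trigrams of the name."""
    text = f"  {name.lower()} "
    return Counter(text[i:i + 3] for i in range(len(text) - 2))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def candidate_pairs(places, radius_km=0.5):
    """Step 1 (assumed): geographic blocking, keeping only pairs within radius_km."""
    pairs = []
    for i in range(len(places)):
        for j in range(i + 1, len(places)):
            _, _, lat1, lon1 = places[i]
            _, _, lat2, lon2 = places[j]
            if haversine_km(lat1, lon1, lat2, lon2) <= radius_km:
                pairs.append((i, j))
    return pairs


def deduplicate(places, radius_km=0.5, threshold=0.6):
    """Step 2 (assumed): accept candidate pairs whose embedding similarity clears the
    threshold, then merge accepted pairs into duplicate clusters with union-find."""
    embs = [name_embedding(p[1]) for p in places]
    parent = list(range(len(places)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in candidate_pairs(places, radius_km):
        if cosine(embs[i], embs[j]) >= threshold:
            parent[find(i)] = find(j)

    clusters = {}
    for idx in range(len(places)):
        clusters.setdefault(find(idx), []).append(places[idx][0])
    return [sorted(c) for c in clusters.values()]


# Hypothetical place records: (id, name, latitude, longitude).
places = [
    ("a", "Blue Bottle Coffee", 37.7765, -122.4230),
    ("b", "Blue Bottle Coffee SF", 37.7766, -122.4231),
    ("c", "City Hall", 37.7793, -122.4193),
    ("d", "Blue Bottle Coffee", 40.7420, -73.9890),  # same name, different city
]
print(deduplicate(places))  # [['a', 'b'], ['c'], ['d']]
```

On this toy data the two nearby "Blue Bottle Coffee" records merge, while the identically named record in another city is never compared because blocking filters it out in step 1. A real system at place-graph scale would likely replace the quadratic pair loop with a spatial index and the trigram vectors with learned embeddings.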
Taxonomy
Topics: Human Mobility and Location-Based Analysis · Sharing Economy and Platforms · Mobile Crowdsensing and Crowdsourcing
