TL;DR
This paper introduces QuadSky, a framework for linking spatial entities from multiple location-based sources. It combines time-efficient spatial blocking (QuadFlex), Pareto-optimal ranking (SkyRank), and the SkyEx-* family of classification algorithms.
Contribution
The paper presents QuadSky, a comprehensive solution for spatial entity linkage that integrates new algorithms and theoretical guarantees, outperforming existing methods in accuracy and efficiency.
Findings
Achieves 0.85 precision and 0.85 recall on a manually labeled dataset of 1,500 pairs.
Provides a theoretical guarantee with the SkyEx-FES algorithm.
Outperforms existing baselines and clustering techniques.
Abstract
Besides the traditional cartographic data sources, spatial information can also be derived from location-based sources. However, even though different location-based sources refer to the same physical world, each one has only partial coverage of the spatial entities, describes them with different attributes, and sometimes provides contradicting information. Hence, we introduce the spatial entity linkage problem, which determines which pairs of spatial entities refer to the same physical spatial entity. Our proposed solution (QuadSky) starts with a time-efficient spatial blocking technique (QuadFlex), compares the spatial entities in the same block pairwise, ranks the pairs using Pareto optimality with the SkyRank algorithm, and finally classifies the pairs with our novel SkyEx-* family of algorithms that yield 0.85 precision and 0.85 recall on a manually labeled dataset of 1,500 pairs and…
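To make the Pareto-ranking step more concrete, here is a minimal Python sketch of ranking candidate pairs by peeling off successive Pareto fronts (skylines). It is not the paper's SkyRank implementation. The function names `dominates` and `pareto_layers`, the pair identifiers, and the two similarity scores per pair are hypothetical and chosen only for illustration, on the assumption that each candidate pair carries a tuple of similarity scores where higher is better.

```python
from typing import Dict, List, Sequence, Tuple

Scores = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if a is at least as good as b on every score
    and strictly better on at least one (higher is better)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_layers(pairs: Dict[str, Scores]) -> List[List[str]]:
    """Group pairs into successive Pareto fronts.

    Layer 0 holds the pairs that no other pair dominates; removing them
    exposes layer 1, and so on. Earlier layers rank higher.
    """
    remaining = dict(pairs)
    layers: List[List[str]] = []
    while remaining:
        # A pair is on the current front if nothing left dominates it.
        front = sorted(
            key
            for key, scores in remaining.items()
            if not any(
                dominates(other, scores)
                for other_key, other in remaining.items()
                if other_key != key
            )
        )
        layers.append(front)
        for key in front:
            del remaining[key]
    return layers


# Hypothetical candidate pairs: (name similarity, address similarity).
candidate_pairs = {
    "p1": (0.9, 0.8),
    "p2": (0.7, 0.95),
    "p3": (0.6, 0.6),
    "p4": (0.2, 0.1),
}

layers = pareto_layers(candidate_pairs)
print(layers)
```

In this toy input, `p1` and `p2` share the first layer because neither is better on both scores. A classifier in the spirit of the abstract would then decide at which layer to stop labeling pairs as matches; that cut-off logic is not shown here.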
