Find Another Me Across the World -- Large-scale Semantic Trajectory Analysis Using Spark
Chaoquan Cai, Dan Lin

TL;DR
This paper introduces a scalable Spark-based algorithm for semantic trajectory analysis. It computes semantic trajectory similarities and detects communities of people with similar behavior in large datasets, running more than 30 times faster than centralized methods.
Contribution
A novel distributed algorithm with a new hash function for semantic trajectory analysis in Spark, capable of handling large-scale data efficiently.
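The paper's own hash function is not described on this page, so the sketch below does not reproduce it. To illustrate how hashing can cut down the number of trajectory pairs that must be compared, it substitutes standard MinHash with locality-sensitive hashing (LSH) banding. Each trajectory is assumed to be a list of semantic tags such as "home" or "cafe". The functions `shingles`, `minhash_signature`, `lsh_buckets` and `candidate_pairs`, and the parameters `k`, `num_hashes` and `bands`, are hypothetical illustrations and not the authors' API.

```python
import hashlib
from collections import defaultdict
from itertools import combinations

def shingles(tags, k=2):
    """Consecutive k-grams of semantic tags, e.g. ('home', 'cafe')."""
    if not tags:
        raise ValueError("empty trajectory")
    if len(tags) < k:
        return {tuple(tags)}
    return {tuple(tags[i:i + k]) for i in range(len(tags) - k + 1)}

def _hash(seed, shingle):
    # Deterministic 64-bit hash (Python's built-in hash() is salted per process).
    data = f"{seed}|{'/'.join(shingle)}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")

def minhash_signature(tags, num_hashes=16, k=2):
    """One minimum hash value per seed over the trajectory's shingle set."""
    sh = shingles(tags, k)
    return [min(_hash(seed, s) for s in sh) for seed in range(num_hashes)]

def lsh_buckets(trajectories, num_hashes=16, bands=8, k=2):
    """Map (band index, slice of signature) -> set of trajectory ids."""
    if num_hashes % bands:
        raise ValueError("num_hashes must be divisible by bands")
    rows = num_hashes // bands
    buckets = defaultdict(set)
    for tid, tags in trajectories.items():
        sig = minhash_signature(tags, num_hashes, k)
        for b in range(bands):
            buckets[(b, tuple(sig[b * rows:(b + 1) * rows]))].add(tid)
    return buckets

def candidate_pairs(trajectories, **kwargs):
    """Pairs of trajectory ids that share at least one LSH bucket."""
    pairs = set()
    for members in lsh_buckets(trajectories, **kwargs).values():
        pairs.update(combinations(sorted(members), 2))
    return pairs

trajectories = {
    "alice": ["home", "cafe", "office", "gym"],
    "bob": ["home", "cafe", "office", "gym"],
    "carol": ["airport", "hotel", "museum", "beach"],
}
print(sorted(candidate_pairs(trajectories)))
```

In Spark, the bucketing step could plausibly be expressed as an RDD `flatMap` that emits `(bucket_key, trajectory_id)` records, followed by `groupByKey`. Pairs are then generated only inside each bucket rather than across the whole dataset.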
Findings
Over 30 times faster than centralized approaches
Matches the accuracy of centralized (non-parallel) methods
Effective in identifying global communities based on trajectory similarity
Abstract
In today's society, location-based services are widely used and collect a huge amount of human trajectory data. Analyzing the semantic meanings of these trajectories can benefit numerous real-world applications, such as product advertisement, friend recommendation, and social behavior analysis. However, existing works on semantic trajectories are mostly centralized approaches that cannot keep up with rapidly growing trajectory collections. In this paper, we propose a novel large-scale semantic trajectory analysis algorithm in Apache Spark. We design a new hash function along with efficient distributed algorithms that can quickly compute semantic trajectory similarities and identify communities of people with similar behavior across the world. The experimental results show that our approach is more than 30 times faster than centralized approaches without sacrificing any accuracy…
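The abstract does not define the paper's similarity measure or community model. The sketch below therefore assumes simple stand-ins to show the overall shape of the two steps. Similarity between two semantic tag sequences is taken as their longest-common-subsequence length divided by the longer length. A community is taken as a connected component of the graph whose edges join trajectories with similarity at or above a threshold, found with union-find. The names `lcs_length`, `semantic_similarity` and `communities`, and the `threshold` value of 0.6, are hypothetical.

```python
from itertools import combinations

def lcs_length(a, b):
    """Length of the longest common subsequence of two tag sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]

def semantic_similarity(a, b):
    """Assumed measure: LCS length normalized by the longer trajectory."""
    if not a or not b:
        return 0.0
    return lcs_length(a, b) / max(len(a), len(b))

def communities(trajectories, threshold=0.6, pairs=None):
    """Connected components over pairs whose similarity >= threshold."""
    parent = {t: t for t in trajectories}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    if pairs is None:  # fall back to comparing every pair
        pairs = combinations(sorted(trajectories), 2)
    for u, v in pairs:
        if semantic_similarity(trajectories[u], trajectories[v]) >= threshold:
            parent[find(u)] = find(v)

    groups = {}
    for t in trajectories:
        groups.setdefault(find(t), set()).add(t)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])

trajs = {
    "alice": ["home", "cafe", "office", "gym", "home"],
    "bob": ["home", "cafe", "office", "bar", "home"],
    "carol": ["airport", "hotel", "museum", "beach"],
    "dave": ["airport", "hotel", "museum", "beach", "airport"],
}
print(communities(trajs))
```

Here alice and bob share four of five tags in order (similarity 0.8), and so do carol and dave. The result is two communities, `[['alice', 'bob'], ['carol', 'dave']]`. The optional `pairs` argument lets a prior hashing step supply only candidate pairs, so the quadratic all-pairs comparison is avoided on large inputs.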
Taxonomy
Topics: Data Management and Algorithms · Human Mobility and Location-Based Analysis · Geographic Information Systems Studies
