Zooming in on NYC taxi data with Portal
Julia Stoyanovich, Matthew Gilbride, Vera Zaychik Moffitt

TL;DR
This paper introduces a methodology, built on the Portal system, for analyzing NYC taxi data at multiple levels of temporal and geographic granularity. The analysis reveals transportation hotspots and popular routes that can inform infrastructure improvements.
Contribution
The paper combines Portal's evolving-graph analysis with transportation data, enabling efficient multi-level analysis of large-scale trajectory datasets.
Findings
Identification of transportation hotspots, which point to a lack of convenient public transportation options
Discovery of popular routes, which motivate ride-sharing solutions or the addition of a bus route
Efficient analysis of large-scale trajectory data using Portal
Abstract
In this paper we develop a methodology for analyzing transportation data at different levels of temporal and geographic granularity, and apply our methodology to the TLC Trip Record Dataset, made publicly available by the NYC Taxi & Limousine Commission. This data is naturally represented by a set of trajectories, annotated with time and with additional information such as passenger count and cost. We analyze TLC data to identify hotspots, which point to lack of convenient public transportation options, and popular routes, which motivate ride-sharing solutions or addition of a bus route. Our methodology is based on using a system called Portal, which implements efficient representations and principled analysis methods for evolving graphs. Portal is implemented on top of Apache Spark, a popular distributed data processing system, is inter-operable with other Spark libraries like…
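To make the idea of "zooming" concrete, here is a minimal sketch in plain Python. It is not Portal's API and not the authors' implementation: the function names (`cell`, `window`, `zoom`, `hotspots`, `popular_routes`), the square-grid spatial binning, and the four sample trips are all assumptions made for illustration. Each trip becomes a directed edge between grid cells, and one weighted graph is built per time window, so changing `cell_deg` or `window_hours` changes the geographic or temporal granularity.

```python
from collections import Counter, defaultdict
from datetime import datetime
import math


def cell(lon, lat, cell_deg):
    """Map a coordinate to a square grid cell (hypothetical spatial binning)."""
    return (math.floor(lon / cell_deg), math.floor(lat / cell_deg))


def window(ts, window_hours):
    """Map a timestamp to the start of its time window.

    Assumes window_hours divides 24 (e.g. 1, 6, 24).
    """
    start_hour = (ts.hour // window_hours) * window_hours
    return ts.replace(hour=start_hour, minute=0, second=0, microsecond=0)


def zoom(trips, cell_deg, window_hours):
    """Build one weighted directed graph per time window.

    Nodes are grid cells; edge (src, dst) is weighted by the number of trips.
    """
    snapshots = defaultdict(Counter)
    for ts, plon, plat, dlon, dlat in trips:
        edge = (cell(plon, plat, cell_deg), cell(dlon, dlat, cell_deg))
        snapshots[window(ts, window_hours)][edge] += 1
    return snapshots


def hotspots(edges, k):
    """Rank cells by pickups plus drop-offs (weighted degree).

    A trip that starts and ends in the same cell counts twice for that cell.
    """
    degree = Counter()
    for (src, dst), n in edges.items():
        degree[src] += n
        degree[dst] += n
    return degree.most_common(k)


def popular_routes(edges, k):
    """Rank (src, dst) cell pairs by trip count."""
    return edges.most_common(k)


# Synthetic trips, NOT taken from the TLC dataset:
# (pickup time, pickup lon, pickup lat, drop-off lon, drop-off lat)
trips = [
    (datetime(2015, 1, 1, 8, 5), -73.9851, 40.7581, -73.7763, 40.6452),
    (datetime(2015, 1, 1, 8, 40), -73.9843, 40.7574, -73.7781, 40.6437),
    (datetime(2015, 1, 1, 9, 10), -73.9862, 40.7589, -73.9683, 40.7851),
    (datetime(2015, 1, 1, 17, 30), -73.7772, 40.6441, -73.9851, 40.7581),
]

hourly = zoom(trips, cell_deg=0.01, window_hours=1)   # fine temporal view
daily = zoom(trips, cell_deg=0.01, window_hours=24)   # zoomed-out temporal view
day_graph = daily[datetime(2015, 1, 1)]
print(hotspots(day_graph, 2))
print(popular_routes(day_graph, 1))
```

Under these assumptions, a hotspot is simply a cell with high weighted degree and a popular route is a heavily weighted edge. A system like Portal would perform this kind of aggregation over distributed evolving-graph representations rather than in-memory counters.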
Taxonomy
Topics: Data Management and Algorithms · Human Mobility and Location-Based Analysis · Graph Theory and Algorithms
