A Comprehensive Evaluation Framework for Synthetic Trip Data Generation in Public Transport
Yuanyuan Wu, Zhenlin Qin, Zhenliang Ma

TL;DR
This paper introduces a comprehensive evaluation framework for synthetic public transport trip data, assessing representativeness, privacy, and utility across multiple levels, and benchmarks various generation methods to guide practical applications.
Contribution
The paper proposes the Representativeness-Privacy-Utility (RPU) framework for systematic, multi-level evaluation of synthetic trip data, addressing the fragmentation of existing assessment methods.
Findings
Synthetic data do not inherently guarantee privacy.
No single model outperforms others across all metrics.
CTGAN offers the best privacy-utility trade-off among the benchmarked methods.
Abstract
Synthetic data offers a promising solution to the privacy and accessibility challenges of using smart card data in public transport research. Despite rapid progress in generative modeling, there is limited attention to comprehensive evaluation, leaving unclear how reliable, safe, and useful synthetic data truly are. Existing evaluations remain fragmented, typically limited to population-level representativeness or record-level privacy, without considering group-level variations or task-specific utility. To address this gap, we propose a Representativeness-Privacy-Utility (RPU) framework that systematically evaluates synthetic trip data across three complementary dimensions and three hierarchical levels (record, group, population). The framework integrates a consistent set of metrics to quantify similarity, disclosure risk, and practical usefulness, enabling transparent and balanced…
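To make the three-dimension, three-level layout concrete, here is a minimal sketch of how scores could be organized by (dimension, level). The two metrics are illustrative assumptions, not the metrics used in the paper: Jensen-Shannon divergence stands in for population-level representativeness, and an exact-copy rate stands in for record-level disclosure risk. All function names and the toy trip tuples are hypothetical.

```python
import math
from collections import Counter


def js_divergence(real, synth):
    """Jensen-Shannon divergence (base 2) between two categorical samples.

    0.0 means identical distributions, 1.0 means disjoint support.
    """
    categories = set(real) | set(synth)
    p_counts, q_counts = Counter(real), Counter(synth)
    p = {c: p_counts[c] / len(real) for c in categories}
    q = {c: q_counts[c] / len(synth) for c in categories}
    m = {c: 0.5 * (p[c] + q[c]) for c in categories}

    def kl(a, b):
        # Terms with a[c] == 0 contribute nothing to the sum.
        return sum(a[c] * math.log2(a[c] / b[c]) for c in categories if a[c] > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def exact_match_rate(real_records, synth_records):
    """Share of synthetic records that reproduce some real record exactly."""
    real_set = set(real_records)
    return sum(r in real_set for r in synth_records) / len(synth_records)


def evaluate(real_trips, synth_trips):
    """Fill part of a (dimension, level) score grid with the toy metrics above.

    Trips are hypothetical (origin, destination, hour) tuples.
    """
    real_origins = [t[0] for t in real_trips]
    synth_origins = [t[0] for t in synth_trips]
    return {
        ("representativeness", "population"): js_divergence(real_origins, synth_origins),
        ("privacy", "record"): exact_match_rate(real_trips, synth_trips),
    }


real = [("A", "B", 8), ("A", "C", 9), ("B", "C", 17)]
synth = [("A", "B", 8), ("B", "A", 18), ("A", "C", 10)]
scores = evaluate(real, synth)
```

In this toy input the synthetic origins match the real origin distribution, yet one synthetic trip is an exact copy of a real one, which illustrates why a good representativeness score says nothing on its own about disclosure risk. Group-level and utility cells would be added to the same grid with their own metrics.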
