Robust Taxi Fare Prediction Under Noisy Conditions: A Comparative Study of GAT, TimesNet, and XGBoost
Padmavathi Moorthy (SUNY Buffalo)

TL;DR
This paper compares GAT, TimesNet, and XGBoost for taxi fare prediction on noisy and denoised data, revealing differences in their robustness and accuracy in real-world urban mobility scenarios.
Contribution
The paper provides a comprehensive evaluation of classical and deep learning models for fare prediction under noisy conditions, including pre-processing strategies and a robustness analysis.
Findings
GAT and TimesNet outperform XGBoost in noisy data scenarios.
Denoising improves model accuracy and robustness.
The deep learning models show better uncertainty estimation and out-of-distribution (OOD) robustness.
Abstract
Precise fare prediction is crucial in ride-hailing platforms and urban mobility systems. This study evaluates how well three machine learning models predict taxi fares on a real-world dataset of over 55 million records: Graph Attention Networks (GAT), XGBoost, and TimesNet. Both raw (noisy) and denoised versions of the dataset are analyzed to assess the impact of data quality on model performance. The models are evaluated along multiple axes, including predictive accuracy, calibration, uncertainty estimation, out-of-distribution (OOD) robustness, and feature sensitivity. We also explore pre-processing strategies, including KNN imputation, Gaussian noise injection, and autoencoder-based denoising. The study reveals critical differences between classical and deep learning models under realistic conditions, offering practical guidelines for…
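The abstract names KNN imputation and Gaussian noise injection as pre-processing strategies but does not give their configuration. The following is a minimal, hypothetical sketch of both in plain Python (standard library only), assuming tabular trip features with `None` marking a missing value. The helper names `add_gaussian_noise` and `knn_impute`, the parameters `sigma`, `k`, and `seed`, and the row layout are illustrative assumptions, not the paper's pipeline; autoencoder-based denoising is not shown. In practice, scikit-learn's `KNNImputer` offers a vectorized version of the imputation step.

```python
import math
import random
from typing import List, Optional

# Hypothetical row layout: one list of numeric trip features per record,
# with None marking a missing value (e.g. a dropped GPS coordinate).
Row = List[Optional[float]]


def add_gaussian_noise(rows: List[Row], sigma: float, seed: int = 0) -> List[Row]:
    """Return a copy of rows with N(0, sigma^2) noise added to every observed value."""
    rng = random.Random(seed)  # seeded so the noisy copy is reproducible
    return [[None if v is None else v + rng.gauss(0.0, sigma) for v in row] for row in rows]


def _nan_euclidean(a: Row, b: Row) -> float:
    """Euclidean distance over columns observed in both rows, scaled up by
    total/shared columns (the weighting scikit-learn's nan_euclidean_distances uses)."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf  # no overlapping columns: treat as maximally distant
    squared = sum((x - y) ** 2 for x, y in shared)
    return math.sqrt(squared * len(a) / len(shared))


def knn_impute(rows: List[Row], k: int = 2) -> List[Row]:
    """Fill each missing value with the mean of that column over the k nearest
    rows that observe it. The input list is not modified."""
    imputed: List[Row] = []
    for i, row in enumerate(rows):
        filled = list(row)
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Donor rows: every other row that has column j observed.
            donors = [r for t, r in enumerate(rows) if t != i and r[j] is not None]
            donors.sort(key=lambda r: _nan_euclidean(row, r))
            nearest = donors[:k]
            if nearest:  # leave the gap if no other row observes column j
                filled[j] = sum(r[j] for r in nearest) / len(nearest)
        imputed.append(filled)
    return imputed
```

For example, with rows `[[1.0, 1.0, 8.0], [1.5, 1.0, 9.0], [5.0, 2.0, 18.0], [None, 1.0, 8.5]]`, `knn_impute(rows, k=2)` fills the missing first value of the last row with 1.25, the mean of its two nearest neighbours (1.0 and 1.5). Imputing before injecting noise keeps the noise model applied only to values that were actually observed, which is one reasonable ordering rather than the paper's stated one.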
Taxonomy
Topics: Transportation and Mobility Innovations · Traffic Prediction and Management Techniques · Human Mobility and Location-Based Analysis
