One Size Fits None: Modeling NYC Taxi Trips

Tomas Eglinskas

arXiv:2602.19404·cs.LG·February 24, 2026

TL;DR

This study compares the predictability of tips in traditional taxis and app-based ride-sharing in NYC, revealing that traditional tips are highly predictable while app-based tips are largely random, emphasizing the need for specialized models.

Contribution

The paper demonstrates that a universal tipping model is ineffective: traditional taxis and app-based services follow different data patterns, and Simpson's paradox makes a combined model's aggregate accuracy misleading, so category-specific models are needed.

Findings

01

Traditional taxi tips are highly predictable with $R^2 \approx 0.72$

02

App-based tips are difficult to predict with $R^2 \approx 0.17$

03

A combined model fails to accurately predict tips for individual categories

Abstract

The rise of app-based ride-sharing has fundamentally changed tipping culture in New York City. We analyzed 280 million trips from 2024 to see if we could predict tips for traditional taxis versus high-volume for-hire services. By testing methods from linear regression to deep neural networks, we found two very different outcomes. Traditional taxis are highly predictable ($R^{2} \approx 0.72$) due to the in-car payment screen. In contrast, app-based tipping is random and hard to model ($R^{2} \approx 0.17$). We conclude that building one universal model is a mistake: due to Simpson's paradox, a combined model looks accurate on average but fails to predict tips within individual taxi categories, so each category requires its own specialized model.
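The gap between a pooled score and per-category scores can be illustrated with a toy example. The sketch below is not the paper's pipeline or data: the ten trips are invented, the helper names `fit_line` and `r2` are hypothetical, and the assumption that app trips have lower fares than taxi trips is made only to keep the example small. It fits one least-squares line of tip on fare over all trips, then scores that same line on everything at once and on each category separately.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r2(y, pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Synthetic trips (assumption, not real TLC data).
# Taxi tips track the fare closely; app tips are erratic and mostly zero.
taxi_fare = [20, 30, 40, 50, 60]
taxi_tip = [4.2, 5.8, 8.1, 9.9, 12.0]
app_fare = [5, 10, 15, 20, 25]
app_tip = [0.0, 3.0, 0.0, 2.0, 0.0]

# One "universal" model fitted on both categories together.
slope, intercept = fit_line(taxi_fare + app_fare, taxi_tip + app_tip)


def predict(fares):
    return [intercept + slope * f for f in fares]


r2_pooled = r2(taxi_tip + app_tip, predict(taxi_fare + app_fare))
r2_taxi = r2(taxi_tip, predict(taxi_fare))
r2_app = r2(app_tip, predict(app_fare))

print(f"pooled R^2: {r2_pooled:.2f}")
print(f"taxi   R^2: {r2_taxi:.2f}")
print(f"app    R^2: {r2_app:.2f}")
```

On this toy data the pooled score is high while the app-only score is negative, meaning the shared line predicts app tips worse than simply guessing the app average. Much of the pooled score comes from the difference between the two categories rather than from accurate predictions inside each one, which is why reporting the metric per category matters.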

Taxonomy

Topics: Psychology of Social Influence · Transportation and Mobility Innovations · Human Mobility and Location-Based Analysis