Real-time Data Infrastructure at Uber

Yupeng Fu; Chinmay Soman

arXiv:2104.00087·cs.DC·April 2, 2021

TL;DR

Uber's real-time data infrastructure handles massive, continuous data streams for diverse use cases, relying heavily on open-source tech with custom enhancements to meet its unique scale and operational needs.

Contribution

This paper details Uber's scalable real-time data architecture, highlighting open-source integrations, custom improvements, and lessons learned from deployment at large scale.

Findings

01. Effective open-source integration for real-time data processing

02. Custom enhancements improve scalability and reliability

03. Insights into operational challenges and solutions

Abstract

Uber's business is highly real-time in nature. Petabytes of data are continuously collected every day from end users such as Uber drivers, riders, restaurants, and eaters. There is a lot of valuable information to be processed, and many decisions must be made in seconds for a variety of use cases such as customer incentives, fraud detection, and machine learning model prediction. In addition, there is an increasing need to expose this ability to different user categories, including engineers, data scientists, executives, and operations personnel, which adds to the complexity. In this paper, we present the overall architecture of the real-time data infrastructure and identify three scaling challenges that we need to continuously address for each component in the architecture. At Uber, we rely heavily on open source technologies for the key areas of the infrastructure. On top of those…
