Real-time Data Infrastructure at Uber
Yupeng Fu, Chinmay Soman

TL;DR
Uber's real-time data infrastructure handles massive, continuous data streams for diverse use cases, relying heavily on open-source tech with custom enhancements to meet its unique scale and operational needs.
Contribution
This paper details Uber's scalable real-time data architecture, highlighting open-source integrations, custom improvements, and lessons learned from deployment at large scale.
Findings
Effective open-source integration for real-time data processing
Custom enhancements improve scalability and reliability
Insights into operational challenges and solutions
Abstract
Uber's business is highly real-time in nature. Petabytes of data are continuously collected every day from end users such as Uber drivers, riders, restaurants, and eaters. There is a lot of valuable information to be processed, and many decisions must be made in seconds for a variety of use cases such as customer incentives, fraud detection, and machine learning model prediction. In addition, there is an increasing need to expose this ability to different user categories, including engineers, data scientists, executives, and operations personnel, which adds to the complexity. In this paper, we present the overall architecture of the real-time data infrastructure and identify three scaling challenges that we need to continuously address for each component in the architecture. At Uber, we rely heavily on open source technologies for the key areas of the infrastructure. On top of those…
