Serving Hybrid-Cloud SQL Interactive Queries at Twitter
Chunxu Tang, Beinan Wang, Huijun Wu, Zhenzhao Wang, Yao Li, Vrushali Channapattan, Zhenxiao Luo, Ruchin Kabra, Mainak Ghosh, Nikhil Kantibhai Navadiya, Prachi Mishra, Prateek Mukhedkar, Anneliese Lu

TL;DR
This paper describes Twitter's development of a hybrid-cloud SQL federation system that enables scalable, high-availability data queries across data centers and public cloud, handling around 10PB daily.
Contribution
It introduces the design and implementation of a hybrid-cloud SQL federation system tailored for Twitter's large-scale data needs, addressing modern system challenges.
Findings
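The system processes queries across Twitter's data centers and the public cloud, interacting with around 10PB of data per day
It addresses the challenges of a modern SQL system through design decisions spanning query, cluster, and storage federation
The paper summarizes lessons learned from deploying and operating the system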
Abstract
The demand for data analytics has been consistently increasing in the past years at Twitter. In order to fulfill the requirements and provide a highly scalable and available query experience, a large-scale in-house SQL system is heavily relied on. Recently, we evolved the SQL system into a hybrid-cloud SQL federation system, compliant with Twitter's Partly Cloudy strategy. The hybrid-cloud SQL federation system is capable of processing queries across Twitter's data centers and the public cloud, interacting with around 10PB of data per day. In this paper, the design of the hybrid-cloud SQL federation system is presented, which consists of query, cluster, and storage federations. We identify challenges in a modern SQL system and demonstrate how our system addresses them with some important design decisions. We also conduct qualitative examinations and summarize instructive lessons…
Taxonomy
Topics: Advanced Database Systems and Queries · Peer-to-Peer Network Technologies · Web Data Mining and Analysis
