Near Data Processing in Taurus Database
Shu Lin, Arunprasad P. Marathe, Per-Åke Larson, Chong Chen, Calvin Sun, Paul Lee, Weidong Yu

TL;DR
This paper presents the design and implementation of near data processing (NDP) in Huawei's Taurus database. Pushing data reduction operations (selection, projection, and aggregation) down to the storage servers significantly reduces both network data transfer and query execution time.
Contribution
It describes how NDP is built into Taurus and shows, through experiments on the TPC-H benchmark (100 GB), substantial reductions in data shipped, compute-layer CPU time, and query run time.
Findings
Data shipped reduced by 98% on Q15 (63% across the benchmark)
CPU time reduced by 91% on Q15 (50% across the benchmark)
Most TPC-H queries (18 of 22) benefited from NDP
Abstract
Huawei's cloud-native database system GaussDB for MySQL (also known as Taurus) stores data in a separate storage layer consisting of a pool of storage servers. Each server has considerable compute power, making it possible to push data reduction operations (selection, projection, and aggregation) close to storage. This paper describes the design and implementation of near data processing (NDP) in Taurus. NDP has several benefits: it reduces the amount of data shipped over the network, frees up CPU capacity in the compute layer, and reduces query run time, thereby enabling higher system throughput. Experiments with the TPC-H benchmark (100 GB) showed that 18 out of 22 queries benefited from NDP, data shipped was reduced by 63 percent, and CPU time by 50 percent. On Q15 the impact was even higher: data shipped was reduced by 98 percent, CPU time by 91 percent, and run time by 80 percent.
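To make the idea of pushing selection, projection, and aggregation toward storage concrete, here is a toy Python model. It is not Taurus's implementation. The row layout (key, quantity, price, year), the list-of-pages structure, the fixed 8-byte field size, and every function name are hypothetical and chosen only for illustration. The model counts the bytes a storage server would ship for a query like `SELECT SUM(price), COUNT(*) ... WHERE year = 1995` under three plans: no pushdown, pushed-down selection plus projection, and pushed-down partial aggregation.

```python
from typing import Callable, List, Sequence, Tuple

Row = Tuple[int, ...]
Page = List[Row]

FIELD_BYTES = 8  # assumption: every field is a fixed 8-byte value


def size(rows: Sequence[Row]) -> int:
    """Bytes that would cross the network for these rows under the toy size model."""
    return sum(FIELD_BYTES * len(r) for r in rows)


def scan_without_ndp(pages: List[Page]) -> List[Row]:
    """Baseline: storage returns every row unchanged; the compute node filters."""
    return [row for page in pages for row in page]


def scan_with_ndp(pages: List[Page], predicate: Callable[[Row], bool],
                  columns: List[int]) -> List[Row]:
    """Selection and projection evaluated on the storage side before shipping."""
    return [tuple(row[i] for i in columns)
            for page in pages for row in page if predicate(row)]


def partial_sums_with_ndp(pages: List[Page], predicate: Callable[[Row], bool],
                          col: int) -> List[Row]:
    """Aggregation pushdown: ship one (sum, count) pair per page instead of rows."""
    out = []
    for page in pages:
        matching = [row[col] for row in page if predicate(row)]
        out.append((sum(matching), len(matching)))
    return out


def merge_partials(partials: List[Row]) -> Tuple[int, int]:
    """Compute-node side: combine per-page partials into the final SUM and COUNT."""
    return sum(s for s, _ in partials), sum(c for _, c in partials)


# Hypothetical data: 3 pages x 100 rows of (key, quantity, price, year).
pages = [[(p * 100 + i, i % 50, 10 + i % 7, 1994 + i % 4) for i in range(100)]
         for p in range(3)]
in_1995 = lambda r: r[3] == 1995

print(size(scan_without_ndp(pages)))               # 9600 bytes
print(size(scan_with_ndp(pages, in_1995, [2])))    # 600 bytes
partials = partial_sums_with_ndp(pages, in_1995, 2)
print(size(partials))                              # 48 bytes
print(merge_partials(partials))                    # (981, 75)
```

The final answer is identical under all three plans; only the volume shipped differs. An average could be derived on the compute node as sum divided by count, which is why the sketch ships counts alongside sums rather than per-page averages.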
