Building an OceanBase-based Distributed Nearly Real-time Analytical Processing Database System
Quanqing Xu, Chuanhui Yang, Ruijie Li, Dongdong Xie, Hui Cao, Yi Xiao, Junquan Chen, Yanzuo Wang, Saitong Zhao, Fusheng Han, Bin Liu, Guoping Wang, Yuzhong Zhao, Mingqiang Zhuang

TL;DR
This paper presents OceanBase Mercury, a distributed OLAP system designed for petabyte-scale data that achieves real-time analytics with high performance, scalability, and availability, addressing limitations of traditional OLAP and real-time systems.
Contribution
The paper introduces an OLAP system with adaptive storage, differential refresh, and polymorphic vectorization, enabling efficient real-time analytics at large scale.
Findings
Outperforms specialized OLAP engines by 1.3X to 3.1X in query speed
Maintains sub-second latency under real-world workloads
Supports petabyte-scale data with high availability and elasticity
Abstract
The growing demand for database systems capable of efficiently managing massive datasets while delivering real-time transaction processing and advanced analytical capabilities has become critical in modern data infrastructure. While traditional OLAP systems often fail to meet these dual requirements, emerging real-time analytical processing systems still face persistent challenges, such as excessive data redundancy, complex cross-system synchronization, and suboptimal temporal efficiency. This paper introduces OceanBase Mercury as an innovative OLAP system designed for petabyte-scale data. The system features a distributed, multi-tenant architecture that ensures essential enterprise-grade requirements, including continuous availability and elastic scalability. Our technical contributions include three key components: (1) an adaptive columnar storage format with hybrid data layout…
Taxonomy
Topics: Advanced Database Systems and Queries · Cloud Computing and Resource Management · Distributed systems and fault tolerance
