Apache Calcite: A Foundational Framework for Optimized Query Processing Over Heterogeneous Data Sources
Edmon Begoli, Jesús Camacho Rodríguez, Julian Hyde, Michael J. Mior, Daniel Lemire

TL;DR
Apache Calcite is a versatile, extensible framework that provides query processing and optimization capabilities for various data sources and systems, supporting multiple query languages and data models.
Contribution
It introduces a modular, extensible architecture with a rich set of optimization rules and adapter support for heterogeneous data sources, enhancing query processing in big-data frameworks.
Findings
Supports multiple data models including relational, semi-structured, streaming, and geospatial.
Provides a flexible architecture adopted by many open-source data systems.
Continuously evolving to include new data sources and query processing techniques.
Abstract
Apache Calcite is a foundational software framework that provides query processing, optimization, and query language support to many popular open-source data processing systems such as Apache Hive, Apache Storm, Apache Flink, Druid, and MapD. Calcite's architecture consists of a modular and extensible query optimizer with hundreds of built-in optimization rules, a query processor capable of processing a variety of query languages, an adapter architecture designed for extensibility, and support for heterogeneous data models and stores (relational, semi-structured, streaming, and geospatial). This flexible, embeddable, and extensible architecture is what makes Calcite an attractive choice for adoption in big-data frameworks. It is an active project that continues to introduce support for new types of data sources, query languages, and approaches to query processing and optimization.
Taxonomy
Topics: Data Management and Algorithms · Advanced Database Systems and Queries · Cloud Computing and Resource Management
