Cohort Query Processing
Dawei Jiang, Qingchao Cai, Gang Chen, H. V. Jagadish, Beng Chin Ooi, Kian-Lee Tan, Anthony K. H. Tung

TL;DR
This paper introduces new SQL operators and evaluation schemes to efficiently support cohort analysis in large-scale database systems, demonstrating significant performance improvements with a columnar approach.
Contribution
It extends SQL with three new operators and proposes three evaluation schemes for cohort query processing: two non-intrusive approaches and a columnar-based scheme with optimizations designed specifically for cohort queries.
Findings
Columnar approach outperforms non-intrusive methods
Proposed operators simplify cohort query specification
Experiments confirm the performance benefits of the columnar system over both non-intrusive approaches
Abstract
Modern Internet applications often produce a large volume of user activity records. Data analysts are interested in cohort analysis, or finding unusual user behavioral trends, in these large tables of activity records. In a traditional database system, cohort analysis queries are both painful to specify and expensive to evaluate. We propose to extend database systems to support cohort analysis. We do so by extending SQL with three new operators. We devise three different evaluation schemes for cohort query processing. Two of them adopt a non-intrusive approach. The third approach employs a columnar based evaluation scheme with optimizations specifically designed for cohort query processing. Our experimental results confirm the performance benefits of our proposed columnar database system, compared against the two non-intrusive approaches that implement cohort queries on top of regular…
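To make the idea of cohort analysis concrete, the following is a minimal sketch in plain Python. It is not the paper's SQL operators or any of its evaluation schemes. It assumes a hypothetical activity table of (user, date, action) records, takes each user's first activity as their "birth", groups users into weekly cohorts by birth date, and counts distinct active users per cohort and age in weeks. The record layout, the sample data, and the function names are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date, timedelta

# Hypothetical activity records: (user, day of activity, action).
RECORDS = [
    ("u1", date(2024, 1, 1), "launch"),
    ("u1", date(2024, 1, 9), "shop"),
    ("u2", date(2024, 1, 3), "launch"),
    ("u2", date(2024, 1, 4), "shop"),
    ("u3", date(2024, 1, 8), "launch"),
    ("u3", date(2024, 1, 16), "shop"),
]


def week_start(day):
    """Return the Monday of the week containing `day`."""
    return day - timedelta(days=day.weekday())


def cohort_active_users(records):
    """Count distinct active users per (cohort week, age in weeks).

    Assumption: a user's birth is the date of their first record, and
    the cohort is the calendar week in which that birth falls.
    """
    # Pass 1: find each user's birth date (earliest activity).
    birth = {}
    for user, day, _action in records:
        if user not in birth or day < birth[user]:
            birth[user] = day

    # Pass 2: bucket every record by cohort and by age since birth.
    active = defaultdict(set)
    for user, day, _action in records:
        cohort = week_start(birth[user])
        age_weeks = (day - birth[user]).days // 7
        active[(cohort, age_weeks)].add(user)

    return {key: len(users) for key, users in sorted(active.items())}


if __name__ == "__main__":
    for (cohort, age), count in cohort_active_users(RECORDS).items():
        print(cohort.isoformat(), "age", age, "active users:", count)
```

Even this small sketch needs two passes over the activity records, one to find each user's birth and one to aggregate by cohort and age. A standard SQL formulation typically needs a comparable amount of machinery, which is consistent with the abstract's remark that cohort queries are painful to specify in a traditional database system.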
Taxonomy
Topics: Advanced Database Systems and Queries · Data Management and Algorithms · Time Series Analysis and Forecasting
