# A Layered Aggregate Engine for Analytics Workloads

**Authors:** Maximilian Schleich, Dan Olteanu, Mahmoud Abo Khamis, Hung Q. Ngo, XuanLong Nguyen

arXiv: 1906.08687 · 2019-06-21

## TL;DR

LMFAO is an in-memory engine that optimizes and executes batches of aggregates for analytics workloads. In the reported experiments it outperforms a commercial database system, MonetDB, and machine learning tools such as TensorFlow by several orders of magnitude.

## Contribution

This work introduces LMFAO, an in-memory engine for aggregate-based analytics whose layers of logical and code optimizations exploit sharing of computation, parallelism, and code specialization.

## Key findings

- For computing batches of aggregates, LMFAO outperforms a commercial database system and MonetDB by several orders of magnitude.
- For learning a variety of models over databases, LMFAO outperforms TensorFlow, Scikit, R, and AC/DC by several orders of magnitude.
- The same engine supports ridge linear regression, classification trees, regression trees, Bayesian network structure learning with Chow-Liu trees, and data cubes.

## Abstract

This paper introduces LMFAO (Layered Multiple Functional Aggregate Optimization), an in-memory optimization and execution engine for batches of aggregates over the input database. The primary motivation for this work stems from the observation that for a variety of analytics over databases, their data-intensive tasks can be decomposed into group-by aggregates over the join of the input database relations. We exemplify the versatility and competitiveness of LMFAO for a handful of widely used analytics: learning ridge linear regression, classification trees, regression trees, and the structure of Bayesian networks using Chow-Liu trees; and data cubes used for exploration in data warehousing.

LMFAO consists of several layers of logical and code optimizations that systematically exploit sharing of computation, parallelism, and code specialization.

We conducted two types of performance benchmarks. In experiments with four datasets, LMFAO outperforms by several orders of magnitude on one hand, a commercial database system and MonetDB for computing batches of aggregates, and on the other hand, TensorFlow, Scikit, R, and AC/DC for learning a variety of models over databases.
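To make the decomposition idea concrete, here is a minimal Python sketch; it is not LMFAO's code or API. It assumes two hypothetical toy relations, `Sales(store, units)` and `Stores(store, area)`, joined on `store`, and computes the batch of sums that least-squares regression over the two features needs (the count, the sums of each feature, and the sums of their pairwise products). The first function materializes the join; the second pushes partial sums to each relation and combines them per join key, which illustrates the kind of sharing the abstract refers to under these simplifying assumptions.

```python
from collections import defaultdict

# Hypothetical toy relations (illustrative only, not from the paper).
# Sales(store, units) and Stores(store, area), joined on `store`.
SALES = [(1, 3.0), (1, 5.0), (2, 2.0), (3, 7.0)]
STORES = [(1, 10.0), (2, 20.0), (2, 30.0)]

AGGREGATES = ("count", "sum_u", "sum_a", "sum_uu", "sum_ua", "sum_aa")


def aggregates_over_materialized_join(sales, stores):
    """Baseline: build the join result first, then compute the batch."""
    joined = [(u, a) for (s1, u) in sales for (s2, a) in stores if s1 == s2]
    return {
        "count": float(len(joined)),
        "sum_u": sum(u for u, _ in joined),
        "sum_a": sum(a for _, a in joined),
        "sum_uu": sum(u * u for u, _ in joined),
        "sum_ua": sum(u * a for u, a in joined),
        "sum_aa": sum(a * a for _, a in joined),
    }


def partial_sums(relation):
    """Per join key: [count, sum of x, sum of x*x] for the value column."""
    out = defaultdict(lambda: [0.0, 0.0, 0.0])
    for key, x in relation:
        acc = out[key]
        acc[0] += 1.0
        acc[1] += x
        acc[2] += x * x
    return out


def aggregates_pushed_past_join(sales, stores):
    """Same batch, computed from per-relation partial sums.

    The join result is never built; one pass per relation feeds all six
    aggregates, so the scans are shared across the batch.
    """
    s_part = partial_sums(sales)
    t_part = partial_sums(stores)
    result = dict.fromkeys(AGGREGATES, 0.0)
    for key in s_part.keys() & t_part.keys():  # keys present on both sides
        cs, su, suu = s_part[key]
        ct, sa, saa = t_part[key]
        result["count"] += cs * ct
        result["sum_u"] += su * ct
        result["sum_a"] += cs * sa
        result["sum_uu"] += suu * ct
        result["sum_ua"] += su * sa
        result["sum_aa"] += cs * saa
    return result


if __name__ == "__main__":
    print(aggregates_over_materialized_join(SALES, STORES))
    print(aggregates_pushed_past_join(SALES, STORES))
```

Both functions return the same six values on this input. The second scans each relation once and does work proportional to the input size rather than the join size, which matters when the join result is much larger than the relations it is built from.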

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1906.08687/full.md

## Figures

16 figures with captions in the complete paper: https://tomesphere.com/paper/1906.08687/full.md

## References

59 references — full list in the complete paper: https://tomesphere.com/paper/1906.08687/full.md

---
Source: https://tomesphere.com/paper/1906.08687