LMFAO: An Engine for Batches of Group-By Aggregates

Maximilian Schleich; Dan Olteanu

arXiv:2008.08657 · cs.DB · August 21, 2020

TL;DR

LMFAO is an in-memory engine that efficiently executes large batches of group-by aggregate queries over joins, accelerating data science workloads such as ridge linear regression, decision trees, and clustering.
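One standard way to evaluate aggregates over a join without materializing it is to push the aggregates past the join: aggregate each relation per join key, then combine the partial results. The toy sketch below shows this for a two-relation join and a batch of two aggregates, COUNT(*) and SUM(a*c). It is not LMFAO's implementation; the relations `R` and `S` and both functions are hypothetical and serve only to illustrate the idea.

```python
from collections import defaultdict

# Hypothetical toy relations: R(a, b) and S(b, c), joined on b.
R = [(1, "x"), (2, "x"), (3, "y")]
S = [("x", 10), ("y", 20), ("y", 30)]


def naive_aggregates(R, S):
    """Materialize R JOIN S, then compute COUNT(*) and SUM(a*c)."""
    joined = [(a, c) for a, b in R for b2, c in S if b == b2]
    return len(joined), sum(a * c for a, c in joined)


def pushed_down_aggregates(R, S):
    """Compute the same batch without materializing the join.

    Each relation is first aggregated per join key b. For a fixed b,
    the sum of a*c over all matching pairs equals (sum of a) * (sum of c),
    and the pair count equals count_R * count_S.
    """
    cnt_r, sum_a = defaultdict(int), defaultdict(int)
    for a, b in R:
        cnt_r[b] += 1
        sum_a[b] += a
    cnt_s, sum_c = defaultdict(int), defaultdict(int)
    for b, c in S:
        cnt_s[b] += 1
        sum_c[b] += c
    keys = cnt_r.keys() & cnt_s.keys()  # only keys present on both sides join
    count = sum(cnt_r[b] * cnt_s[b] for b in keys)
    total = sum(sum_a[b] * sum_c[b] for b in keys)
    return count, total


print(naive_aggregates(R, S))        # (4, 180)
print(pushed_down_aggregates(R, S))  # (4, 180)
```

The pushed-down version touches each input tuple once, while the naive version's cost grows with the size of the join result, which can be much larger than the inputs.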

Contribution

The paper introduces LMFAO, a novel in-memory engine designed for batches of group-by aggregates over joins, which improves performance for data science tasks.

Findings

01 Demonstrates LMFAO's effectiveness on ridge linear regression, decision trees, and clustering.

02 Shows significant speedups over traditional database engines.

03 Validates LMFAO's applicability to real-world data science models.

Abstract

LMFAO is an in-memory optimization and execution engine for large batches of group-by aggregates over joins. Such database workloads capture the data-intensive computation of a variety of data science applications. We demonstrate LMFAO for three popular models: ridge linear regression with batch gradient descent, decision trees with CART, and clustering with Rk-means.
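To see why ridge linear regression with batch gradient descent reduces to a batch of aggregates, note that the gradient of a squared loss depends on the data only through the sums SUM(x_i * x_j) and SUM(x_i * y) for all feature pairs, plus COUNT(*). The sketch below computes that batch in one pass and then runs gradient descent on the aggregates alone. It assumes the objective (1/(2n))·||Xθ − y||² + (λ/2)·||θ||²; the function names, the toy data, and the hyperparameters are hypothetical. Here the rows stand in for tuples of the join result, whereas an engine like LMFAO is designed to compute such aggregates over the join without materializing it.

```python
def covariance_aggregates(rows, num_features):
    """One pass computing the batch COUNT(*), SUM(x_i*x_j), SUM(x_i*y).

    Each row is (x_0, ..., x_{d-1}, y).
    """
    n = 0
    sxx = [[0.0] * num_features for _ in range(num_features)]
    sxy = [0.0] * num_features
    for *x, y in rows:
        n += 1
        for i in range(num_features):
            sxy[i] += x[i] * y
            for j in range(num_features):
                sxx[i][j] += x[i] * x[j]
    return n, sxx, sxy


def ridge_bgd(n, sxx, sxy, lam=0.01, lr=0.1, steps=2000):
    """Batch gradient descent that reads only the aggregates, never the data.

    Gradient of the assumed objective: (1/n) * (X^T X theta - X^T y) + lam * theta.
    For simplicity every coordinate is penalized, including the intercept.
    """
    d = len(sxy)
    theta = [0.0] * d
    for _ in range(steps):
        grad = [
            (sum(sxx[i][j] * theta[j] for j in range(d)) - sxy[i]) / n
            + lam * theta[i]
            for i in range(d)
        ]
        # Simultaneous update: grad was computed from the previous theta.
        theta = [t - lr * g for t, g in zip(theta, grad)]
    return theta


# Toy data: an intercept column of 1.0, one feature x, and target y = 2x + 1.
rows = [(1.0, float(x), 2.0 * x + 1.0) for x in range(4)]
n, sxx, sxy = covariance_aggregates(rows, num_features=2)
print(ridge_bgd(n, sxx, sxy))
```

Once the aggregates are computed, each gradient step costs O(d²) regardless of how many tuples the join produces, so the expensive part of training is a single batch of aggregate queries.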
