# LaraDB: A Minimalist Kernel for Linear and Relational Algebra Computation

**Authors:** Dylan Hutchison, Bill Howe, Dan Suciu

arXiv: 1703.07342 · 2017-05-16

## TL;DR

LaraDB introduces a minimalist algebraic framework unifying relational and linear algebra, enabling efficient implementation and optimization of analytics tasks on backend engines like Apache Accumulo.

## Contribution

The paper presents Lara, a simple algebra of three operators that unifies relational algebra (RA) and linear algebra (LA), and demonstrates its practical efficiency on analytics tasks through an implementation, LaraDB, on Apache Accumulo.

## Key findings

- Lara operators can be efficiently implemented using range scans over partitioned sorted maps.
- LaraDB outperforms Accumulo's native MapReduce integration on a join-and-aggregation task expressed as matrix multiply, especially at smaller scales.
- LaraDB offers a conceptually lean framework for optimizing mixed-abstraction analytics tasks without giving up fast record-level updates and scans.
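
To make the "range scans over sorted maps" primitive concrete, here is a minimal sketch in Python. It is not LaraDB or Accumulo code: the `SortedMap` class, its `range_scan` method, and the `(row, column)` key layout are assumptions made for illustration, and partitioning across servers is left out.

```python
from bisect import bisect_left


class SortedMap:
    """Toy in-memory sorted map (hypothetical stand-in for one table partition)."""

    def __init__(self, items):
        pairs = sorted(dict(items).items())
        self._keys = [k for k, _ in pairs]
        self._vals = [v for _, v in pairs]

    def range_scan(self, start, stop):
        """Yield (key, value) pairs with start <= key < stop, in key order."""
        lo = bisect_left(self._keys, start)
        hi = bisect_left(self._keys, stop)
        for i in range(lo, hi):
            yield self._keys[i], self._vals[i]


# Keys are (row, column) tuples, so one row is a contiguous key range.
table = SortedMap({
    ("r1", "c1"): 1.0,
    ("r1", "c2"): 2.0,
    ("r2", "c1"): 3.0,
})

# Scan every entry of row "r1": the prefix ("r1",) sorts before ("r1", "c1"),
# and ("r2",) sorts before any key in row "r2".
row_r1 = list(table.range_scan(("r1",), ("r2",)))
```

Because the keys are kept sorted, locating a range costs two binary searches followed by a sequential read, which is the access pattern the finding above relies on.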

## Abstract

Analytics tasks manipulate structured data with variants of relational algebra (RA) and quantitative data with variants of linear algebra (LA). The two computational models have overlapping expressiveness, motivating a common programming model that affords unified reasoning and algorithm design. At the logical level we propose Lara, a lean algebra of three operators, that expresses RA and LA as well as relevant optimization rules. We show a series of proofs that position Lara at just the right level of expressiveness for a middleware algebra: more explicit than MapReduce but more general than RA or LA. At the physical level we find that the Lara operators afford efficient implementations using a single primitive that is available in a variety of backend engines: range scans over partitioned sorted maps. To evaluate these ideas, we implemented the Lara operators as range iterators in Apache Accumulo, a popular implementation of Google's BigTable. First we show how Lara expresses a sensor quality control task, and we measure the performance impact of optimizations Lara admits on this task. Second we show that the LaraDB implementation outperforms Accumulo's native MapReduce integration on a core task involving join and aggregation in the form of matrix multiply, especially at smaller scales that are typically a poor fit for scale-out approaches. We find that LaraDB offers a conceptually lean framework for optimizing mixed-abstraction analytics tasks, without giving up fast record-level updates and scans.
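
The phrase "join and aggregation in the form of matrix multiply" can be illustrated with a small sketch. It treats each sparse matrix as a map from index pairs to values, joins entries of `A` and `B` on the shared inner index `k`, and sums the products per output cell. The function name `matmul_join_aggregate` and the dictionary representation are assumptions for this example. The sketch does not use the Lara operators themselves, nor does it reflect how LaraDB executes the task.

```python
from collections import defaultdict


def matmul_join_aggregate(a, b):
    """Multiply sparse matrices given as {(i, k): value} and {(k, j): value}.

    Join step: pair A[i, k] with B[k, j] on the shared index k.
    Aggregate step: sum the products for each output cell (i, j).
    """
    # Group B's entries by the join key k so each A entry finds its matches.
    b_by_k = defaultdict(list)
    for (k, j), value in sorted(b.items()):
        b_by_k[k].append((j, value))

    out = defaultdict(float)
    for (i, k), a_value in sorted(a.items()):
        for j, b_value in b_by_k.get(k, ()):
            out[(i, j)] += a_value * b_value  # aggregate with +
    return dict(sorted(out.items()))


# A = [[1, 2], [0, 3]] and B = [[4, 0], [5, 6]], with zeros omitted.
a = {(0, 0): 1.0, (0, 1): 2.0, (1, 1): 3.0}
b = {(0, 0): 4.0, (1, 0): 5.0, (1, 1): 6.0}
c = matmul_join_aggregate(a, b)
# c == {(0, 0): 14.0, (0, 1): 12.0, (1, 0): 15.0, (1, 1): 18.0}
```

Swapping the multiplication and addition for other operations gives the same join-then-aggregate shape over different value types, which is one way to see why RA and LA overlap in expressiveness.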

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1703.07342/full.md

## Figures

18 figures with captions in the complete paper: https://tomesphere.com/paper/1703.07342/full.md

## References

34 references — full list in the complete paper: https://tomesphere.com/paper/1703.07342/full.md

---
Source: https://tomesphere.com/paper/1703.07342