# An IDEA: An Ingestion Framework for Data Enrichment in AsterixDB

**Authors:** Xikui Wang, Michael J. Carey

arXiv: 1902.08271 · 2020-08-18

## TL;DR

This paper introduces an ingestion framework for AsterixDB that enables scalable data enrichment with complex operations and adapts to changing reference data, improving query support for big data analytics.

## Contribution

The paper presents a novel ingestion framework integrated with AsterixDB that handles large-scale data, complex enrichments, and dynamic reference data updates.

## Key findings

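- The framework supports data ingestion and enrichment at scale.
- Enrichments requiring complex operations (compiled code, declarative queries, or machine learning models) can run inside the ingestion pipeline.
- The pipeline adapts to changes in reference data, and its performance is investigated at scale under various workloads.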

## Abstract

Big Data today is being generated at an unprecedented rate from various sources such as sensors, applications, and devices, and it often needs to be enriched based on other reference information to support complex analytical queries. Depending on the use case, the enrichment operations can be compiled code, declarative queries, or machine learning models with different complexities. For enrichments that will be frequently used in the future, it can be advantageous to push their computation into the ingestion pipeline so that they can be stored (and queried) together with the data. In some cases, the referenced information may change over time, so the ingestion pipeline should be able to adapt to such changes to guarantee the currency and/or correctness of the enrichment results. In this paper, we present a new data ingestion framework that supports data ingestion at scale, enrichments requiring complex operations, and adaptiveness to reference data changes. We explain how this framework has been built on top of Apache AsterixDB and investigate its performance at scale under various workloads.
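The idea of enriching records during ingestion while staying current with changing reference data can be illustrated with a minimal Python sketch. It is not the paper's implementation and does not use any AsterixDB API. The names `batched`, `ingest`, `load_reference`, and the `country`/`risk` fields are hypothetical. It assumes one simple strategy: records are processed in small batches, and the reference data is re-read at the start of each batch, so an update becomes visible at the next batch boundary.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List

Record = Dict[str, Any]
Reference = Dict[str, str]


def batched(records: Iterable[Record], size: int) -> Iterator[List[Record]]:
    """Group an incoming record stream into lists of at most `size` records."""
    batch: List[Record] = []
    for record in records:
        batch.append(record)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly shorter, batch
        yield batch


def ingest(
    records: Iterable[Record],
    load_reference: Callable[[], Reference],  # hypothetical reference-data loader
    store: List[Record],                      # stands in for the target dataset
    batch_size: int = 2,
) -> None:
    """Enrich each record with reference data, then store it."""
    for batch in batched(records, batch_size):
        # Assumption: re-reading the reference data once per batch is what
        # lets the pipeline pick up reference changes between batches.
        reference = load_reference()
        for record in batch:
            enriched = dict(record)
            enriched["risk"] = reference.get(record["country"], "unknown")
            store.append(enriched)
```

With a batch size of 2, a reference update that arrives after the second record has been read affects the third record but not the first two. Smaller batches pick up reference changes sooner, at the cost of reloading the reference data more often.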

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1902.08271/full.md

## Figures

42 figures with captions in the complete paper: https://tomesphere.com/paper/1902.08271/full.md

## References

37 references — full list in the complete paper: https://tomesphere.com/paper/1902.08271/full.md

---
Source: https://tomesphere.com/paper/1902.08271