Scalable Fault-Tolerant Data Feeds in AsterixDB

Raman Grover; Michael J. Carey

arXiv:1405.1705 · cs.DB · May 8, 2014 · 1 citation

TL;DR

This paper presents AsterixDB's integrated support for scalable, fault-tolerant data feeds, enabling continuous ingestion and indexing of high-velocity semi-structured data within a unified system.

Contribution

It introduces native data feed support in AsterixDB, detailing its design, implementation, and customization options for scalable, fault-tolerant data ingestion.

Findings

01. Demonstrates scalability of data feeds in AsterixDB

02. Shows fault-tolerance capabilities through initial experiments

03. Highlights customization options for resource allocation

Abstract

In this paper we describe the support for data feed ingestion in AsterixDB, an open-source Big Data Management System (BDMS) that provides a platform for storage and analysis of large volumes of semi-structured data. Data feeds are a mechanism for having continuous data arrive into a BDMS from external sources and incrementally populate a persisted dataset and associated indexes. The need to persist and index "fast-flowing" high-velocity data (and support ad hoc analytical queries) is ubiquitous. However, the state of the art today involves "gluing" together different systems. AsterixDB is different in being a unified system with "native support" for data feed ingestion. We discuss the challenges and present the design and implementation of the concepts involved in modeling and managing data feeds in AsterixDB. AsterixDB allows the runtime behavior, allocation of resources and the…

Taxonomy

Topics: Cloud Computing and Resource Management · Distributed systems and fault tolerance · Advanced Data Storage Technologies