A Scalable and Robust Framework for Data Stream Ingestion

Haruna Isah; Farhana Zulkernine

arXiv:1812.04197·cs.DC·July 23, 2019

TL;DR

This paper presents a scalable, fault-tolerant framework for ingesting and integrating high-velocity data streams from diverse sources, demonstrated through a real-world case study involving Apache NiFi and Kafka.

Contribution

It introduces a novel, reusable data stream ingestion framework that enhances scalability and robustness, addressing key challenges in processing continuous, high-volume data streams.

Findings

01. Framework effectively handles diverse data sources
02. Demonstrated robustness in real-world case study
03. Identified best practices and future research gaps

Abstract

An essential part of building a data-driven organization is the ability to handle and process continuous streams of data to discover actionable insights. The explosive growth of interconnected devices and the social Web has led to a large volume of data being generated on a continuous basis. Streaming data from sources such as stock quotes, credit card transactions, trending news, traffic conditions, and time-sensitive patient data is not only very common but can also rapidly depreciate in value if not processed quickly. The ever-increasing volume and highly irregular nature of data rates pose new challenges to data stream processing systems. One such challenging but important task is accurately ingesting and integrating data streams from various sources and locations into an analytics platform. These challenges demand new strategies and systems that can offer the desired degree of scalability and…
