Evaluating Accumulo Performance for a Scalable Cyber Data Processing Pipeline
Scott M. Sawyer, B. David O'Gwynn

TL;DR
This paper evaluates the scalability and performance of Apache Accumulo in a cyber data processing pipeline, focusing on data ingestion, query responsiveness, and system scalability with real cyber data.
Contribution
It provides an empirical assessment of Accumulo's ingestion and query performance in a scalable cyber data warehousing context, with techniques for effective query planning.
Findings
Accumulo's ingest throughput scales well as the number of client processes and servers increases.
Queries remain responsive on clusters of up to 8 nodes using real cyber data.
Effective data modeling and query batching improve query response times.
Abstract
Streaming, big data applications face challenges in creating scalable data flow pipelines, in which multiple data streams must be collected, stored, queried, and analyzed. These data sources are characterized by their volume (in terms of dataset size), velocity (in terms of data rates), and variety (in terms of fields and types). For many applications, distributed NoSQL databases are effective alternatives to traditional relational database management systems. This paper considers a cyber situational awareness system that uses the Apache Accumulo database to provide scalable data warehousing, real-time data ingest, and responsive querying for human users and analytic algorithms. We evaluate Accumulo's ingestion scalability as a function of the number of client processes and servers. We also describe a flexible data model with effective techniques for query planning and query batching to…
Taxonomy
Topics: Advanced Database Systems and Queries · Data Management and Algorithms · Cloud Computing and Resource Management
