D4M 2.0 Schema: A General Purpose High Performance Schema for the Accumulo Database
Jeremy Kepner, Christian Anderson, William Arcand, David Bestor, Bill Bergeron, Chansup Byun, Matthew Hubbell, Peter Michaleas, Julie Mullen, David O'Gwynn, Andrew Prout, Albert Reuther, Antonio Rosa, Charles Yee (MIT)

TL;DR
This paper introduces the D4M 2.0 Schema, a high-performance, general-purpose schema for the Accumulo database that enables rapid querying and indexing of diverse datasets with minimal customization.
Contribution
The paper presents a novel, simple schema based on associative arrays that raises Accumulo ingest rates and applies to a wide range of data types with little or no customization.
Findings
Achieves the highest published Accumulo ingest rates
Applicable to diverse data types with minimal parsing
Can be used independently of the D4M interface
Abstract
Non-traditional, relaxed consistency, triple store databases are the backbone of many web companies (e.g., Google Big Table, Amazon Dynamo, and Facebook Cassandra). The Apache Accumulo database is a high performance open source relaxed consistency database that is widely used for government applications. Obtaining the full benefits of Accumulo requires using novel schemas. The Dynamic Distributed Dimensional Data Model (D4M)[http://d4m.mit.edu] provides a uniform mathematical framework based on associative arrays that encompasses both traditional (i.e., SQL) and non-traditional databases. For non-traditional databases D4M naturally leads to a general purpose schema that can be used to fully index and rapidly query every unique string in a dataset. The D4M 2.0 Schema has been applied with little or no customization to cyber, bioinformatics, scientific citation, free text, and social…
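To make "fully index and rapidly query every unique string" concrete, the following is a minimal sketch and not the paper's implementation. It assumes each record is exploded into one column key per field/value pair (written here as `field|value`) and that a transposed copy of the table is kept so any string can be looked up directly. The names `explode`, `ExplodedIndex`, and `find` are hypothetical, and plain Python dictionaries stand in for Accumulo tables.

```python
from collections import defaultdict


def explode(record_id, record, sep="|"):
    """Turn one record (field -> value) into (row, column, value) triples.

    Each field/value pair becomes its own column key, so every unique
    string in the record ends up as an indexable key.
    """
    return [(record_id, f"{field}{sep}{value}", "1")
            for field, value in record.items()]


class ExplodedIndex:
    """In-memory stand-in for an exploded table, its transpose, and counts."""

    def __init__(self):
        self.edge = defaultdict(set)    # row key -> set of column keys
        self.edge_t = defaultdict(set)  # column key -> set of row keys
        self.degree = defaultdict(int)  # column key -> number of rows

    def ingest(self, record_id, record):
        for row, col, _ in explode(record_id, record):
            if col not in self.edge[row]:  # skip duplicate triples
                self.edge[row].add(col)
                self.edge_t[col].add(row)
                self.degree[col] += 1

    def find(self, field, value, sep="|"):
        """Return the sorted row keys whose record has field == value."""
        return sorted(self.edge_t.get(f"{field}{sep}{value}", ()))


index = ExplodedIndex()
index.ingest("log-001", {"src_ip": "10.0.0.1", "dst_ip": "10.0.0.2"})
index.ingest("log-002", {"src_ip": "10.0.0.1", "dst_ip": "10.0.0.3"})

print(index.find("src_ip", "10.0.0.1"))
print(index.degree["src_ip|10.0.0.1"])
```

Because the lookup goes through the transposed mapping, finding every record that contains a given string is a single key lookup rather than a scan. The per-column count gives a cheap way to estimate how many rows a query will return before running it.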
