Lustre, Hadoop, Accumulo
Jeremy Kepner, William Arcand, David Bestor, Bill Bergeron, Chansup Byun, Lauren Edwards, Vijay Gadepally, Matthew Hubbell, Peter Michaleas, Julie Mullen, Andrew Prout, Antonio Rosa, Charles Yee, Albert Reuther

TL;DR
This paper compares Lustre, Hadoop, and Accumulo on a hypothetical common cluster, covering their foundational principles, capabilities, and performance differences to clarify where each technology fits best.
Contribution
It provides a foundational comparison and simple models for assessing Lustre, Hadoop, and Accumulo, including performance metrics and recent integration efforts.
Findings
Lustre offers 2x more storage capacity and higher bandwidth on general workloads.
Hadoop provides 4x greater read bandwidth on specific workloads.
Accumulo achieves 10,000x lower latency on random lookups.
Abstract
Data processing systems impose multiple views on data as it is processed by the system. These views include spreadsheets, databases, matrices, and graphs. There are a wide variety of technologies that can be used to store and process data through these different steps. The Lustre parallel file system, the Hadoop distributed file system, and the Accumulo database are all designed to address the largest and the most challenging data storage problems. There have been many ad-hoc comparisons of these technologies. This paper describes the foundational principles of each technology, provides simple models for assessing their capabilities, and compares the various technologies on a hypothetical common cluster. These comparisons indicate that Lustre provides 2x more storage capacity, is less likely to lose data during 3 simultaneous drive failures, and provides higher bandwidth on general…
