PageRank Pipeline Benchmark: Proposal for a Holistic System Benchmark for Big-Data Platforms
Patrick Dreher, Chansup Byun, Chris Hill, Vijay Gadepally, Bradley Kuszmaul, Jeremy Kepner

TL;DR
This paper proposes a comprehensive, scalable PageRank pipeline benchmark for big data systems, inspired by supercomputing methodologies, to enable rigorous performance evaluation across diverse hardware and software platforms.
Contribution
It introduces a holistic, multi-kernel benchmark based on PageRank, integrating existing benchmarks and adaptable to various programming environments and system scales.
Findings
Implements the benchmark in multiple languages and measures single-threaded performance.
Demonstrates scalability in both problem size and hardware.
Provides a rigorous, mathematically defined benchmarking framework.
Abstract
The rise of big data systems has created a need for benchmarks to measure and compare the capabilities of these systems. Big data benchmarks present unique scalability challenges. The supercomputing community has wrestled with these challenges for decades and developed methodologies for creating rigorous scalable benchmarks (e.g., HPC Challenge). The proposed PageRank pipeline benchmark employs supercomputing benchmarking methodologies to create a scalable benchmark that is reflective of many real-world big data processing systems. The PageRank pipeline benchmark builds on existing prior scalable benchmarks (Graph500, Sort, and PageRank) to create a holistic benchmark with multiple integrated kernels that can be run together or independently. Each kernel is well defined mathematically and can be implemented in any programming environment. The linear algebraic nature of PageRank makes it…
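To make the idea of integrated kernels concrete, the following is a minimal sketch of a sort step, a matrix-construction step, and a PageRank step chained together. It is not the authors' reference implementation. The toy edge list, the function names, the damping factor of 0.85, and the fixed count of 20 iterations are all assumptions chosen for illustration, and a hand-written edge list stands in for a generated graph.

```python
# Hypothetical toy edge list of (start, end) pairs; stands in for a generated graph.
EDGES = [(2, 0), (0, 1), (1, 2), (0, 2), (2, 1)]


def sort_edges(edges):
    """Sort step: order edges by start vertex, then by end vertex."""
    return sorted(edges)


def build_row_normalized(edges, n):
    """Build a dense n x n adjacency matrix and scale each nonzero row to sum to 1."""
    a = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        a[u][v] += 1.0
    for row in a:
        total = sum(row)
        if total > 0:
            for j in range(n):
                row[j] /= total
    return a


def pagerank(a, damping=0.85, iterations=20):
    """Power iteration: r <- damping * (r A) + (1 - damping) * sum(r) / n."""
    n = len(a)
    r = [1.0 / n] * n
    for _ in range(iterations):
        spread = [sum(r[i] * a[i][j] for i in range(n)) for j in range(n)]
        teleport = (1.0 - damping) * sum(r) / n
        r = [damping * s + teleport for s in spread]
    return r


def run_pipeline(edges, n):
    """Run the three steps together; each can also be called on its own."""
    return pagerank(build_row_normalized(sort_edges(edges), n))


ranks = run_pipeline(EDGES, 3)
```

The update in `pagerank` is a vector-matrix product plus a constant, which is the linear algebraic form of PageRank the abstract refers to. A scalable implementation would use a sparse matrix rather than the dense nested lists shown here.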
