# Solving All-Pairs Shortest-Paths Problem in Large Graphs Using Apache Spark

**Authors:** Frank Schoeneman, Jaroslaw Zola

arXiv: 1902.04446 · 2019-08-08

## TL;DR

This paper analyzes All-Pairs Shortest-Paths (APSP) solvers implemented in Apache Spark for large undirected weighted graphs, and compares their scalability and performance against MPI-based solvers on distributed memory clusters.

## Contribution

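The paper proposes four Spark-based APSP implementations for large undirected weighted graphs, which differ in complexity and in how much they rely on techniques outside the pure Spark API, and analyzes their performance on distributed memory clusters.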

## Key findings

- Spark can handle APSP problems with over 200,000 vertices on a 1024-core cluster.
- Spark-based solvers can compete with a naive MPI-based solution.
- The best-performing Spark solver requires auxiliary shared persistent storage and is over two times slower than an optimized MPI-based solver.

## Abstract

Algorithms for computing All-Pairs Shortest-Paths (APSP) are critical building blocks underlying many practical applications. The standard sequential algorithms, such as Floyd-Warshall and Johnson, quickly become infeasible for large input graphs, necessitating parallel approaches. In this work, we provide a detailed analysis of parallel APSP performance on distributed memory clusters with Apache Spark. The Spark model allows for a portable and easy-to-deploy distributed implementation, and hence is attractive from the end-user point of view. We propose four different APSP implementations for large undirected weighted graphs, which differ in complexity and degree of reliance on techniques outside of the pure Spark API. We demonstrate that Spark is able to handle APSP problems with over 200,000 vertices on a 1024-core cluster, and can compete with a naive MPI-based solution. However, our best performing solver requires auxiliary shared persistent storage, and is over two times slower than an optimized MPI-based solver.
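
For orientation, the sequential Floyd-Warshall baseline that the abstract mentions can be sketched in a few lines of Python. This is a textbook illustration, not one of the paper's four Spark implementations. The function name `floyd_warshall`, the `(u, v, w)` edge-list input format, and the assumption of non-negative edge weights are choices made for this sketch only.

```python
import math


def floyd_warshall(n, edges):
    """Return the n x n matrix of shortest-path distances.

    Illustrative sketch: `edges` is an iterable of (u, v, w) triples for an
    undirected graph with non-negative weights (an assumption of this example).
    """
    dist = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        dist[i][i] = 0.0
    for u, v, w in edges:
        # Undirected graph: store the lightest parallel edge in both directions.
        if w < dist[u][v]:
            dist[u][v] = dist[v][u] = w

    # Allow vertex k as an intermediate stop, one k at a time.
    for k in range(n):
        row_k = dist[k]
        for i in range(n):
            d_ik = dist[i][k]
            if d_ik == math.inf:
                continue  # i cannot reach k, so k cannot shorten any path from i
            row_i = dist[i]
            for j in range(n):
                candidate = d_ik + row_k[j]
                if candidate < row_i[j]:
                    row_i[j] = candidate
    return dist


# Small hypothetical graph: the path 0-1-2-3 is shorter than the direct edge 0-2.
d = floyd_warshall(4, [(0, 1, 1.0), (1, 2, 2.0), (0, 2, 5.0), (2, 3, 1.0)])
print(d[0][3])  # 4.0
```

The triple loop performs on the order of n^3 updates over an n x n matrix. At n = 200,000 the matrix alone holds 4 x 10^10 entries, roughly 320 GB assuming 8-byte floating-point values, which gives a sense of why a single machine stops being practical at the scale the abstract describes.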

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1902.04446/full.md

## Figures

14 figures with captions in the complete paper: https://tomesphere.com/paper/1902.04446/full.md

## References

28 references — full list in the complete paper: https://tomesphere.com/paper/1902.04446/full.md

---
Source: https://tomesphere.com/paper/1902.04446