# Provably Optimal Parallel Transport Sweeps on Semi-Structured Grids

**Authors:** Michael P. Adams, Marvin L. Adams, W. Daryl Hawkins, Timmie Smith, Lawrence Rauchwerger, Nancy M. Amato, Teresa S. Bailey, Robert D. Falgout, Adam Kunen, Peter Brown

arXiv: 1906.02950 · 2020-03-18

## TL;DR

This paper introduces provably optimal algorithms for parallel discrete-ordinate transport sweeps on semi-structured grids, achieving minimal stages and high parallel efficiency on large-scale supercomputers.

## Contribution

The authors develop and validate algorithms that guarantee minimal-stage execution of transport sweeps on semi-structured grids, enabling highly efficient parallel performance.

## Key findings

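- Achieved approximately 68% parallel efficiency with more than 1.5 million parallel threads, relative to 8 threads, on a simple weak-scaling problem.
- Confirmed computationally that the scheduling algorithms execute sweeps in the minimum possible stage count, and demonstrated similar efficiencies on more realistic nuclear-reactor test problems with unstructured meshes.
- Observed parallel efficiencies agree well with the authors' performance model.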
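
For readers unfamiliar with the metric, the abstract reports efficiency relative to an 8-thread run on a weak-scaling problem, where the work per thread is held fixed as the thread count grows. A minimal sketch of that calculation follows, assuming the standard weak-scaling definition (reference run time divided by run time at scale). The function name and the timings are hypothetical and are not taken from the paper.

```python
def weak_scaling_efficiency(t_ref: float, t_scaled: float) -> float:
    """Weak-scaling efficiency: reference run time over run time at larger scale.

    Assumes the work per thread is the same in both runs, so an ideal code
    would take the same wall-clock time at every scale (efficiency 1.0).
    """
    if t_ref <= 0 or t_scaled <= 0:
        raise ValueError("run times must be positive")
    return t_ref / t_scaled


# Hypothetical timings in seconds, chosen only for illustration.
t_8_threads = 10.0   # reference run on 8 threads
t_large_run = 14.7   # run at a much larger thread count, same work per thread

eff = weak_scaling_efficiency(t_8_threads, t_large_run)
print(f"weak-scaling efficiency: {eff:.1%}")
```

Under this definition, an efficiency near 68% means the large run takes roughly 1.5 times as long per sweep as the 8-thread reference, despite using far more threads on a proportionally larger problem.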

## Abstract

We have found provably optimal algorithms for full-domain discrete-ordinate transport sweeps on a class of grids in 2D and 3D Cartesian geometry that are regular at a coarse level but arbitrary within the coarse blocks. We describe these algorithms and show that they always execute the full eight-octant (or four-quadrant if 2D) sweep in the minimum possible number of stages for a given $P_x \times P_y \times P_z$ partitioning. Computational results confirm that our optimal scheduling algorithms execute sweeps in the minimum possible stage count. Observed parallel efficiencies agree well with our performance model. Our PDT transport code has achieved approximately 68% parallel efficiency with > 1.5M parallel threads, relative to 8 threads, on a simple weak-scaling problem with only three energy groups, 10 directions per octant, and 4096 cells/core. We demonstrate similar efficiencies on a much more realistic set of nuclear-reactor test problems, with unstructured meshes that resolve fine geometric details. These results demonstrate that discrete-ordinates transport sweeps can be executed with high efficiency using more than $10^6$ parallel processes.
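
To make the notion of a "stage" concrete, the sketch below simulates a toy wavefront for a single sweep direction on a 2D processor grid. It is not the paper's eight-octant scheduling algorithm and does not reproduce its minimum stage count. It assumes one task per processor, a sweep entering at the corner (0, 0), and a rule that a processor may run only after its upstream neighbours in x and y have finished. The function name is hypothetical.

```python
from itertools import product


def single_direction_stage_count(px: int, py: int) -> int:
    """Count stages for a toy one-direction sweep on a px-by-py processor grid.

    Assumption: each processor owns exactly one task, and the sweep enters
    at corner (0, 0) and moves toward increasing i and j.
    """
    done = set()
    stages = 0
    while len(done) < px * py:
        # A processor is ready once both upstream neighbours have completed.
        ready = [
            (i, j)
            for i, j in product(range(px), range(py))
            if (i, j) not in done
            and (i == 0 or (i - 1, j) in done)
            and (j == 0 or (i, j - 1) in done)
        ]
        done.update(ready)  # all ready processors execute in the same stage
        stages += 1
    return stages


print(single_direction_stage_count(4, 3))
```

In this toy model the processor at (i, j) runs in stage i + j + 1, so the count is px + py - 1. Processors far from the entry corner sit idle early in the sweep, which is why the ordering of work across all octants matters for the total stage count.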

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1906.02950/full.md

## Figures

25 figures with captions in the complete paper: https://tomesphere.com/paper/1906.02950/full.md

## References

25 references — full list in the complete paper: https://tomesphere.com/paper/1906.02950/full.md

---
Source: https://tomesphere.com/paper/1906.02950