Crossing the Architectural Barrier: Evaluating Representative Regions of Parallel HPC Applications
Alexandra Ferreron, Radhika Jagtap, Sascha Bischoff, Roxana Rusitoru

TL;DR
This paper evaluates the BarrierPoint methodology for selecting representative regions of parallel HPC applications across Intel and ARM architectures, reporting simulation time reductions of up to 178x with prediction error below 2.3% for cycles and instructions.
Contribution
It provides an independent cross-architectural evaluation of BarrierPoint on real hardware, identifying when it can be applied and proposing improvements for limitations.
Findings
Simulation time reduced by up to 178x
Prediction error kept below 2.3% for cycles and instructions
Effective across Intel and ARM architectures, with some limitations on when it can be applied
Abstract
Exascale computing will get mankind closer to solving important social, scientific and engineering problems. Due to high prototyping costs, High Performance Computing (HPC) system architects make use of simulation models for design space exploration and hardware-software co-design. However, as HPC systems reach exascale proportions, the cost of simulation increases, since simulators themselves are largely single-threaded. Tools for selecting representative parts of parallel applications to reduce running costs are widespread; for example, BarrierPoint achieves this by analysing, in simulation, abstract characteristics such as basic blocks and reuse distances. However, architectures new to HPC have a limited set of tools available. In this work, we provide an independent cross-architectural evaluation on real hardware, across Intel and ARM, of the BarrierPoint methodology, when applied to…
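As a rough illustration of how representative regions can stand in for a full run, the sketch below rebuilds a whole-program cycle count from a few selected regions and compares it with a full measurement. It is a minimal sketch under stated assumptions, not the paper's implementation: the `Representative` type, the `multiplier` weights, and all numbers are hypothetical and chosen only to make the arithmetic easy to follow.

```python
from dataclasses import dataclass


@dataclass
class Representative:
    """One selected region between two barriers (hypothetical structure)."""
    region_id: int     # index of the region chosen as a representative
    multiplier: float  # assumed weight: how many regions of the run it stands in for
    cycles: float      # cycles measured when running this region alone


def estimate_total(reps):
    """Weighted sum of the representatives' cycle counts."""
    return sum(r.multiplier * r.cycles for r in reps)


def relative_error(estimate, measured):
    """Absolute relative error of the estimate against a full-run measurement."""
    return abs(estimate - measured) / measured


# Illustrative values only; they do not come from the paper.
reps = [
    Representative(region_id=0, multiplier=1.0, cycles=5.0e6),
    Representative(region_id=3, multiplier=40.0, cycles=2.0e6),
    Representative(region_id=7, multiplier=9.0, cycles=1.5e6),
]
measured = 1.0e8  # cycles of the hypothetical full run

estimated = estimate_total(reps)
error = relative_error(estimated, measured)
print(f"estimated cycles: {estimated:.3e}, relative error: {error:.2%}")
```

The same weighted sum can be applied to any additive metric, such as instruction counts. The time saving comes from executing only the selected regions instead of the whole application.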
