Performance Evaluation of ParalleX Execution model on Arm-based Platforms
Nikunj Gupta, Rohit Ashiwal, Bine Brank, Sateesh K. Peddoju, Dirk Pleiter

TL;DR
This paper evaluates the performance of the ParalleX execution model, specifically HPX, on Arm-based platforms, comparing it with x86 processors through benchmarks emphasizing vectorization and scaling.
Contribution
It ports HPX to Arm processors and evaluates its performance there, showing results comparable or superior to those on x86 systems.
Findings
HPX performance on Arm is comparable to or better than on x86.
Benchmarks show good vectorization and scaling on Arm.
The paper also discusses current limitations of the Arm ecosystem.
Abstract
The HPC community shows a keen interest in creating diversity in the CPU ecosystem. The advent of Arm-based processors provides an alternative to the existing HPC ecosystem, which is primarily dominated by x86 processors. In this paper, we port an Asynchronous Many-Task runtime system based on the ParalleX model, i.e., High Performance ParalleX (HPX), and evaluate it on the Arm ecosystem with a suite of benchmarks. We wrote these benchmarks with an emphasis on vectorization and distributed scaling. We present the performance results on a variety of Arm processors and compare them with those of their x86 brethren from Intel. We show that the results obtained are as good as or better than those on x86. Finally, we also discuss a few drawbacks of the present Arm ecosystem.
