Influence of atomic FAA on ParallelFor and a cost model for improvements

Ran Shuai

arXiv:2111.13291·cs.DC·November 29, 2021

TL;DR

This paper investigates how the frequency of atomic fetch-and-add (FAA) calls affects the end-to-end latency of ParallelFor, and proposes a cost model for improving performance by reducing the influence of FAA, based on tests run across several platforms.

Contribution

The paper analyzes the effect of atomic FAA on ParallelFor latency and presents a cost model for mitigating that effect.

Findings

01 FAA significantly affects ParallelFor latency

02 Performance varies across different hardware platforms

03 A cost model can guide optimization strategies

Abstract

This paper focuses on one of the most frequently used multithreading library interfaces: ParallelFor. The study infers that ParallelFor's end-to-end latency is noticeably affected by how often atomic fetch-and-add (FAA) is called during program execution. This can be explained by ParallelFor's uniform semantics and its use of atomic FAA. To test this hypothesis, a battery of tests was designed and run on diverse platforms. From the collected performance statistics and overall trends, several conclusions are drawn, and a cost model is proposed to improve performance by mitigating the influence of FAA.
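The link between FAA frequency and ParallelFor can be illustrated with a minimal sketch. It is not the paper's implementation or its cost model; the function name `ParallelFor`, its signature, and the `chunk` parameter are assumptions made for illustration. Workers claim ranges of iterations from a shared counter using `std::atomic::fetch_add`, and the function returns how many FAA calls were issued on that counter so the effect of the chunk size can be observed.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical ParallelFor (illustrative, not the paper's code): each worker
// claims `chunk` consecutive iterations with one atomic fetch_add on a shared
// counter. Returns the total number of FAA calls made on that counter.
std::size_t ParallelFor(std::size_t n, std::size_t chunk, unsigned workers,
                        const std::function<void(std::size_t)>& body) {
    assert(chunk > 0 && workers > 0);
    std::atomic<std::size_t> next{0};       // shared iteration counter
    std::atomic<std::size_t> faa_calls{0};  // instrumentation only

    auto worker = [&] {
        std::size_t local_calls = 0;  // counted locally to limit perturbation
        for (;;) {
            // One FAA claims the range [begin, begin + chunk).
            std::size_t begin = next.fetch_add(chunk, std::memory_order_relaxed);
            ++local_calls;
            if (begin >= n) break;  // nothing left: one final, failing claim
            std::size_t end = std::min(n, begin + chunk);
            for (std::size_t i = begin; i < end; ++i) body(i);
        }
        faa_calls.fetch_add(local_calls, std::memory_order_relaxed);
    };

    std::vector<std::thread> pool;
    for (unsigned t = 0; t < workers; ++t) pool.emplace_back(worker);
    for (auto& th : pool) th.join();
    return faa_calls.load();
}
```

In this sketch the number of FAA calls on the shared counter is ceil(n / chunk) successful claims plus one failing claim per worker, so with n = 1000 and 4 workers a chunk of 1 issues 1004 calls while a chunk of 100 issues 14. Larger chunks reduce contention on the counter at the cost of coarser load balancing, which is the kind of trade-off a cost model would have to weigh. On most toolchains the code needs thread support enabled at build time (for example `-pthread` with GCC or Clang).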

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Advanced Data Storage Technologies · Cloud Computing and Resource Management