Influence of atomic FAA on ParallelFor and a cost model for improvements
Ran Shuai

TL;DR
This paper investigates how atomic FAA operations impact ParallelFor's latency and proposes a cost model to optimize performance by reducing FAA influence across different platforms.
Contribution
It introduces a detailed analysis of FAA's effect on ParallelFor and presents a cost model for performance improvements.
Findings
FAA significantly affects ParallelFor latency
Performance varies across different hardware platforms
A cost model can guide optimization strategies
Abstract
This paper focuses on one of the most frequently used multithreading library interfaces: ParallelFor. The study hypothesizes that ParallelFor's end-to-end latency is noticeably affected by how often the atomic fetch-and-add (FAA) operation is invoked during program execution. This can be explained by ParallelFor's uniform semantics and its use of atomic FAA. To test this hypothesis, a battery of tests was designed and run on diverse platforms. From the collected performance statistics and overall trends, several conclusions are drawn, and a cost model is proposed to improve performance by mitigating the influence of FAA.
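To make the mechanism concrete, here is a minimal C++ sketch of a ParallelFor-style loop in which worker threads claim iterations from a shared counter with atomic FAA. It is an illustration under stated assumptions, not the paper's implementation: the function name `parallel_for`, the `grain` parameter, and the FAA-counting instrumentation are all hypothetical. The point it shows is that the number of FAA calls scales with the iteration count divided by the chunk size, so claiming larger chunks is one plausible way to reduce FAA frequency.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical helper (not from the paper): runs body(i) for every i in [0, n)
// on num_threads threads and returns how many FAA calls hit the shared cursor.
std::size_t parallel_for(std::size_t n, std::size_t grain, unsigned num_threads,
                         const std::function<void(std::size_t)>& body) {
    assert(grain > 0 && num_threads > 0);
    std::atomic<std::size_t> next{0};       // shared cursor over the iteration space
    std::atomic<std::size_t> faa_calls{0};  // instrumentation only (assumption)

    auto worker = [&] {
        std::size_t local_calls = 0;
        for (;;) {
            // One atomic FAA claims the next chunk of `grain` iterations.
            std::size_t begin = next.fetch_add(grain, std::memory_order_relaxed);
            ++local_calls;
            if (begin >= n) break;  // iteration space exhausted
            std::size_t end = std::min(begin + grain, n);
            for (std::size_t i = begin; i < end; ++i) body(i);
        }
        faa_calls.fetch_add(local_calls, std::memory_order_relaxed);
    };

    std::vector<std::thread> threads;
    for (unsigned t = 0; t < num_threads; ++t) threads.emplace_back(worker);
    for (auto& th : threads) th.join();
    return faa_calls.load();
}
```

In this sketch the cursor receives ceil(n / grain) successful claims plus one terminating FAA per thread, so with `grain = 1` every iteration costs one contended atomic operation. A larger `grain` trades FAA traffic for coarser load balancing, which is the kind of trade-off a cost model would need to weigh per platform.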
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Advanced Data Storage Technologies · Cloud Computing and Resource Management
