Experiences with OpenMP in tmLQCD
A. Deuzeman, K. Jansen, B. Kostrzewa, C. Urbach

TL;DR
This paper discusses lessons learned from implementing OpenMP multi-threading in tmLQCD, focusing on performance, scalability, and programming challenges encountered during development.
Contribution
It provides a detailed analysis of OpenMP integration in tmLQCD, highlighting practical insights on performance optimization and common issues faced.
Findings
Performance varies with different implementations of the hopping matrix
Cache misses and certain overheads are identified as key bottlenecks
Effective thread distribution improves scalability
Abstract
An overview is given of the lessons learned from the introduction of multi-threading using OpenMP in tmLQCD. In particular, programming style, performance measurements, cache misses, scaling, thread distribution for hybrid codes, race conditions, the overlapping of communication and computation and the measurement and reduction of certain overheads are discussed. Performance measurements and sampling profiles are given for different implementations of the hopping matrix computational kernel.
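To make the race-condition topic in the abstract concrete, here is a minimal sketch in C of a threaded sum of squares. It is not taken from tmLQCD: the function name `square_norm` and its signature are hypothetical, chosen only for illustration. The sketch assumes an OpenMP-capable compiler. Without OpenMP support the pragma is ignored and the loop runs serially with the same result.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from tmLQCD): returns the sum of v[i]^2. */
double square_norm(const double *v, size_t n)
{
    assert(v != NULL || n == 0);

    double sum = 0.0;

    /* reduction(+:sum) gives every thread a private partial sum and
     * combines the partial sums when the loop ends. Without the clause,
     * all threads would update the shared variable `sum` concurrently,
     * which is a data race. */
#pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < (long)n; ++i) {
        sum += v[i] * v[i];
    }

    return sum;
}
```

Calling `square_norm` on the four values 1, 2, 3, 4 yields 30. For general floating-point input the result can differ in the last bits between runs, because the order in which the partial sums are combined is not fixed.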
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Algorithms and Data Compression · Computational Physics and Python Applications
