Optimization of Lattice QCD codes for the AMD Opteron processor
Miho Koma (DESY / RCNP Osaka Univ.)

TL;DR
This paper describes how lattice QCD codes were optimized for the AMD Opteron processor, using SSE/SSE2 instructions and prefetching, and reports benchmarks from the new Opteron cluster at DESY Hamburg.
Contribution
It presents specific optimization strategies for lattice QCD codes on AMD Opteron processors, including implementation details and benchmark results.
Findings
Significant performance improvements achieved through SSE/SSE2 optimization.
Effective use of prefetch instructions enhances code efficiency.
Benchmark results demonstrate the benefits of the optimization techniques.
Abstract
We report our experience of the optimization of the lattice QCD codes for the new Opteron cluster at DESY Hamburg, including benchmarks. Details of the optimization using SSE/SSE2 instructions and the effective use of prefetch instructions are discussed.
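To make the two techniques named in the abstract concrete, here is a minimal sketch in C. It is not taken from the paper: the function name `cmul_array`, the interleaved (re, im) array layout, and the prefetch distance `PREFETCH_AHEAD` are all assumptions chosen for illustration. It multiplies two arrays of double-precision complex numbers element by element, the kind of arithmetic that dominates lattice QCD kernels, using SSE2 intrinsics where the compiler provides them and a scalar fallback elsewhere.

```c
#include <stddef.h>

#if defined(__SSE2__)
#include <emmintrin.h> /* SSE2 intrinsics; also provides _mm_prefetch */
#endif

/* Hypothetical prefetch distance, in complex elements. A real code would
 * tune this and would normally issue one prefetch per cache line rather
 * than one per element. */
#define PREFETCH_AHEAD 8

/* out[i] = a[i] * b[i] for n complex numbers stored as interleaved
 * (re, im) pairs of doubles. Illustrative sketch only. */
void cmul_array(const double *a, const double *b, double *out, size_t n)
{
    size_t i;
    for (i = 0; i < n; ++i) {
#if defined(__SSE2__)
        /* Hint the cache to start loading data needed a few iterations
         * from now; the guard keeps the address inside the arrays. */
        if (i + PREFETCH_AHEAD < n) {
            _mm_prefetch((const char *)(a + 2 * (i + PREFETCH_AHEAD)), _MM_HINT_T0);
            _mm_prefetch((const char *)(b + 2 * (i + PREFETCH_AHEAD)), _MM_HINT_T0);
        }

        __m128d va = _mm_loadu_pd(a + 2 * i);   /* [ar, ai] */
        __m128d vb = _mm_loadu_pd(b + 2 * i);   /* [br, bi] */
        __m128d br = _mm_unpacklo_pd(vb, vb);   /* [br, br] */
        __m128d bi = _mm_unpackhi_pd(vb, vb);   /* [bi, bi] */

        __m128d t1 = _mm_mul_pd(va, br);        /* [ar*br, ai*br] */
        __m128d sw = _mm_shuffle_pd(va, va, 1); /* [ai, ar] */
        __m128d t2 = _mm_mul_pd(sw, bi);        /* [ai*bi, ar*bi] */

        /* Flip the sign of the low lane only, giving [-ai*bi, ar*bi]. */
        t2 = _mm_xor_pd(t2, _mm_set_pd(0.0, -0.0));

        /* [ar*br - ai*bi, ai*br + ar*bi] */
        _mm_storeu_pd(out + 2 * i, _mm_add_pd(t1, t2));
#else
        /* Portable scalar fallback with the same result. */
        double ar = a[2 * i], ai = a[2 * i + 1];
        double br = b[2 * i], bi = b[2 * i + 1];
        out[2 * i]     = ar * br - ai * bi;
        out[2 * i + 1] = ai * br + ar * bi;
#endif
    }
}
```

Calling `cmul_array(a, b, out, n)` with `a` and `b` each holding `2 * n` doubles writes the `n` products into `out`. The unaligned load and store intrinsics are used so that the sketch works with any buffer; a tuned code would more likely align its data to 16 bytes and use the aligned variants.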
