Lattice QCD with Domain Decomposition on Intel Xeon Phi Co-Processors

Simon Heybrock; Bálint Joó; Dhiraj D. Kalamkar; Mikhail Smelyanskiy; Karthikeyan Vaidyanathan; Tilo Wettig; Pradeep Dubey

arXiv:1412.2629·hep-lat·December 9, 2014·SC

TL;DR

This paper presents a domain-decomposition solver for lattice QCD, optimized for Intel Xeon Phi (KNC) co-processors, that reduces data movement and, compared to an optimized standard solver, strong-scales to more nodes with a 5× shorter time-to-solution.

Contribution

It implements an alternative, domain-decomposition-based solver algorithm on Intel Xeon Phi (KNC) clusters, improving strong scaling and time-to-solution over an optimized KNC implementation of a standard solver.

Findings

01

Achieves close-to-linear on-chip scaling to all 60 cores of the KNC

02

Sustains 400-500 Gflop/s per chip using a mix of single and half precision

03

Reduces time-to-solution by a factor of 5 compared to an optimized KNC implementation of a standard solver
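The mixed-precision finding relies on doing much of the arithmetic in reduced precision. One generic way to do this is defect correction: residuals are recomputed in float32 while cheap approximate corrections are computed in float16. The sketch below assumes this scheme; the test operator, the Jacobi inner solver, and the names `make_system`, `jacobi_half`, and `mixed_precision_solve` are hypothetical illustrations, not the paper's solver or kernels.

```python
import numpy as np


def make_system(n: int = 64, shift: float = 4.0):
    """Hypothetical test problem (not from the paper): a periodic 1D
    nearest-neighbour operator A = shift*I - hopping, plus a random rhs."""
    A = shift * np.eye(n, dtype=np.float32)
    for i in range(n):
        A[i, (i + 1) % n] -= 1.0
        A[i, (i - 1) % n] -= 1.0
    b = np.random.default_rng(0).standard_normal(n).astype(np.float32)
    return A, b


def jacobi_half(A16: np.ndarray, r: np.ndarray, sweeps: int = 8) -> np.ndarray:
    """Approximately solve A e = r with Jacobi sweeps done in float16.

    The residual is normalised first so that it fits the narrow float16 range;
    the correction is rescaled and returned in float32.
    """
    scale = np.float32(np.linalg.norm(r))
    if scale == 0.0:
        return np.zeros_like(r)
    r16 = (r / scale).astype(np.float16)
    d16 = np.diag(A16)  # diagonal of A, in float16
    e16 = np.zeros_like(r16)
    for _ in range(sweeps):
        e16 = e16 + (r16 - A16 @ e16) / d16
    return e16.astype(np.float32) * scale


def mixed_precision_solve(A: np.ndarray, b: np.ndarray, outer: int = 10):
    """Defect correction: float32 outer residuals, float16 inner solves."""
    A16 = A.astype(np.float16)
    x = np.zeros_like(b)
    history = []
    for _ in range(outer):
        r = b - A @ x  # residual recomputed in float32
        history.append(float(np.linalg.norm(r) / np.linalg.norm(b)))
        x = x + jacobi_half(A16, r)  # cheap low-precision correction
    r = b - A @ x
    history.append(float(np.linalg.norm(r) / np.linalg.norm(b)))
    return x, history


if __name__ == "__main__":
    A, b = make_system()
    x, hist = mixed_precision_solve(A, b)
    for k, rel in enumerate(hist):
        print(f"outer {k:2d}: ||r||/||b|| = {rel:.2e}")
```

Although every correction is only accurate to roughly float16 precision, the final residual reaches float32 accuracy, because each outer step measures the remaining error in the higher precision.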

Abstract

The gap between the cost of moving data and the cost of computing continues to grow, making it ever harder to design iterative solvers on extreme-scale architectures. This problem can be alleviated by alternative algorithms that reduce the amount of data movement. We investigate this in the context of Lattice Quantum Chromodynamics and implement such an alternative solver algorithm, based on domain decomposition, on Intel Xeon Phi co-processor (KNC) clusters. We demonstrate close-to-linear on-chip scaling to all 60 cores of the KNC. With a mix of single- and half-precision the domain-decomposition method sustains 400-500 Gflop/s per chip. Compared to an optimized KNC implementation of a standard solver [1], our full multi-node domain-decomposition solver strong-scales to more nodes and reduces the time-to-solution by a factor of 5.
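To illustrate how domain decomposition limits data movement, the sketch below implements a two-colour multiplicative Schwarz iteration, in the style of the Schwarz alternating procedure, for a small 1D periodic operator. The abstract does not say which domain-decomposition variant the paper uses, so the operator, the block size, and the names `lattice_operator`, `make_blocks`, `schwarz_sweep`, and `sap_solve` are assumptions chosen for illustration.

```python
import numpy as np


def lattice_operator(n: int = 64, mass: float = 0.5) -> np.ndarray:
    """Hypothetical stand-in for a lattice operator: a periodic 1D
    nearest-neighbour operator (2 + mass) * I - hopping, which is SPD."""
    A = (2.0 + mass) * np.eye(n)
    for i in range(n):
        A[i, (i + 1) % n] -= 1.0
        A[i, (i - 1) % n] -= 1.0
    return A


def make_blocks(n: int, block: int):
    """Split sites 0..n-1 into contiguous blocks, coloured alternately 0/1."""
    if n % (2 * block) != 0:
        raise ValueError("need an even number of equal blocks for two colours")
    starts = range(0, n, block)
    return [(k % 2, np.arange(s, s + block)) for k, s in enumerate(starts)]


def schwarz_sweep(A, b, x, blocks):
    """One multiplicative Schwarz sweep: all colour-0 blocks, then colour-1.

    Blocks of one colour never touch each other, so their local solves are
    independent (they could run in parallel); only the residual update
    between the two half-sweeps needs data from neighbouring blocks.
    """
    for colour in (0, 1):
        r = b - A @ x  # residual refreshed once per colour
        for c, idx in blocks:
            if c == colour:
                A_local = A[np.ix_(idx, idx)]  # operator restricted to one block
                x[idx] += np.linalg.solve(A_local, r[idx])
    return x


def sap_solve(A, b, block: int = 8, sweeps: int = 40):
    """Run Schwarz sweeps from x = 0 and record relative residuals."""
    blocks = make_blocks(len(b), block)
    x = np.zeros_like(b)
    history = []
    for _ in range(sweeps):
        x = schwarz_sweep(A, b, x, blocks)
        history.append(np.linalg.norm(b - A @ x) / np.linalg.norm(b))
    return x, history
```

Each local solve reads only the sites of its own block; data from neighbouring blocks enters only through the residual, which is recomputed once per half-sweep. In practice such a sweep is typically used as a preconditioner inside an outer Krylov solver rather than as a standalone iteration.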