Better than $1/Mflops sustained: a scalable PC-based parallel computer for lattice QCD
Z. Fodor, S.D. Katz, G. Papp

TL;DR
This paper presents a scalable, cost-effective PC-based parallel computer for lattice QCD simulations that uses Gigabit Ethernet for nearest-neighbor communication in a two-dimensional mesh and costs less than $1/Mflops sustained.
Contribution
It introduces a novel Ethernet-based communication architecture enabling scalable, high-performance lattice QCD simulations on a PC cluster at a cost below $1/Mflops.
Findings
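Sustains 1510 Mflops/node for Wilson fermions and 970 Mflops/node for staggered fermions (32-bit single precision, excluding communication)
Total performance of 208 Gflops for Wilson QCD and 133 Gflops for staggered QCD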
Communication overhead is around 40% for large lattices
Abstract
We study the feasibility of a PC-based parallel computer for medium to large scale lattice QCD simulations. The Eötvös Univ., Inst. Theor. Phys. cluster consists of 137 Intel P4-1.7GHz nodes with 512 MB RDRAM. The 32-bit, single precision sustained performance for dynamical QCD without communication is 1510 Mflops/node with Wilson and 970 Mflops/node with staggered fermions. This gives a total performance of 208 Gflops for Wilson and 133 Gflops for staggered QCD, respectively (for 64-bit applications the performance is approximately halved). The novel feature of our system is its communication architecture. To obtain a scalable, cost-effective machine we use Gigabit Ethernet cards for nearest-neighbor communications in a two-dimensional mesh. This type of communication is cost effective (only 30% of the hardware cost is spent on communication). According to our benchmark…
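The aggregate figures in the abstract can be checked with a few lines of arithmetic. The sketch below multiplies the quoted per-node rates by the 137-node count, assuming perfect scaling with no communication cost, which is how the abstract frames these numbers. The names `NODES`, `MFLOPS_PER_NODE`, and `aggregate_gflops` are illustrative and do not come from the paper.

```python
# Figures quoted in the abstract: 32-bit single precision, communication excluded.
NODES = 137
MFLOPS_PER_NODE = {"wilson": 1510, "staggered": 970}


def aggregate_gflops(mflops_per_node: float, nodes: int = NODES) -> float:
    """Aggregate rate in Gflops, assuming every node sustains the quoted rate."""
    return mflops_per_node * nodes / 1000.0


# Hypothetical helper output: one aggregate figure per fermion formulation.
totals = {name: aggregate_gflops(rate) for name, rate in MFLOPS_PER_NODE.items()}

for name, gflops in totals.items():
    print(f"{name}: {gflops:.1f} Gflops")
```

The staggered product rounds to the quoted 133 Gflops. The Wilson product comes out near 207 Gflops, slightly below the quoted 208 Gflops; the abstract does not say how that total was rounded.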
