Exact diagonalization of quantum lattice models on coprocessors
Topi Siro, Ari Harju

TL;DR
This paper compares the performance of the Lanczos algorithm on an Intel Xeon Phi coprocessor, a multi-core Intel Xeon CPU, and an NVIDIA GPU, finding that the GPU is fastest for large quantum lattice models, while the multi-core CPU is fastest for small systems.
Contribution
It implements the Lanczos algorithm for quantum lattice models on an Intel Xeon Phi coprocessor and compares its performance with a multi-core CPU and a GPU.
Findings
For large systems, the GPU outperforms the CPU, with speedups of up to 7.6x.
With a sufficiently large particle number, the Xeon Phi outperforms the CPU, reaching a speedup of 2.5x.
For small systems, the multi-core CPU is the fastest platform.
Abstract
We implement the Lanczos algorithm on an Intel Xeon Phi coprocessor and compare its performance to a multi-core Intel Xeon CPU and an NVIDIA graphics processor. The Xeon and the Xeon Phi are parallelized with OpenMP and the graphics processor is programmed with CUDA. The performance is evaluated by measuring the execution time of a single step in the Lanczos algorithm. We study two quantum lattice models with different particle numbers, and conclude that for small systems, the multi-core CPU is the fastest platform, while for large systems, the graphics processor is the clear winner, reaching speedups of up to 7.6 compared to the CPU. The Xeon Phi outperforms the CPU with sufficiently large particle number, reaching a speedup of 2.5.
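To make the benchmarked quantity concrete, the following is a minimal sketch of one Lanczos step in plain Python. It is not the authors' OpenMP or CUDA code. The function names `lanczos_step` and `chain_matvec` are hypothetical, and the open one-dimensional hopping chain is an assumed stand-in Hamiltonian, not one of the two lattice models studied in the paper. The step consists of one Hamiltonian-vector product followed by a few vector operations, and the sketch times that single step.

```python
import math
import time


def lanczos_step(matvec, v_prev, v_curr, beta):
    """One Lanczos iteration; returns (alpha, beta_next, v_next)."""
    w = matvec(v_curr)  # Hamiltonian times vector
    alpha = sum(c * x for c, x in zip(v_curr, w))  # diagonal element
    # Orthogonalize against the two most recent Lanczos vectors.
    w = [x - alpha * c - beta * p for x, c, p in zip(w, v_curr, v_prev)]
    beta_next = math.sqrt(sum(x * x for x in w))  # off-diagonal element
    if beta_next == 0.0:
        # Invariant subspace reached; no further vector can be generated.
        return alpha, 0.0, [0.0] * len(w)
    v_next = [x / beta_next for x in w]
    return alpha, beta_next, v_next


def chain_matvec(v):
    """Assumed stand-in H: open 1D chain with hopping amplitude -1."""
    n = len(v)
    out = [0.0] * n
    for i in range(n):
        if i > 0:
            out[i] -= v[i - 1]
        if i < n - 1:
            out[i] -= v[i + 1]
    return out


n = 1000
v0 = [1.0 / math.sqrt(n)] * n  # normalized starting vector
v_prev = [0.0] * n             # no previous vector on the first step

start = time.perf_counter()
alpha, beta1, v1 = lanczos_step(chain_matvec, v_prev, v0, 0.0)
elapsed = time.perf_counter() - start

print(f"alpha = {alpha:.6f}, beta = {beta1:.6f}")
print(f"time for one step: {elapsed:.6f} s")
```

Repeating the step yields the diagonal (alpha) and off-diagonal (beta) entries of a tridiagonal matrix whose extreme eigenvalues approximate those of the Hamiltonian. In a parallel implementation, the loops inside the matrix-vector product and the vector updates are the natural targets for OpenMP or CUDA parallelization.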
