Exact diagonalization of quantum lattice models on coprocessors

Topi Siro; Ari Harju

arXiv:1511.00863·cond-mat.str-el·September 21, 2016

TL;DR

This paper compares the performance of the Lanczos algorithm on an Intel Xeon Phi coprocessor, a multi-core Intel Xeon CPU, and an NVIDIA GPU. The GPU is fastest for large quantum lattice models, while the multi-core CPU is fastest for small systems.

Contribution

It implements the Lanczos algorithm on an Intel Xeon Phi coprocessor and compares its single-step execution time against a multi-core Intel Xeon CPU and an NVIDIA GPU for two quantum lattice models.

Findings

01

For large systems, the GPU outperforms the CPU, with speedups of up to 7.6x.

02

For sufficiently large particle numbers, the Xeon Phi outperforms the CPU, reaching a speedup of 2.5x.

03

For small systems, the multi-core CPU is the fastest platform.

Abstract

We implement the Lanczos algorithm on an Intel Xeon Phi coprocessor and compare its performance to a multi-core Intel Xeon CPU and an NVIDIA graphics processor. The Xeon and the Xeon Phi are parallelized with OpenMP and the graphics processor is programmed with CUDA. The performance is evaluated by measuring the execution time of a single step in the Lanczos algorithm. We study two quantum lattice models with different particle numbers, and conclude that for small systems, the multi-core CPU is the fastest platform, while for large systems, the graphics processor is the clear winner, reaching speedups of up to 7.6 compared to the CPU. The Xeon Phi outperforms the CPU for sufficiently large particle numbers, reaching a speedup of 2.5.
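To make the benchmarked quantity concrete, the following is a minimal sketch of what a single Lanczos step computes, written in plain Python rather than the OpenMP or CUDA used in the paper. The function names `matvec` and `lanczos_step` and the dense 2x2 matrix are illustrative assumptions, not the authors' code; a real lattice Hamiltonian would be applied in sparse or matrix-free form, and that matrix-vector product is typically the dominant cost of the step.

```python
import math


def matvec(h, v):
    """Dense matrix-vector product (a stand-in for applying the Hamiltonian)."""
    return [sum(hij * vj for hij, vj in zip(row, v)) for row in h]


def lanczos_step(h, v_prev, v_curr, beta_prev):
    """Perform one Lanczos iteration for a real symmetric matrix h.

    Returns (alpha, beta, v_next), where alpha and beta are the new diagonal
    and off-diagonal entries of the tridiagonal matrix and v_next is the next
    normalized Lanczos vector.
    """
    w = matvec(h, v_curr)
    alpha = sum(wi * vi for wi, vi in zip(w, v_curr))
    # Orthogonalize against the two most recent Lanczos vectors.
    w = [wi - alpha * vi - beta_prev * pi
         for wi, vi, pi in zip(w, v_curr, v_prev)]
    beta = math.sqrt(sum(wi * wi for wi in w))
    # beta == 0 means the Krylov space is exhausted; return w unnormalized.
    v_next = [wi / beta for wi in w] if beta > 0.0 else w
    return alpha, beta, v_next


# Hypothetical toy matrix and normalized start vector.
h = [[2.0, 1.0], [1.0, 2.0]]
v0 = [1.0, 0.0]
alpha0, beta0, v1 = lanczos_step(h, [0.0, 0.0], v0, 0.0)
alpha1, beta1, v2 = lanczos_step(h, v0, v1, beta0)
```

Each step costs one matrix-vector product, one dot product, one vector update, and one norm, which is why timing a single step is a reasonable proxy for the whole iteration.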
