Matrix-free algorithms for fast ab initio calculations on distributed CPU architectures using finite-element discretization

Gourab Panigrahi; Phani Motamarri

arXiv:2512.08571 · physics.comp-ph · December 11, 2025

Open Access

TL;DR

This paper introduces matrix-free algorithms for finite-element-discretized DFT calculations that accelerate the repeated application of the sparse Hamiltonian to large blocks of trial vectors, the dominant cost in the iterative eigensolver, on distributed CPU architectures, reducing time-to-solution for large-scale quantum simulations.

Contribution

The work develops on-the-fly matrix-free algorithms utilizing structured tensor contractions and a unified data layout, achieving substantial speedups over traditional FE-based DFT methods.
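As a rough illustration of what "structured tensor contractions" can mean here, the sketch below contrasts a dense cell-matrix product with a sum-factorized (matrix-free) product on a single hexahedral cell. It is a generic textbook construction, not the authors' kernels or data layout: it assumes the cell operator is a pure Kronecker product of three 1D matrices (a real FE Hamiltonian also carries potential and geometry terms), and the names `apply_cell_matrix`, `apply_matrix_free`, and `flop_counts` are hypothetical.

```python
import numpy as np


def apply_cell_matrix(Ax, Ay, Az, X):
    """Cell-matrix style reference: assemble the dense (n^3 x n^3) cell
    operator as a Kronecker product and multiply it into the block X."""
    A = np.kron(Ax, np.kron(Ay, Az))
    return A @ X


def apply_matrix_free(Ax, Ay, Az, X):
    """Matrix-free style: never form the n^3 x n^3 matrix; contract each
    1D matrix along its own direction of the reshaped block instead."""
    n = Ax.shape[0]
    T = X.reshape(n, n, n, -1)               # axes: (x, y, z, vector index)
    T = np.einsum("ai,ijkb->ajkb", Ax, T)    # contract along x
    T = np.einsum("aj,ijkb->iakb", Ay, T)    # contract along y
    T = np.einsum("ak,ijkb->ijab", Az, T)    # contract along z
    return T.reshape(n**3, -1)


def flop_counts(n, num_vectors):
    """Leading-order flop counts per cell (2 flops per multiply-add)."""
    dense = 2 * n**6 * num_vectors           # one n^3 x n^3 product
    factored = 3 * 2 * n**4 * num_vectors    # three 1D contractions
    return dense, factored


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, num_vectors = 8, 16                   # n = p + 1 nodes per direction
    Ax, Ay, Az = (rng.standard_normal((n, n)) for _ in range(3))
    X = rng.standard_normal((n**3, num_vectors))

    Y_dense = apply_cell_matrix(Ax, Ay, Az, X)
    Y_free = apply_matrix_free(Ax, Ay, Az, X)
    print("results agree:", np.allclose(Y_dense, Y_free))
    print("flops (dense, factored):", flop_counts(n, num_vectors))
```

Under these assumptions the dense product costs on the order of n^6 operations per vector and must store n^6 matrix entries per cell, while the factored form costs on the order of 3n^4 and stores only three n x n matrices, which is why the gap widens at higher polynomial orders. Keeping the vector index as the trailing axis lets each contraction act on the whole block of trial vectors at once.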

Findings

1. Achieves 1.5-4x speedup over cell-matrix approaches in pseudopotential DFT

2. Up to 5.8x performance gain in all-electron DFT calculations

3. Reduces end-to-end time-to-solution in large-scale DFT simulations

Abstract

Finite-element (FE) discretisations have emerged as a powerful real-space alternative for large-scale Kohn-Sham density functional theory (DFT) calculations, offering systematic convergence and excellent parallel scalability while accommodating generic boundary conditions. However, the dominant computational bottleneck in FE-based DFT is the repeated application of the discretised sparse Hamiltonian to large blocks of trial vectors in each iteration of an iterative eigensolver. Traditional sparse matrix-vector multiplications and FE cell-matrix approaches encounter memory limitations and high data-movement overheads, particularly at the higher polynomial orders typically used in DFT calculations. To overcome these challenges, this work develops matrix-free algorithms for FE-discretised DFT that substantially accelerate these products by performing on-the-fly operations that utilize…

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Matrix Theory and Algorithms · Model Reduction and Neural Networks