# Algorithms and data structures for matrix-free finite element operators with MPI-parallel sparse multi-vectors

**Authors:** Denis Davydov, Martin Kronbichler

arXiv: 1907.01005 · 2019-07-03

## TL;DR

This paper presents a matrix-free, MPI-parallel implementation of finite element operators acting on sparse multivectors in the deal.II library, aimed at linear-scaling quantum mechanical simulations. The implementation reaches 157 GFlop/s on the Intel Cascade Lake architecture.

## Contribution

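It introduces algorithms and data structures in deal.II for three matrix-free operations on sparse multivectors whose support is chosen a priori: sparse matrix multivector products (SpMM), projection of an operator onto a sparse subspace (inner products), and post-multiplication of a sparse multivector with a square matrix.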

## Key findings

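- Achieves 157 GFlop/s on the Intel Cascade Lake architecture for matrix-free finite element operators with sparse multivectors.
- Reports strong and weak scaling results for a typical benchmark problem using quadratic and quartic finite element bases.
- Provides algorithms for sparse multivector operations and analyzes their node-level performance with a roofline model.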
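
The 157 GFlop/s figure is easiest to read against the roofline model that the abstract names for the node-level analysis. As a reminder of the standard form of that model (the symbols below are notation introduced here, not taken from the paper), the attainable performance of a kernel is bounded by

$$
P_{\text{attainable}} = \min\left(P_{\text{peak}},\; I \cdot B\right), \qquad I = \frac{\text{floating point operations}}{\text{bytes moved to and from memory}},
$$

where $P_{\text{peak}}$ is the peak arithmetic throughput of the node, $B$ is the sustained memory bandwidth, and $I$ is the arithmetic intensity of the kernel. A kernel with low $I$ is limited by $I \cdot B$ (memory bound), while a kernel with high $I$ is limited by $P_{\text{peak}}$ (compute bound). The hardware values used in the paper are not given in this summary.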

## Abstract

Traditional solution approaches for problems in quantum mechanics scale as $\mathcal O(M^3)$, where $M$ is the number of electrons. Various methods have been proposed to address this issue and obtain linear scaling $\mathcal O(M)$. One promising formulation is the direct minimization of energy. Such methods take advantage of physical localization of the solution, namely that the solution can be sought in terms of non-orthogonal orbitals with local support. In this work a numerically efficient implementation of sparse parallel vectors within the open-source finite element library deal.II is proposed. The main algorithmic ingredient is the matrix-free evaluation of the Hamiltonian operator by cell-wise quadrature. Based on an a priori chosen support for each vector we develop algorithms and data structures to perform (i) matrix-free sparse matrix multivector products (SpMM), (ii) the projection of an operator onto a sparse subspace (inner products), and (iii) post-multiplication of a sparse multivector with a square matrix. The node-level performance is analyzed using a roofline model. Our matrix-free implementation of finite element operators with sparse multivectors achieves a performance of 157 GFlop/s on the Intel Cascade Lake architecture. Strong and weak scaling results are reported for a typical benchmark problem using quadratic and quartic finite element bases.
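
To make the three operations in the abstract concrete, here is a minimal serial sketch in plain Python. It is not the deal.II implementation: the storage layout (one `{row: value}` dictionary per column), the function names, and the 1D linear-element Laplacian used as a stand-in for the Hamiltonian are all assumptions chosen for illustration. Truncating results to the a priori support in (i) and (iii) is likewise an assumption of this sketch.

```python
# Illustrative sketch only. Names, storage layout and the 1D model operator
# are assumptions, not the data structures or API of deal.II or the paper.

def apply_operator(x_cols, n_cells, h=1.0):
    """(i) Matrix-free product Y = H X, truncated to each column's support.

    No global matrix is stored: every cell applies its 2x2 element matrix
    (1/h) * [[1, -1], [-1, 1]] on the fly. Columns without support on a
    cell are skipped, which is where the sparsity pays off.
    """
    y_cols = [{row: 0.0 for row in col} for col in x_cols]
    for cell in range(n_cells):
        a, b = cell, cell + 1  # the two degrees of freedom of this cell
        for x, y in zip(x_cols, y_cols):
            if a not in x and b not in x:
                continue  # column has no support here
            diff = (x.get(a, 0.0) - x.get(b, 0.0)) / h
            if a in y:
                y[a] += diff
            if b in y:
                y[b] -= diff
    return y_cols


def project(x_cols, n_cells, h=1.0):
    """(ii) Dense M x M projection P = X^T H X, accumulated cell by cell."""
    m = len(x_cols)
    p = [[0.0] * m for _ in range(m)]
    for cell in range(n_cells):
        a, b = cell, cell + 1
        # Only columns with support on this cell contribute.
        active = [(i, x.get(a, 0.0) - x.get(b, 0.0))
                  for i, x in enumerate(x_cols) if a in x or b in x]
        for i, di in active:
            for j, dj in active:
                p[i][j] += di * dj / h
    return p


def post_multiply(x_cols, c):
    """(iii) Z = X C, with column j of Z kept on the support of column j of X."""
    m = len(x_cols)
    return [{row: sum(x_cols[k].get(row, 0.0) * c[k][j] for k in range(m))
             for row in x_cols[j]}
            for j in range(m)]


# Two columns on a 4-cell mesh (rows 0..4) with overlapping local support.
x = [{0: 1.0, 1: 2.0, 2: 3.0}, {2: 1.0, 3: 1.0, 4: 1.0}]
y = apply_operator(x, n_cells=4)
p = project(x, n_cells=4)
z = post_multiply(x, [[1.0, 0.0], [1.0, 1.0]])
print(y, p, z, sep="\n")
```

In this toy setting the projection is accumulated directly from cell contributions, so it stays symmetric even though the columns have different supports. A production implementation would additionally distribute rows across MPI ranks and vectorize the cell loop, neither of which is attempted here.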

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1907.01005/full.md

## Figures

42 figures with captions in the complete paper: https://tomesphere.com/paper/1907.01005/full.md

## References

63 references — full list in the complete paper: https://tomesphere.com/paper/1907.01005/full.md

---
Source: https://tomesphere.com/paper/1907.01005