High-performance generation of the Hamiltonian and Overlap matrices in FLAPW methods

Edoardo Di Napoli (1, 4), Elmar Peise (2), Markus Hrywniak (3), Paolo Bientinesi (2) ((1) Jülich Supercomputing Centre, (2) AICES, RWTH Aachen University, (3) GRS, RWTH Aachen University, (4) Jülich Aachen Research Alliance -- High-performance Computing)

arXiv:1602.06589·cs.CE·January 17, 2018

TL;DR

This paper presents a methodology to optimize the most computationally expensive parts of FLAPW codes by restructuring them with dense linear algebra kernels, significantly improving performance and scalability.

Contribution

It introduces a novel approach to reformulate key operations in FLAPW methods using optimized dense linear algebra, enhancing code performance and longevity.

Findings

01. Achieved increased computational performance in FLAPW codes.

02. Enabled larger-scale materials science simulations.

03. Extended the usability of legacy codes.

Abstract

One of the greatest efforts of computational scientists is to translate the mathematical model describing a class of physical phenomena into large and complex codes. Many of these codes face the difficulty of implementing the mathematical operations in the model in terms of low-level optimized kernels that offer both performance and portability. Legacy codes suffer from the additional curse of rigid design choices based on outdated performance metrics (e.g., minimization of memory footprint). Using a representative code from the Materials Science community, we propose a methodology to restructure the most expensive operations in terms of an optimized combination of dense linear algebra kernels. The resulting algorithm guarantees increased performance and an extended life span for this code, enabling larger-scale simulations.
