MPI+X: task-based parallelization and dynamic load balance of finite element assembly
Marta Garcia-Gasulla, Guillaume Houzeaux, Roger Ferrer, Antoni Artigues, Victor López, Jesús Labarta, Mariano Vázquez

TL;DR
This paper introduces a task-based parallelization approach for finite element assembly in MPI+X environments, improving load balance and parallel efficiency when solving large-scale PDEs.
Contribution
It proposes a novel task parallelism strategy for element loop assembly using OpenMP extensions, improving load balancing and locality in hybrid MPI+X systems.
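The core idea of task-based assembly is that each task assembles a contiguous chunk of elements, and two chunks that share nodes must not run at the same time. That constraint is not an ordering, so they may run in either order. The sketch below illustrates only that constraint. It is not the authors' implementation, which relies on OpenMP extensions. Here Python threads and one lock per pair of neighboring chunks stand in for those extensions. The 1D mesh, the chunk partition and the constant element contribution are hypothetical.

```python
# Sketch of task-based assembly with mutually exclusive neighboring chunks.
# Assumption: Python locks emulate the OpenMP-extension semantics; the mesh,
# chunks and element contribution (0.5 per node) are hypothetical.
import threading
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def chunk_neighbors(elements, chunks):
    """Return the pairs (a, b), a < b, of chunks whose elements share a node."""
    owners_of_node = {}
    for c, elems in enumerate(chunks):
        for e in elems:
            for n in elements[e]:
                owners_of_node.setdefault(n, set()).add(c)
    pairs = set()
    for owners in owners_of_node.values():
        pairs.update(combinations(sorted(owners), 2))
    return pairs


def assemble_tasks(elements, chunks, num_nodes, workers=4):
    b = [0.0] * num_nodes
    # One lock per neighboring pair: two neighbors always share a lock,
    # while non-neighbors share none and can run concurrently.
    pair_locks = {p: threading.Lock() for p in chunk_neighbors(elements, chunks)}

    def task(c):
        # Acquire in a single global order (sorted pairs) to avoid deadlock.
        mine = sorted(p for p in pair_locks if c in p)
        for p in mine:
            pair_locks[p].acquire()
        try:
            for e in chunks[c]:
                for n in elements[e]:
                    b[n] += 0.5  # hypothetical element right-hand side
        finally:
            for p in reversed(mine):
                pair_locks[p].release()

    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(task, range(len(chunks))))  # wait for all tasks
    return b
```

Unlike coloring, no global synchronization separates groups of work. A task only waits while a neighbor holding shared nodes is running, which leaves the scheduler free to balance load and keeps each task's data contiguous.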
Findings
Achieved efficient parallel assembly on up to 16,000 cores.
Demonstrated improved load balancing with the DLB (Dynamic Load Balancing) library.
Validated approach on large computational mechanics problems.
Abstract
The main computing tasks of a finite element (FE) code for solving partial differential equations (PDEs) are the assembly of the algebraic system and the iterative solver. This work focuses on the first task, in the context of a hybrid MPI+X paradigm. Although we describe the algorithms in the FE context, a similar strategy can be straightforwardly applied to other discretization methods, such as the finite volume method. The matrix assembly consists of a loop over the elements of the MPI partition that computes the element matrices and right-hand sides and assembles them into the local system of each MPI partition. In an MPI+X hybrid parallelism context, X has traditionally consisted of loop parallelism using OpenMP. Several strategies have been proposed in the literature to implement this loop parallelism, such as coloring or substructuring techniques to circumvent the race condition that appears when…
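The element loop and the coloring workaround mentioned in the abstract can be sketched as follows. Each element computes a local matrix and right-hand side and scatters them into the global system. Two elements that share a node write the same global entries, and that shared write is the race condition. Coloring groups elements so that no two elements of one color share a node, and then processes one color at a time in parallel. This is a minimal illustration under stated assumptions, not the paper's code. The 2-node 1D elements, the dense matrix, the greedy coloring and the `element_system` kernel are hypothetical.

```python
# Minimal sketch of element-loop assembly parallelized by element coloring.
# Assumptions: 1D mesh of 2-node elements, dense global matrix, and a
# hypothetical constant element kernel.
from concurrent.futures import ThreadPoolExecutor


def greedy_coloring(elements):
    """Assign colors so that no two elements sharing a node get the same color."""
    node_to_elems = {}
    for e, nodes in enumerate(elements):
        for n in nodes:
            node_to_elems.setdefault(n, []).append(e)
    colors = [-1] * len(elements)  # -1 means "not colored yet"
    for e, nodes in enumerate(elements):
        forbidden = {colors[o] for n in nodes for o in node_to_elems[n] if o != e}
        c = 0
        while c in forbidden:
            c += 1
        colors[e] = c
    return colors


def element_system(nodes):
    """Hypothetical kernel: 1D Laplacian with h = 1 and a unit load.
    A real kernel would use the coordinates of `nodes`."""
    ke = [[1.0, -1.0], [-1.0, 1.0]]
    fe = [0.5, 0.5]
    return ke, fe


def assemble_colored(elements, num_nodes, workers=4):
    A = [[0.0] * num_nodes for _ in range(num_nodes)]
    b = [0.0] * num_nodes
    colors = greedy_coloring(elements)

    def assemble_one(e):
        nodes = elements[e]
        ke, fe = element_system(nodes)
        # Scatter into the global system. This is safe only because elements
        # of the same color never share a node.
        for i, gi in enumerate(nodes):
            b[gi] += fe[i]
            for j, gj in enumerate(nodes):
                A[gi][gj] += ke[i][j]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        for c in range(max(colors) + 1):
            same_color = [e for e, ce in enumerate(colors) if ce == c]
            # list() blocks until every element of this color is assembled.
            list(pool.map(assemble_one, same_color))
    return A, b
```

For a chain of four elements `[(0, 1), (1, 2), (2, 3), (3, 4)]`, the coloring alternates `[0, 1, 0, 1]`. The drawback of this design is the barrier between colors. In addition, consecutive elements of one color are spread out in memory, which hurts locality.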
