# A study of vectorization for matrix-free finite element methods

**Authors:** Tianjiao Sun, Lawrence Mitchell, Kaushik Kulkarni, Andreas Klöckner, David A. Ham, Paul H. J. Kelly

arXiv: 1903.08243 · 2020-08-26

## TL;DR

This paper studies cross-element vectorization of matrix-free finite element operators in the Firedrake framework, implemented via code transformation, and measures consistent speed-ups over intra-element vectorization on two recent CPUs with three mainstream compilers.

## Contribution

It introduces a code-transformation approach to cross-element vectorization in finite element methods that reaches 30% of theoretical peak performance for many practical examples, exceeds 50% at high arithmetic intensity, and consistently outperforms intra-element vectorization.

## Key findings

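- Achieves 30% of theoretical peak performance for many examples of practical significance.
- Exceeds 50% of theoretical peak performance for cases with high arithmetic intensity.
- Provides consistent speed-up over intra-element vectorization, that is, vectorization restricted to the local assembly kernels.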

## Abstract

Vectorization is increasingly important to achieve high performance on modern hardware with SIMD instructions. Assembly of matrices and vectors in the finite element method, which is characterized by iterating a local assembly kernel over unstructured meshes, poses difficulties to effective vectorization. Maintaining a user-friendly high-level interface with a suitable degree of abstraction while generating efficient, vectorized code for the finite element method is a challenge for numerical software systems and libraries. In this work, we study cross-element vectorization in the finite element framework Firedrake via code transformation and demonstrate the efficacy of such an approach by evaluating a wide range of matrix-free operators spanning different polynomial degrees and discretizations on two recent CPUs using three mainstream compilers. Our experiments show that our approaches for cross-element vectorization achieve 30% of theoretical peak performance for many examples of practical significance, and exceed 50% for cases with high arithmetic intensities, with consistent speed-up over (intra-element) vectorization restricted to the local assembly kernels.
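
To make the distinction between intra-element and cross-element vectorization concrete, here is a minimal pure-Python sketch of the loop restructuring involved. It is an illustration only, not the code Firedrake generates or the authors' transformation: the function names `apply_per_element` and `apply_cross_element`, the `width` parameter (standing in for the SIMD width), and the dense per-element matrices `A` are all hypothetical. The baseline visits one element at a time, so any vectorization must come from inside the local kernel. The cross-element variant processes `width` elements together and makes the element index the innermost loop, which is the loop a compiler could map onto SIMD lanes.

```python
def apply_per_element(A, u, cell_nodes):
    """Baseline: gather, run the local kernel, and scatter one element at a time."""
    y = [0.0] * len(u)
    for e, dofs in enumerate(cell_nodes):
        u_e = [u[d] for d in dofs]               # gather local values
        for i, d_i in enumerate(dofs):           # local kernel: y_e = A_e @ u_e
            acc = 0.0
            for j in range(len(dofs)):
                acc += A[e][i][j] * u_e[j]
            y[d_i] += acc                        # scatter with accumulation
    return y


def apply_cross_element(A, u, cell_nodes, width=4):
    """Cross-element variant: handle `width` elements per batch (hypothetical width)."""
    y = [0.0] * len(u)
    n_cells = len(cell_nodes)
    n_loc = len(cell_nodes[0])
    for start in range(0, n_cells, width):
        # The last batch may be shorter than `width` (remainder elements).
        lanes = list(range(start, min(start + width, n_cells)))
        # Gather into a layout indexed [local dof][lane].
        u_b = [[u[cell_nodes[e][j]] for e in lanes] for j in range(n_loc)]
        y_b = [[0.0] * len(lanes) for _ in range(n_loc)]
        for i in range(n_loc):
            for j in range(n_loc):
                # Innermost loop runs over elements: one iteration per SIMD lane.
                for lane, e in enumerate(lanes):
                    y_b[i][lane] += A[e][i][j] * u_b[j][lane]
        # Scatter the batch back into the global vector.
        for i in range(n_loc):
            for lane, e in enumerate(lanes):
                y[cell_nodes[e][i]] += y_b[i][lane]
    return y
```

Both functions compute the same global result. The point of the second form is that every lane executes the same instruction sequence on a different element, so the amount of vectorizable work does not depend on the structure of the local kernel itself.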

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1903.08243/full.md

## Figures

7 figures with captions in the complete paper: https://tomesphere.com/paper/1903.08243/full.md

## References

25 references — full list in the complete paper: https://tomesphere.com/paper/1903.08243/full.md

---
Source: https://tomesphere.com/paper/1903.08243