Finite element numerical integration for first order approximations on multi-core architectures

Krzysztof Banaś; Filip Krużel; Jan Bielański

arXiv:1504.01023·cs.MS·May 25, 2016

TL;DR

This paper investigates the implementation and performance of finite element numerical integration for first order approximations across CPU, Xeon Phi, and GPU architectures using a unified OpenCL model, demonstrating effective portability and optimization strategies.

Contribution

It introduces a portable OpenCL implementation for finite element integration across multiple architectures and provides performance models and optimization insights.

Findings

01. Effective porting of the algorithm to all tested architectures.

02. Performance varies across architectures but remains sufficient for practical use.

03. Optimization strategies improve kernel performance and mapping.

Abstract

The paper investigates the implementation and performance of the finite element numerical integration algorithm for first order approximations on three processor architectures popular in scientific computing: classical CPU, Intel Xeon Phi, and NVIDIA Kepler GPU. A unifying programming model and a portable OpenCL implementation are considered for all architectures. Variations of the algorithm arising from different problems solved and different element types are investigated, and several optimizations aimed at properly mapping the algorithm to computer architectures are demonstrated. Performance models of execution are developed for different processors and tested in practical experiments. The results show varying levels of performance for different architectures, but indicate that the algorithm can be effectively ported to all of them. The general conclusion…
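To make the phrase "finite element numerical integration for first order approximations" concrete, here is a minimal sketch of what such an integration can look like for a single element. It is not the authors' OpenCL kernel. The choice of the Laplace operator, a linear triangular element, the 3-point quadrature rule, and the names `integrate_linear_triangle` and `source` are all assumptions made for illustration.

```python
def integrate_linear_triangle(vertices, source=lambda x, y: 1.0):
    """Return (stiffness, load) for the Laplace operator on one linear triangle.

    Illustrative only: the operator, element type, and quadrature rule are
    assumptions, not taken from the paper.
    """
    (x0, y0), (x1, y1), (x2, y2) = vertices

    # Jacobian of the affine map from the reference triangle to the element.
    j00, j01 = x1 - x0, x2 - x0
    j10, j11 = y1 - y0, y2 - y0
    det_j = j00 * j11 - j01 * j10
    if det_j == 0.0:
        raise ValueError("degenerate triangle")

    # Gradients of the three linear shape functions on the reference triangle.
    ref_grads = ((-1.0, -1.0), (1.0, 0.0), (0.0, 1.0))
    # Physical gradients: apply the inverse transposed Jacobian.
    grads = [
        ((j11 * gx - j10 * gy) / det_j, (-j01 * gx + j00 * gy) / det_j)
        for gx, gy in ref_grads
    ]

    # 3-point rule on the reference triangle (weights sum to its area, 1/2).
    quad = (((1 / 6, 1 / 6), 1 / 6), ((2 / 3, 1 / 6), 1 / 6), ((1 / 6, 2 / 3), 1 / 6))

    stiffness = [[0.0] * 3 for _ in range(3)]
    load = [0.0] * 3
    for (xi, eta), weight in quad:
        shape = (1.0 - xi - eta, xi, eta)   # shape function values at the point
        x = x0 + j00 * xi + j01 * eta       # physical coordinates of the point
        y = y0 + j10 * xi + j11 * eta
        scale = weight * abs(det_j)         # quadrature weight times volume factor
        for i in range(3):
            load[i] += scale * shape[i] * source(x, y)
            for j in range(3):
                stiffness[i][j] += scale * (
                    grads[i][0] * grads[j][0] + grads[i][1] * grads[j][1]
                )
    return stiffness, load


if __name__ == "__main__":
    K, b = integrate_linear_triangle(((0.0, 0.0), (1.0, 0.0), (0.0, 1.0)))
    for row in K:
        print(row)
    print(b)
```

For linear elements the shape function gradients are constant over the element, so the stiffness integrand does not depend on the quadrature point; the loop is kept to show the general structure, in which every element repeats the same small, independent computation. That independence is what makes the algorithm a natural candidate for mapping onto many cores.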
