# Auto-tuning of dynamic scheduling applied to 3D reverse time migration on multicore systems

**Authors:** Ítalo A. S. Assis, João B. Fernandes, Tiago Barros, Samuel Xavier-de-Souza

arXiv: 1905.06975 · 2020-08-14

## TL;DR

This paper presents an auto-tuning strategy based on coupled simulated annealing that tunes the chunk size of OpenMP dynamic scheduling in 3D reverse time migration on multicore systems, making the application up to 33% faster than with the default OpenMP schedulers.

## Contribution

Introduces a run-time auto-tuning method for dynamic scheduling in RTM that reduces execution time by tuning the chunk size of work that OpenMP parallel loops assign to worker threads.
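To make the tuned parameter concrete: under OpenMP's `schedule(dynamic, chunk)`, loop iterations are handed out to threads in blocks of `chunk` consecutive iterations, with a possibly smaller final block. The Python sketch below only models that partitioning. The function name `dynamic_chunks` is hypothetical, and nothing here is taken from the paper's code.

```python
def dynamic_chunks(n_iters, chunk):
    """Return the (start, stop) iteration ranges that a dynamic schedule
    with the given chunk size would hand out, in order.

    Illustrative model only (an assumption, not the paper's implementation).
    """
    if chunk < 1:
        raise ValueError("chunk must be >= 1")
    # Each block covers `chunk` iterations; the last one may be shorter.
    return [(start, min(start + chunk, n_iters))
            for start in range(0, n_iters, chunk)]


print(dynamic_chunks(10, 4))  # [(0, 4), (4, 8), (8, 10)]
```

A small chunk size yields many blocks, which means more scheduling events but finer load balancing; a large chunk size yields few blocks, which means less scheduling work but coarser balancing. That trade-off is what makes the chunk size worth tuning.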

## Key findings

- Up to 33% faster execution compared to default OpenMP schedulers.
- Reduces cache misses, mainly at the L3 level.
- Low overhead of less than 2%.

## Abstract

Reverse time migration (RTM) is an algorithm widely used in the oil and gas industry to process seismic data. It is a computationally intensive task that is well suited to parallel computers. Methods such as RTM can be parallelized in shared memory systems by scheduling the iterations of parallel loops onto threads. However, several aspects, such as memory size and hierarchy, number of cores, and input size, make optimal scheduling very challenging. In this paper, we introduce a run-time strategy to automatically tune the dynamic scheduling of parallel loop iterations in iterative applications, such as RTM, on multicore systems. The proposed method aims to reduce the execution time of such applications. To find the optimal granularity, we propose a coupled simulated annealing (CSA) based auto-tuning strategy that adjusts the chunk size of work that OpenMP parallel loops assign dynamically to worker threads during the initialization of a 3D RTM application. Experiments performed with different computational systems and input sizes show that the proposed method is consistently better than the default OpenMP schedulers (static, auto, and guided), making the application up to 33% faster. We show that the likely reasons for this performance are a reduction in cache misses, mainly at the L3 level, and a low overhead of less than 2%. Having proven robust and scalable for 3D RTM, the proposed method could also improve the performance of similar wave-based algorithms, such as full-waveform inversion (FWI), and of other iterative applications.
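The tuning loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under several assumptions, not the authors' method: it uses plain simulated annealing as a simplified stand-in for CSA, the names `tune_chunk` and `synthetic_cost` are hypothetical, and the cost model is invented for demonstration. In a real application, `measure(chunk)` would be the measured run time of one execution of the parallel loop with that chunk size.

```python
import math
import random


def tune_chunk(measure, lo, hi, steps=200, t0=1.0, seed=0):
    """Search [lo, hi] for the integer chunk size with the lowest cost.

    Plain simulated annealing, a simplified stand-in for CSA (assumption).
    `measure(chunk)` is assumed to return the cost (e.g. run time) of one
    loop execution with that chunk size.
    """
    rng = random.Random(seed)
    current = rng.randint(lo, hi)
    current_cost = measure(current)
    best, best_cost = current, current_cost
    span = max(1, (hi - lo) // 4)  # maximum step of a proposal (assumption)
    for k in range(steps):
        temp = t0 / (1 + k)  # simple cooling schedule (assumption)
        # Propose a neighbour and clamp it to the search range.
        candidate = min(hi, max(lo, current + rng.randint(-span, span)))
        cand_cost = measure(candidate)
        delta = cand_cost - current_cost
        # Always accept improvements; accept worse moves with a
        # probability that shrinks as the temperature drops.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, current_cost = candidate, cand_cost
        if current_cost < best_cost:
            best, best_cost = current, current_cost
    return best, best_cost


def synthetic_cost(chunk, overhead=4.0, imbalance=0.002):
    """Hypothetical cost model: per-chunk scheduling overhead that falls
    with chunk size, plus an imbalance penalty that grows with it."""
    return overhead / chunk + imbalance * chunk


best, cost = tune_chunk(synthetic_cost, 1, 512)
print(best, cost)
```

Because the search runs while the application is already executing its loops, every evaluation of `measure` does useful work, which is one plausible way a run-time tuner can keep its overhead low.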

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1905.06975/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/1905.06975/full.md

## References

48 references — full list in the complete paper: https://tomesphere.com/paper/1905.06975/full.md

---
Source: https://tomesphere.com/paper/1905.06975