Pushing the Limits of Online Auto-tuning: Machine Code Optimization in Short-Running Kernels

Fernando Endo; Damien Couroussé; Henri-Pierre Charles

arXiv:1707.04566·cs.PF·July 17, 2017

TL;DR

This paper introduces an online auto-tuning method that works directly at the level of machine code generation, so that tuning pays off even in applications that run for only hundreds of milliseconds to a few seconds.

Contribution

It presents an approach to auto-tuning at the level of machine code generation, which avoids the long source-to-binary compilation chains of existing online auto-tuners and makes optimization worthwhile in applications lasting a few seconds at most.

Findings

01

Average speedups of 1.10 to 1.58 in a CPU-bound kernel, depending on the target micro-architecture

02

Speedup of up to 2.53 in the most favourable conditions, all run-time overheads included

03

Overhead of 0.2% to 4.2% of total execution time

Abstract

We propose an online auto-tuning approach for computing kernels. Unlike existing online auto-tuners, which regenerate code through long compilation chains from source to binary code, our approach deploys auto-tuning directly at the level of machine code generation. This allows auto-tuning to pay off in very short-running applications. As a proof of concept, our approach is demonstrated on two benchmarks, which run for only hundreds of milliseconds to a few seconds. In a CPU-bound kernel, the average speedups achieved are 1.10 to 1.58 depending on the target micro-architecture, and up to 2.53 in the most favourable conditions (all run-time overheads included). In a memory-bound kernel, less favourable to our runtime auto-tuning optimizations, the average speedups are 1.04 to 1.10, and up to 1.30 in the best configuration. Despite the short execution times of…
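The general idea of online auto-tuning described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the paper generates machine code variants at run time, whereas here plain Python callables stand in for the generated variants, and the class name `OnlineTuner`, its attributes, and the injectable `timer` parameter are all hypothetical. The sketch only shows the control flow: each kernel call during the exploration phase runs one not-yet-measured variant and does useful work, and once every variant has been timed the fastest one is used for all remaining calls.

```python
import time


class OnlineTuner:
    """Hypothetical sketch: time each kernel variant once, then keep the fastest.

    Python callables stand in for the run-time generated machine code
    variants discussed in the abstract (an assumption for illustration).
    """

    def __init__(self, variants, timer=time.perf_counter):
        self.variants = dict(variants)      # variant name -> callable
        self.pending = list(self.variants)  # variants not yet measured
        self.timings = {}                   # variant name -> measured duration
        self.best = None                    # set once exploration is complete
        self.timer = timer                  # injectable clock, for testing

    def __call__(self, *args):
        if self.pending:
            # Exploration: this call still produces a valid result,
            # so the only cost is running a possibly slower variant.
            name = self.pending.pop(0)
            start = self.timer()
            result = self.variants[name](*args)
            self.timings[name] = self.timer() - start
            if not self.pending:
                self.best = min(self.timings, key=self.timings.get)
            return result
        # Exploitation: always run the fastest measured variant.
        return self.variants[self.best](*args)


def sum_squares_loop(xs):
    total = 0
    for x in xs:
        total += x * x
    return total


def sum_squares_builtin(xs):
    return sum(x * x for x in xs)


kernel = OnlineTuner({"loop": sum_squares_loop, "builtin": sum_squares_builtin})
data = list(range(1000))
results = [kernel(data) for _ in range(5)]
```

Because every exploration call returns a correct result, the tuning cost in this sketch is limited to the time lost by running slower variants once, which is the property that lets tuning pay off in short runs. A single timing per variant is noisy; a practical tuner would likely repeat measurements.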
