Automated Parallel Kernel Extraction from Dynamic Application Traces

Richard Uhrie; Chaitali Chakrabarti; John Brunhaver

arXiv:2001.09995 · cs.DC · January 31, 2020 · 5 cites

TL;DR

This paper presents an efficient method for automatically extracting parallel kernels from dynamic application traces, enabling easier hardware acceleration and optimization without extensive manual annotation.

Contribution

It introduces a fast, accurate, and scalable technique for localizing kernels from traces, validated across multiple libraries and test programs.

Findings

01. Trace collection is fast and compact: after optimization, about a ninefold time dilation and roughly one megabyte of trace per second.

02. Kernel detection is accurate and runs in linear time.

03. Validated on 16 libraries with over 10,000 kernel instances.

Abstract

Modern program runtime is dominated by segments of repeating code called kernels. Kernels are accelerated by increasing memory locality, increasing data-parallelism, and exploiting producer-consumer parallelism among kernels, all of which requires hardware specialized for a particular class of kernels. Programming this hardware can be difficult, requiring that the kernels be identified and annotated in the code or translated to a domain-specific language. This paper describes a technique to automatically localize parallel kernels from a dynamic application trace, facilitating further code optimization. Dynamic trace collection is fast and compact. With optimization, it incurs only a time dilation of a factor of nine and a file size of one megabyte per second, addressing a significant criticism of this approach. Kernel extraction is accurate and performed in linear time within logarithmic…
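To make the idea of localizing kernels from a trace concrete, here is a minimal Python sketch. It is not the paper's algorithm, which the truncated abstract does not spell out. It assumes the trace is simply a list of basic-block IDs, and the function name `find_candidate_kernels` and the `min_count` threshold are hypothetical. The sketch marks blocks that execute often as "hot" and groups hot blocks that run back to back, using two linear passes over the trace.

```python
from collections import Counter, defaultdict


def find_candidate_kernels(trace, min_count=3):
    """Group frequently executed basic blocks that follow one another.

    Illustrative only: `trace` is assumed to be a list of basic-block IDs,
    and `min_count` is a hypothetical hotness threshold.
    """
    counts = Counter(trace)  # first linear pass: execution counts
    hot = {block for block, n in counts.items() if n >= min_count}

    # Second linear pass: record which hot blocks execute back to back.
    neighbors = defaultdict(set)
    for prev, curr in zip(trace, trace[1:]):
        if prev in hot and curr in hot:
            neighbors[prev].add(curr)
            neighbors[curr].add(prev)

    # Connected components of the adjacency graph are the candidates.
    kernels, seen = [], set()
    for block in sorted(hot):
        if block in seen:
            continue
        stack, component = [block], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbors[node] - component)
        seen |= component
        kernels.append(sorted(component))
    return kernels


# Two small loops (blocks 1-2 and 4-5) separated by straight-line code.
trace = [0, 1, 2, 1, 2, 1, 2, 3, 4, 5, 4, 5, 4, 5, 6]
print(find_candidate_kernels(trace))  # [[1, 2], [4, 5]]
```

A real trace would need to be streamed from disk rather than held in a list, and a fixed count threshold is a crude stand-in for whatever hotness criterion an actual tool applies.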

Code & Models

Repositories

ruhrie/TraceAtlas (Official)

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Advanced Data Storage Technologies · Embedded Systems Design Techniques