Automatic acceleration of Numpy applications on GPUs and multicore CPUs

Mahesh Ravishankar; Vinod Grover

arXiv:1901.03771·cs.PL·January 15, 2019·1 cites

Open Access

TL;DR

This paper presents a method to accelerate Numpy applications on GPUs and multicore CPUs by recording Numpy operations and deferring their execution, which reduces memory use and yields an order-of-magnitude performance improvement over standard Numpy.

Contribution

It introduces a deferred execution framework for Numpy that optimizes performance and enables seamless targeting of GPUs and multicore CPUs without changing user code.

Findings

01

Order of magnitude performance improvement over standard Numpy

02

Reduced memory footprint and bandwidth requirements

03

Effective acceleration on both GPUs and multicore CPUs

Abstract

Frameworks like Numpy are a popular choice for application developers in fields ranging from image processing to bio-informatics to machine learning. Numpy is often used to develop prototypes or for deployment, since it provides efficient implementations of operations involving arrays. Such an approach requires every operation to be executed eagerly. The result of each operation needs to be stored in memory, which increases the memory footprint of the application. It also increases the bandwidth requirements, since all uses must read from this memory. We propose an approach that records the sequence of Numpy operations for deferred execution. When the values of an array are needed, for example when the values are stored to disk or displayed on screen, the sequence of operations required to compute these values is compiled into a function and executed. This removes the need to store/load…
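
The record-then-compile idea described in the abstract can be illustrated with a small sketch. This is not the paper's implementation: the `Deferred` class, its method names, and the generated `fused` function are all hypothetical, and the sketch handles only elementwise `+`, `-`, and `*`. It records operations as an expression tree, and only when `evaluate()` is called does it generate one function containing a single loop, so no intermediate array is materialized. The generated loop is plain Python and therefore slow; a real system would presumably emit native CPU or GPU code instead.

```python
import numpy as np


class Deferred:
    """Hypothetical wrapper: records elementwise operations instead of running them."""

    def __init__(self, op, args):
        self.op = op      # "leaf", or one of "+", "-", "*"
        self.args = args  # (ndarray,) for a leaf, otherwise (lhs, rhs) nodes

    @staticmethod
    def leaf(array):
        return Deferred("leaf", (np.asarray(array),))

    def _binary(self, other, op):
        # Wrap plain arrays and scalars so every operand is a recorded node.
        if not isinstance(other, Deferred):
            other = Deferred.leaf(other)
        return Deferred(op, (self, other))

    def __add__(self, other):
        return self._binary(other, "+")

    def __sub__(self, other):
        return self._binary(other, "-")

    def __mul__(self, other):
        return self._binary(other, "*")

    def _emit(self, inputs):
        """Return a per-element source expression; collect leaf arrays in `inputs`."""
        if self.op == "leaf":
            inputs.append(self.args[0])
            return f"x{len(inputs) - 1}[i]"
        lhs = self.args[0]._emit(inputs)
        rhs = self.args[1]._emit(inputs)
        return f"({lhs} {self.op} {rhs})"

    def compile(self):
        """Turn the recorded tree into one function with a single fused loop."""
        inputs = []
        body = self._emit(inputs)
        params = ", ".join(f"x{k}" for k in range(len(inputs)))
        source = (
            f"def fused(out, {params}):\n"
            f"    for i in range(out.size):\n"
            f"        out[i] = {body}\n"
        )
        namespace = {}
        exec(source, namespace)
        return namespace["fused"], inputs, source

    def evaluate(self):
        """Called when values are actually needed (e.g. before printing or saving)."""
        fused, inputs, _ = self.compile()
        broadcast = np.broadcast_arrays(*inputs)
        shape = broadcast[0].shape
        flat = [b.ravel() for b in broadcast]
        out = np.empty(flat[0].size, dtype=np.result_type(*inputs))
        fused(out, *flat)
        return out.reshape(shape)


a = np.arange(4.0)
b = np.ones(4)

# Nothing is computed here; the three operations are only recorded.
expr = (Deferred.leaf(a) + b) * a - 2.0

# The whole expression runs as one loop, with no temporary arrays.
result = expr.evaluate()
```

Eager Numpy would allocate a temporary for `a + b` and another for the product before the subtraction; in the sketch, each output element is computed directly from the inputs inside the generated loop. Calling `expr.compile()[2]` returns the generated source for inspection.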

Taxonomy

Topics: Parallel Computing and Optimization Techniques · CCD and CMOS Imaging Sensors · Advanced Data Storage Technologies