The CoRa Tensor Compiler: Compilation for Ragged Tensors with Minimal Padding

Pratik Fegade; Tianqi Chen; Phillip B. Gibbons; Todd C. Mowry

arXiv:2110.10221 · cs.LG · March 23, 2022 · 5 cites

TL;DR

CoRa is a tensor compiler designed to efficiently handle ragged tensors in deep learning, reducing wasted computation and improving performance on CPUs and GPUs compared to traditional padding methods.

Contribution

The paper introduces CoRa, a novel compiler that generates optimized code for ragged tensor operations, enabling efficient execution without extensive padding.
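
A concrete ragged layout makes the contribution easier to picture. The minimal Python sketch below stores rows of different lengths as one flat value list plus row offsets, then applies a row-wise softmax that iterates only over each row's real elements. `RaggedMatrix` and `ragged_softmax` are hypothetical names. The flat-values-plus-offsets layout is an assumption that mirrors the `row_splits` encoding of TensorFlow's `RaggedTensor`. It does not describe how CoRa itself represents or lowers ragged tensors.

```python
import math


class RaggedMatrix:
    """Hypothetical ragged 2-D tensor: row i is values[offsets[i]:offsets[i + 1]].

    Assumed layout (flat values plus row offsets, like the row_splits of
    tf.RaggedTensor); not CoRa's internal representation.
    """

    def __init__(self, rows):
        self.values = [x for row in rows for x in row]
        self.offsets = [0]
        for row in rows:
            self.offsets.append(self.offsets[-1] + len(row))

    def num_rows(self):
        return len(self.offsets) - 1

    def row(self, i):
        return self.values[self.offsets[i]:self.offsets[i + 1]]


def ragged_softmax(m):
    """Row-wise softmax that visits only the real elements of each row."""
    out = []
    for i in range(m.num_rows()):
        row = m.row(i)
        if not row:  # an empty row has nothing to normalize
            out.append([])
            continue
        peak = max(row)  # subtract the row max for numerical stability
        exps = [math.exp(x - peak) for x in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


m = RaggedMatrix([[1.0, 2.0, 3.0], [0.5], [4.0, 4.0]])
print(m.offsets)  # prints [0, 3, 4, 6]
probs = ragged_softmax(m)
print(probs[1], probs[2])  # prints [1.0] [0.5, 0.5]
```

Only six values are stored and six exponentials computed. Each row's loop bound comes from `offsets`, and that data-dependent bound is what keeps such operators from mapping directly onto fixed-shape dense kernels.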

Findings

1. CoRa performs competitively with hand-optimized implementations.
2. CoRa achieves a 1.6X speedup over PyTorch on an NVIDIA GPU.
3. CoRa achieves a 1.86X speedup on an ARM CPU for the transformer encoder.

Abstract

There is often variation in the shape and size of input data used for deep learning. In many cases, such data can be represented using tensors with non-uniform shapes, or ragged tensors. Due to limited and non-portable support for efficient execution on ragged tensors, current deep learning frameworks generally use techniques such as padding and masking to make the data shapes uniform and then offload the computations to optimized kernels for dense tensor algebra. Such techniques can, however, lead to a lot of wasted computation and, therefore, a loss in performance. This paper presents CoRa, a tensor compiler that allows users to easily generate efficient code for ragged tensor operators targeting a wide range of CPUs and GPUs. Evaluating CoRa on a variety of operators on ragged tensors as well as on an encoder layer of the transformer model, we find that CoRa (i) performs competitively…
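
The padding-and-masking workaround described in the abstract can be sketched in a few lines. In this hypothetical example, `pad_and_mask`, `masked_softmax`, and the input rows are illustrative and are not a framework API. Rows are zero-padded to the longest length, and padded slots are set to negative infinity before a dense softmax, which gives them zero probability. The results on real entries are correct, but the kernel still processes every padded slot.

```python
import math

NEG_INF = float("-inf")


def pad_and_mask(rows):
    """Pad each row with zeros to the longest length; mask marks real entries."""
    width = max(len(r) for r in rows)
    padded = [r + [0.0] * (width - len(r)) for r in rows]
    mask = [[j < len(r) for j in range(width)] for r in rows]
    return padded, mask


def masked_softmax(padded, mask):
    """Dense row-wise softmax over every slot, padded ones included.

    Masked slots are replaced by -inf, so exp() maps them to 0.0.
    Assumes each row has at least one real entry.
    """
    out = []
    for row, keep in zip(padded, mask):
        logits = [x if k else NEG_INF for x, k in zip(row, keep)]
        peak = max(logits)
        exps = [math.exp(x - peak) for x in logits]  # one exp per slot
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


rows = [[1.0, 2.0, 3.0], [0.5], [4.0, 4.0]]
padded, mask = pad_and_mask(rows)
probs = masked_softmax(padded, mask)
exp_calls = sum(len(r) for r in padded)  # slots the dense kernel processes
real_entries = sum(len(r) for r in rows)  # slots that carry data
print(probs[1], exp_calls, real_entries)  # prints [1.0, 0.0, 0.0] 9 6
```

Here 3 of the 9 exponentials are spent on padding. Frameworks apply the same idea with tensor operations, for example by adding a large negative value to masked attention logits, so the work scales with the padded shape rather than with the data.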

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Tensor decomposition and applications · Advanced Neural Network Applications

Methods: Softmax · Linear Layer