APNN-TC: Accelerating Arbitrary Precision Neural Networks on Ampere GPU Tensor Cores

Boyuan Feng; Yuke Wang; Tong Geng; Ang Li; Yufei Ding

arXiv:2106.12169 · cs.DC · November 18, 2021 · 1 citation

TL;DR

APNN-TC introduces a novel framework that enables arbitrary precision neural network computations on Ampere GPU Tensor Cores, overcoming previous precision limitations and significantly accelerating neural network inference.

Contribution

It presents the first emulation algorithm and layer design for arbitrary precision neural networks on GPU Tensor Cores, expanding support beyond limited precisions.

Findings

01. Achieves significant speedup over existing kernels and models

02. Supports arbitrary short bit-width computation with primitive operations

03. Optimizes memory access and batching for improved performance

Abstract

Over the years, accelerating neural networks with quantization has been widely studied. Unfortunately, prior efforts with diverse precisions (e.g., 1-bit weights and 2-bit activations) are usually restricted by limited precision support on GPUs (e.g., int1 and int4). To break such restrictions, we introduce the first Arbitrary Precision Neural Network framework (APNN-TC) to fully exploit quantization benefits on Ampere GPU Tensor Cores. Specifically, APNN-TC first incorporates a novel emulation algorithm to support arbitrary short bit-width computation with int1 compute primitives and XOR/AND Boolean operations. Second, APNN-TC integrates arbitrary precision layer designs to efficiently map our emulation algorithm to Tensor Cores with novel batching strategies and specialized memory organization. Third, APNN-TC embodies a novel arbitrary precision NN design to minimize memory access…
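The abstract's first point, emulating arbitrary short bit-widths with int1 primitives, can be illustrated with the standard bit-plane decomposition. The sketch below is not the paper's kernel; it is a minimal pure-Python illustration that assumes unsigned operands, where each value is split into 1-bit planes, every pair of planes is combined with AND plus a population count, and the partial results are shifted and summed. The function names (`pack_bit_plane`, `popcount`, `bit_serial_dot`) are hypothetical. For ±1-encoded values, binarized networks commonly use XOR in place of AND; that variant is not shown here.

```python
def pack_bit_plane(values, bit):
    """Pack bit number `bit` of each unsigned integer into one Python int.

    Element k of `values` contributes to bit position k of the result,
    mimicking how 1-bit operands are packed into machine words.
    """
    plane = 0
    for idx, v in enumerate(values):
        plane |= ((v >> bit) & 1) << idx
    return plane


def popcount(x):
    """Count the set bits of a non-negative integer."""
    return bin(x).count("1")


def bit_serial_dot(weights, activations, w_bits, a_bits):
    """Dot product of unsigned w_bits-bit weights and a_bits-bit activations.

    Only 1-bit operations (AND + popcount) touch the data; the multi-bit
    result is recovered by weighting each plane pair with 2**(i + j).
    Assumption: both operands are unsigned and fit in the stated bit-widths.
    """
    if len(weights) != len(activations):
        raise ValueError("operands must have the same length")
    w_planes = [pack_bit_plane(weights, i) for i in range(w_bits)]
    a_planes = [pack_bit_plane(activations, j) for j in range(a_bits)]
    total = 0
    for i, wp in enumerate(w_planes):
        for j, ap in enumerate(a_planes):
            # 1-bit dot product of plane i and plane j, scaled by 2**(i + j)
            total += popcount(wp & ap) << (i + j)
    return total


if __name__ == "__main__":
    # 1-bit weights with 2-bit activations, the precision pair named in the abstract
    w = [1, 0, 1, 1]
    a = [3, 2, 0, 1]
    print(bit_serial_dot(w, a, w_bits=1, a_bits=2))
    print(sum(x * y for x, y in zip(w, a)))  # reference result for comparison
```

A p-bit by q-bit product in this scheme costs p × q one-bit dot products, which is why the approach suits short bit-widths in particular.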

Code & Models

Repositories

BoyuanFeng/APNN-TC (Official)

Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Adversarial Robustness in Machine Learning

Methods: Batch Normalization · Residual Connection · Average Pooling · Global Average Pooling · 1x1 Convolution · Bottleneck Residual Block · Kaiming Initialization · Residual Block · Max Pooling