APNN-TC: Accelerating Arbitrary Precision Neural Networks on Ampere GPU Tensor Cores
Boyuan Feng, Yuke Wang, Tong Geng, Ang Li, Yufei Ding

TL;DR
APNN-TC introduces a novel framework that enables arbitrary precision neural network computations on Ampere GPU Tensor Cores, overcoming previous precision limitations and significantly accelerating neural network inference.
Contribution
It presents the first emulation algorithm and layer design for arbitrary precision neural networks on GPU Tensor Cores, expanding support beyond limited precisions.
Findings
Achieves significant speedup over existing kernels and models
Supports arbitrary short bit-width computation with primitive operations
Optimizes memory access and batching for improved performance
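For the 1-bit case, the XOR primitive mentioned in this summary is commonly paired with a population count. The sketch below shows that standard binarized-network identity. It is not taken from the APNN-TC code, and the helper names `encode_signs` and `xor_dot` are hypothetical. It assumes values in {-1, +1} are stored as single bits (1 for +1, 0 for -1).

```python
def encode_signs(values):
    """Pack a list of +1/-1 values into an int: bit n is 1 for +1, 0 for -1."""
    word = 0
    for n, v in enumerate(values):
        if v == 1:
            word |= 1 << n
    return word

def xor_dot(a, b):
    """Dot product of two equal-length +1/-1 vectors via XOR and popcount."""
    n = len(a)
    # XOR marks the positions where the two sign bits differ.
    mismatches = bin(encode_signs(a) ^ encode_signs(b)).count("1")
    # Matching positions contribute +1, mismatching positions contribute -1.
    return n - 2 * mismatches

a = [1, -1, 1, 1]
b = [1, 1, -1, 1]
assert xor_dot(a, b) == sum(p * q for p, q in zip(a, b))
```

On hardware, the XOR and popcount would each cover a full machine word or Tensor Core tile at once rather than one element at a time.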
Abstract
Over the years, accelerating neural networks with quantization has been widely studied. Unfortunately, prior efforts with diverse precisions (e.g., 1-bit weights and 2-bit activations) are usually restricted by limited precision support on GPUs (e.g., int1 and int4). To break such restrictions, we introduce the first Arbitrary Precision Neural Network framework (APNN-TC) to fully exploit quantization benefits on Ampere GPU Tensor Cores. Specifically, APNN-TC first incorporates a novel emulation algorithm to support arbitrary short bit-width computation with int1 compute primitives and XOR/AND Boolean operations. Second, APNN-TC integrates arbitrary precision layer designs to efficiently map our emulation algorithm to Tensor Cores with novel batching strategies and specialized memory organization. Third, APNN-TC embodies a novel arbitrary precision NN design to minimize memory access…
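One way to picture emulating multi-bit arithmetic with int1 primitives is bit-plane decomposition: split each operand into binary planes, combine every pair of planes with AND plus a population count, and weight each partial result by a power of two. The sketch below is a minimal illustration under that assumption, restricted to unsigned values. It is not the authors' kernel, and the names `pack_plane`, `popcount`, and `emulated_dot` are hypothetical.

```python
def pack_plane(values, k):
    """Pack bit k of every value into one Python int (bit n comes from element n)."""
    word = 0
    for n, v in enumerate(values):
        word |= ((v >> k) & 1) << n
    return word

def popcount(word):
    """Count the set bits in a non-negative integer."""
    return bin(word).count("1")

def emulated_dot(w, x, w_bits, x_bits):
    """Dot product of unsigned w_bits-bit and x_bits-bit vectors using only 1-bit ops."""
    total = 0
    for i in range(w_bits):
        w_plane = pack_plane(w, i)
        for j in range(x_bits):
            x_plane = pack_plane(x, j)
            # AND + popcount is the 1-bit dot product; shift by i + j to restore its weight.
            total += popcount(w_plane & x_plane) << (i + j)
    return total

w = [1, 0, 1, 1, 0, 1, 0, 1]  # 1-bit weights
x = [3, 2, 0, 1, 3, 2, 1, 0]  # 2-bit activations
assert emulated_dot(w, x, w_bits=1, x_bits=2) == sum(a * b for a, b in zip(w, x))
```

The number of 1-bit products grows as `w_bits * x_bits`, so a 1-bit weight, 2-bit activation layer needs two plane products per output in this scheme.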
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Adversarial Robustness in Machine Learning
Methods: Batch Normalization · Residual Connection · Average Pooling · Global Average Pooling · 1x1 Convolution · Bottleneck Residual Block · Kaiming Initialization · Residual Block · Max Pooling
