Neural Network Quantization with AI Model Efficiency Toolkit (AIMET)
Sangeetha Siddegowda, Marios Fournarakis, Markus Nagel, Tijmen Blankevoort, Chirag Patel, Abhijit Khobare

TL;DR
This paper introduces AIMET, a toolkit that simplifies neural network quantization, enabling low-latency, energy-efficient inference on edge devices with minimal accuracy loss.
Contribution
It presents AIMET as a comprehensive library offering state-of-the-art quantization and compression algorithms for easy model optimization in PyTorch and TensorFlow.
Findings
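AIMET supports several post-training quantization techniques, such as cross-layer equalization, bias correction, and AdaRound.
AIMET supports quantization-aware training (QAT), which can recover accuracy close to that of the floating-point model.
The paper provides practical workflows and code examples that guide users through quantizing a model.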
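Cross-layer equalization (CLE) is one of the data-free post-training techniques available in AIMET. The following is a minimal sketch of its core rescaling step for two consecutive fully connected layers separated by a ReLU, using plain Python lists. It is not AIMET's implementation, and the function names `cross_layer_equalize` and `two_layer_forward` are hypothetical. Because ReLU satisfies relu(z / s) = relu(z) / s for any s > 0, output channel i of the first layer can be divided by s_i and input channel i of the second layer multiplied by s_i without changing the network's output. Choosing s_i = sqrt(r1_i / r2_i), where r1_i and r2_i are the per-channel weight ranges, makes both ranges equal to sqrt(r1_i * r2_i), so a single per-tensor quantization grid fits all channels better.

```python
import math


def cross_layer_equalize(w1, b1, w2):
    """Rescale two consecutive layers so their per-channel weight ranges match.

    w1: layer-1 weights, one row per output channel (shape out1 x in1)
    b1: layer-1 bias (length out1)
    w2: layer-2 weights, one row per output channel (shape out2 x out1);
        column i multiplies hidden channel i
    Assumes a ReLU between the layers, so relu(z / s) == relu(z) / s for s > 0.
    """
    n = len(w1)
    scales = []
    for i in range(n):
        r1 = max(abs(v) for v in w1[i])       # range of output channel i in layer 1
        r2 = max(abs(row[i]) for row in w2)   # range of input channel i in layer 2
        # s_i = sqrt(r1 / r2) makes both ranges equal to sqrt(r1 * r2).
        scales.append(math.sqrt(r1 / r2) if r1 > 0 and r2 > 0 else 1.0)
    new_w1 = [[v / scales[i] for v in w1[i]] for i in range(n)]
    new_b1 = [b1[i] / scales[i] for i in range(n)]
    new_w2 = [[row[i] * scales[i] for i in range(n)] for row in w2]
    return new_w1, new_b1, new_w2, scales


def two_layer_forward(w1, b1, w2, x):
    """Compute W2 · ReLU(W1 · x + b1) with plain lists."""
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w1, b1)]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w2]


# Toy weights (not from the paper): channel 0 of layer 1 has a far larger range.
w1 = [[10.0, -4.0], [0.1, 0.05]]
b1 = [1.0, 0.02]
w2 = [[0.2, 3.0], [-0.1, 1.5]]
eq_w1, eq_b1, eq_w2, scales = cross_layer_equalize(w1, b1, w2)
x = [0.3, -0.7]
print(two_layer_forward(w1, b1, w2, x))
print(two_layer_forward(eq_w1, eq_b1, eq_w2, x))  # same output up to float rounding
```

In this toy example the two output channels of the first layer start with ranges of 10 and 0.1; after equalization they are sqrt(2) ≈ 1.41 and sqrt(0.3) ≈ 0.55, so far less quantization resolution is wasted on the small channel.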
Abstract
While neural networks have advanced the frontiers in many machine learning applications, they often come at a high computational cost. Reducing the power and latency of neural network inference is vital to integrating modern networks into edge devices with strict power and compute requirements. Neural network quantization is one of the most effective ways of achieving these savings, but the additional noise it induces can lead to accuracy degradation. In this white paper, we present an overview of neural network quantization using AI Model Efficiency Toolkit (AIMET). AIMET is a library of state-of-the-art quantization and compression algorithms designed to ease the effort required for model optimization and thus drive the broader AI ecosystem towards low latency and energy-efficient inference. AIMET provides users with the ability to simulate as well as optimize PyTorch and TensorFlow…
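Quantization maps floating-point values onto a small integer grid, and the rounding and clipping involved are the source of the additional noise mentioned in the abstract. A common way to study this noise before deployment is simulated ("fake") quantization: values are quantized to the integer grid and immediately dequantized, so the rest of the computation stays in floating point. The sketch below implements an asymmetric uniform quantizer in plain Python and measures the mean squared error it introduces. It illustrates the general technique only; it is not AIMET's API, and the function names are hypothetical.

```python
def quant_params(x_min, x_max, bitwidth=8):
    """Scale and zero-point of an asymmetric uniform quantizer covering [x_min, x_max]."""
    # Extend the range to include 0.0 so that zero is exactly representable.
    x_min, x_max = min(x_min, 0.0), max(x_max, 0.0)
    n_steps = 2 ** bitwidth - 1                  # number of intervals on the integer grid
    scale = (x_max - x_min) / n_steps if x_max > x_min else 1.0
    zero_point = round(-x_min / scale)           # integer that represents 0.0
    return scale, zero_point


def fake_quantize(x, scale, zero_point, bitwidth=8):
    """Quantize x to the unsigned integer grid, clip, then dequantize to float."""
    q = round(x / scale) + zero_point
    q = min(max(q, 0), 2 ** bitwidth - 1)        # clip to [0, 2^b - 1]
    return (q - zero_point) * scale


def quantization_mse(values, bitwidth=8):
    """Mean squared error that simulated quantization adds to `values`."""
    scale, zero_point = quant_params(min(values), max(values), bitwidth)
    simulated = [fake_quantize(v, scale, zero_point, bitwidth) for v in values]
    return sum((v - sim) ** 2 for v, sim in zip(values, simulated)) / len(values)


weights = [-0.8, -0.1, 0.0, 0.25, 0.5, 1.2]     # toy values, not from the paper
print(quantization_mse(weights, bitwidth=8))
print(quantization_mse(weights, bitwidth=4))
```

For a fixed range, going from 8 to 4 bits makes the grid spacing (the scale) 17 times coarser (255 versus 15 steps), and the simulated error grows accordingly. This is why lower bit widths typically depend more heavily on post-training techniques or quantization-aware training to preserve accuracy.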
