AMED: Automatic Mixed-Precision Quantization for Edge Devices

Moshe Kimhi; Tal Rozen; Avi Mendelson; Chaim Baskin

arXiv:2205.15437 · cs.LG · June 11, 2024

TL;DR

AMED models bitwidth allocation for neural network quantization as a Markov Decision Process and optimizes it for hardware-specific performance during training.

Contribution

AMED treats mixed-precision quantization as a dynamic process, which the authors report improves accuracy and efficiency on edge devices compared with existing techniques.
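To make the Markov Decision Process framing concrete, here is a minimal toy sketch; it is not the authors' implementation. It assumes the state is a tuple of per-layer bitwidths, an action nudges one layer's bitwidth up or down, and the reward trades an error term against a hardware cost term. The names `BITWIDTHS`, `LAYER_SIZES`, `error_proxy`, `latency_proxy`, and `random_rollout` are hypothetical, and both proxies are invented stand-ins for the accuracy and hardware signals a real system would measure.

```python
import random

BITWIDTHS = (2, 4, 8)               # assumed menu of supported precisions
LAYER_SIZES = (1000, 4000, 2000)    # hypothetical parameter counts per layer


def error_proxy(state):
    # Hypothetical stand-in for accuracy loss: fewer bits means larger error.
    return sum(n * 2.0 ** (-2 * b) for n, b in zip(LAYER_SIZES, state))


def latency_proxy(state):
    # Hypothetical stand-in for a hardware signal: cost grows with bits moved.
    return sum(n * b for n, b in zip(LAYER_SIZES, state)) / 1000.0


def reward(state, lam=1.0):
    # Higher is better: penalize both the error term and the hardware cost.
    return -(error_proxy(state) + lam * latency_proxy(state))


def step(state, action):
    """Apply action (layer index, direction) and return (next_state, reward)."""
    layer, direction = action
    idx = BITWIDTHS.index(state[layer]) + direction
    idx = max(0, min(len(BITWIDTHS) - 1, idx))   # clamp to the allowed menu
    nxt = state[:layer] + (BITWIDTHS[idx],) + state[layer + 1:]
    return nxt, reward(nxt)


def random_rollout(start, steps, seed=0):
    """Follow a uniformly random policy and remember the best state visited."""
    rng = random.Random(seed)
    state, best, best_r = start, start, reward(start)
    for _ in range(steps):
        action = (rng.randrange(len(state)), rng.choice((-1, 1)))
        state, r = step(state, action)
        if r > best_r:
            best, best_r = state, r
    return best, best_r


best_state, best_reward = random_rollout((8, 8, 8), steps=200)
print(best_state, best_reward)
```

The random policy is only a placeholder to keep the sketch short; the point is the shape of the state, action, and reward, not the search strategy.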

Findings

01. Outperforms state-of-the-art quantization schemes in accuracy-efficiency trade-offs.

02. Adapts bitwidth allocation dynamically based on hardware signals.

03. Demonstrates significant improvements on edge device benchmarks.

Abstract

Quantized neural networks are well known for reducing latency, power consumption, and model size without significantly harming performance. This makes them well suited to systems with limited resources and low power capacity. Mixed-precision quantization makes better use of customized hardware that supports arithmetic operations at different bitwidths. Quantization methods either aim to minimize the compression loss given a desired reduction or optimize a dependent variable for a specified property of the model (such as FLOPs or model size); both lead to inefficient performance when deployed on specific hardware. More importantly, quantization methods assume that the loss manifold holds a global minimum for a quantized model that coincides with the global minimum of the full-precision counterpart. Challenging this assumption, we argue that the optimal minimum…
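As a small illustration of what "arithmetic operations at different bitwidths" means for model size and error, the sketch below applies symmetric uniform quantization to two made-up layers at 8 and 4 bits. It is a generic textbook scheme under stated assumptions, not the quantizer used by AMED; the layer names, the random weights, and the `assignment` mapping are all hypothetical.

```python
import random


def quantize(values, bits):
    """Symmetric uniform fake-quantization onto a signed `bits`-bit grid."""
    if bits < 2:
        raise ValueError("bits must be at least 2 for a signed grid")
    qmax = 2 ** (bits - 1) - 1                 # 127 for 8 bits, 7 for 4 bits
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    out = []
    for v in values:
        q = round(v / scale)                   # nearest integer level
        q = max(-qmax, min(qmax, q))           # clamp to the representable range
        out.append(q * scale)                  # map back to real values
    return out


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


random.seed(0)
# Hypothetical layers with random weights, for illustration only.
layers = {
    "conv1": [random.gauss(0, 1) for _ in range(256)],
    "fc": [random.gauss(0, 1) for _ in range(256)],
}
assignment = {"conv1": 8, "fc": 4}             # a mixed-precision allocation

errors = {name: mse(w, quantize(w, assignment[name])) for name, w in layers.items()}
size_bits = sum(len(w) * assignment[name] for name, w in layers.items())
print(errors, size_bits)
```

Lowering a layer's bitwidth shrinks its storage cost but coarsens the grid its weights are rounded to, which is the trade-off a mixed-precision allocation has to balance per layer.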

Code & Models

Repositories

ramoraydrake/amed (PyTorch, official)

Taxonomy

Topics: Advanced Neural Network Applications · Medical Imaging Techniques and Applications · Brain Tumor Detection and Classification