AMED: Automatic Mixed-Precision Quantization for Edge Devices
Moshe Kimhi, Tal Rozen, Avi Mendelson, Chaim Baskin

TL;DR
AMED introduces a novel approach to neural network quantization by modeling bitwidth allocation as a Markov Decision Process, optimizing for hardware-specific performance during training.
Contribution
The paper treats mixed-precision quantization as a dynamic process: bitwidths are allocated through a Markov Decision Process during training, with the aim of improving the accuracy-efficiency trade-off on edge devices relative to existing techniques.
Findings
Outperforms state-of-the-art quantization schemes in accuracy-efficiency trade-offs.
Adapts bitwidth allocation dynamically based on hardware signals.
Demonstrates significant improvements on edge device benchmarks.
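To make the idea of bitwidth allocation as a Markov process more concrete, here is a toy sketch. It is not the paper's algorithm. The cost model, the `sensitivity` and `macs` values, the weight `lam`, and the Metropolis-style acceptance rule are all assumptions chosen for illustration, and the "hardware signal" is a hypothetical latency proxy rather than a real simulator.

```python
import math
import random

BIT_CHOICES = (2, 4, 8)  # allowed bitwidths (assumed for this sketch)


def cost(bits, sensitivity, macs, lam):
    """Hypothetical cost: an error proxy plus a weighted latency proxy."""
    err = sum(s / (2 ** b) for s, b in zip(sensitivity, bits))
    latency = sum(m * b for m, b in zip(macs, bits))  # stand-in hardware signal
    return err + lam * latency


def step(bits, rng, sensitivity, macs, lam, temperature):
    """One transition: the next state depends only on the current one."""
    layer = rng.randrange(len(bits))
    idx = BIT_CHOICES.index(bits[layer])
    neighbours = [i for i in (idx - 1, idx + 1) if 0 <= i < len(BIT_CHOICES)]
    proposal = list(bits)
    proposal[layer] = BIT_CHOICES[rng.choice(neighbours)]
    proposal = tuple(proposal)
    delta = cost(proposal, sensitivity, macs, lam) - cost(bits, sensitivity, macs, lam)
    # Always accept improvements; accept worse states with a small probability.
    if delta <= 0 or rng.random() < math.exp(-delta / temperature):
        return proposal
    return bits


def search(num_steps=2000, seed=0, temperature=0.1):
    rng = random.Random(seed)
    sensitivity = (8.0, 1.0, 0.1)  # hypothetical per-layer error sensitivity
    macs = (1.0, 4.0, 2.0)         # hypothetical per-layer compute weight
    lam = 0.05                     # hypothetical accuracy/latency trade-off
    state = (8, 8, 8)
    best = state
    for _ in range(num_steps):
        state = step(state, rng, sensitivity, macs, lam, temperature)
        if cost(state, sensitivity, macs, lam) < cost(best, sensitivity, macs, lam):
            best = state
    return best, cost(best, sensitivity, macs, lam)


best_bits, best_cost = search()
print(best_bits, round(best_cost, 4))
```

In this sketch the state is the tuple of bitwidths, and each transition changes one layer's bitwidth to a neighbouring choice. A real system would replace the latency proxy with measurements or a simulator for the target device.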
Abstract
Quantized neural networks are well known for reducing latency, power consumption, and model size without significant harm to performance. This makes them highly appropriate for systems with limited resources and low power capacity. Mixed-precision quantization offers better utilization of customized hardware that supports arithmetic operations at different bitwidths. Quantization methods either aim to minimize the compression loss given a desired reduction or optimize a dependent variable for a specified property of the model (such as FLOPs or model size); both lead to inefficient performance when deployed on specific hardware. More importantly, quantization methods assume that the loss manifold holds a global minimum for a quantized model that coincides with the global minimum of its full-precision counterpart. Challenging this assumption, we argue that the optimal minimum…
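As a rough illustration of why bitwidth matters, the sketch below applies symmetric uniform quantization to one synthetic weight vector at several bitwidths and compares the reconstruction error, then computes the storage cost of a mixed-precision assignment. The quantizer, the layer names, the layer sizes, and the bitwidth assignment are assumptions for illustration, not details taken from the paper.

```python
import random


def quantize_uniform(values, bits):
    """Symmetric uniform quantization of a list of floats to `bits` bits."""
    if bits < 2:
        raise ValueError("bits must be >= 2 for a signed symmetric grid")
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return list(values)
    qmax = 2 ** (bits - 1) - 1  # e.g. 127 for 8 bits
    scale = max_abs / qmax
    # Round to the nearest grid point, clamp, then map back to floats.
    return [max(-qmax, min(qmax, round(v / scale))) * scale for v in values]


def mse(a, b):
    """Mean squared error between two equal-length lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


rng = random.Random(0)
weights = [rng.gauss(0.0, 1.0) for _ in range(256)]  # synthetic weights

# Reconstruction error of the same tensor at different bitwidths.
errors = {bits: mse(weights, quantize_uniform(weights, bits)) for bits in (2, 4, 8)}

# Storage cost of a hypothetical mixed-precision assignment.
layer_sizes = {"conv1": 256, "conv2": 256, "fc": 256}
bitwidths = {"conv1": 8, "conv2": 4, "fc": 2}
model_bits = sum(layer_sizes[name] * bitwidths[name] for name in layer_sizes)
fp32_bits = 32 * sum(layer_sizes.values())

print(errors)
print(model_bits, fp32_bits)
```

Lower bitwidths shrink storage but raise quantization error, which is the trade-off a mixed-precision scheme has to balance against what the target hardware executes efficiently.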
Taxonomy
Topics: Advanced Neural Network Applications · Medical Imaging Techniques and Applications · Brain Tumor Detection and Classification
