MiCo: End-to-End Mixed Precision Neural Network Co-Exploration Framework for Edge AI
Zijun Jiang, Yangdi Lyu

TL;DR
MiCo is an end-to-end framework that optimizes mixed-precision quantization schemes for neural networks on edge devices, balancing accuracy and latency for efficient deployment.
Contribution
The paper introduces MiCo, a novel framework that efficiently explores and deploys mixed-precision quantized neural networks tailored for edge AI hardware.
Findings
Searches for mixed-precision quantization schemes that keep accuracy high under a latency constraint.
Builds hardware-aware latency models so candidate schemes can be evaluated quickly during exploration.
Deploys models directly from PyTorch to C with minimal accuracy loss.
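To make the first two findings concrete, here is a minimal sketch of latency-constrained bitwidth selection. It is not MiCo's optimization algorithm or its latency model, which this summary does not describe. The layer table, the additive latency proxy (MACs times bitwidth), the sensitivity scores, and the names `estimated_latency`, `estimated_accuracy_loss`, and `search` are all hypothetical, chosen only to show the shape of the problem: pick one bitwidth per layer so that a proxy for accuracy loss is minimized while a modeled latency stays within budget.

```python
from itertools import product

# Hypothetical per-layer data (not from the paper): multiply-accumulate
# counts and a made-up sensitivity score (higher = hurt more by low precision).
LAYERS = [
    {"name": "conv1", "macs": 1_000_000, "sensitivity": 3.0},
    {"name": "conv2", "macs": 4_000_000, "sensitivity": 1.0},
    {"name": "fc", "macs": 500_000, "sensitivity": 2.0},
]
BITWIDTHS = (2, 4, 8)


def estimated_latency(scheme):
    """Toy additive latency proxy: each layer costs MACs * bitwidth."""
    return sum(layer["macs"] * bits for layer, bits in zip(LAYERS, scheme))


def estimated_accuracy_loss(scheme):
    """Toy accuracy proxy: sensitive layers lose more at low bitwidths."""
    return sum(layer["sensitivity"] / bits for layer, bits in zip(LAYERS, scheme))


def search(latency_budget):
    """Exhaustively pick the scheme with the lowest proxy loss within budget.

    Returns (scheme, loss), or None if no scheme fits the budget.
    """
    best = None
    for scheme in product(BITWIDTHS, repeat=len(LAYERS)):
        if estimated_latency(scheme) > latency_budget:
            continue  # violates the latency constraint
        loss = estimated_accuracy_loss(scheme)
        if best is None or loss < best[1]:
            best = (scheme, loss)
    return best


if __name__ == "__main__":
    print(search(latency_budget=24_000_000))
```

Exhaustive enumeration grows exponentially with the number of layers, so it is only practical for toy cases like this one. A real explorer would presumably replace both proxies with measured or learned models and use a smarter search strategy.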
Abstract
Quantized Neural Networks (QNNs) with extremely low-bitwidth data have proven promising for efficient storage and computation on edge devices. To further reduce the accuracy drop while increasing speedup, layer-wise mixed-precision quantization (MPQ) has become a popular solution. However, existing algorithms for exploring MPQ schemes are limited in flexibility and efficiency. Comprehending the complex impacts of different MPQ schemes on post-training quantization and quantization-aware training results is a challenge for conventional methods. Furthermore, an end-to-end framework for the optimization and deployment of MPQ models is missing from existing work. In this paper, we propose the MiCo framework, a holistic MPQ exploration and deployment framework for edge AI applications. The framework adopts a novel optimization algorithm to search for optimal quantization schemes with the highest…
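As an illustration of what layer-wise mixed precision means in practice, the sketch below applies symmetric uniform "fake" quantization to two layers at different bitwidths and compares the reconstruction error. This is a generic textbook scheme, not the quantizer used by MiCo. The weight values, the layer names, and the helpers `quantize_symmetric` and `mean_squared_error` are assumptions made up for the example.

```python
def quantize_symmetric(values, bits):
    """Round floats onto a signed `bits`-wide integer grid, then map back."""
    if bits < 2:
        raise ValueError("this sketch assumes at least 2 bits")
    qmax = 2 ** (bits - 1) - 1          # e.g. 127 for 8 bits, 1 for 2 bits
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return list(values)             # nothing to scale
    scale = max_abs / qmax              # per-tensor scale factor
    quantized = []
    for v in values:
        q = max(-qmax, min(qmax, round(v / scale)))  # clamp to the grid
        quantized.append(q * scale)                  # dequantize
    return quantized


def mean_squared_error(a, b):
    """Average squared difference between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


if __name__ == "__main__":
    # Hypothetical weights and a hypothetical per-layer bitwidth assignment.
    weights = {
        "conv1": [0.82, -0.31, 0.05, -0.67, 0.44],
        "fc": [1.0, -0.4, 0.6, 0.1, -0.9],
    }
    scheme = {"conv1": 8, "fc": 2}
    for name, w in weights.items():
        w_q = quantize_symmetric(w, scheme[name])
        print(name, scheme[name], mean_squared_error(w, w_q))
```

Lower bitwidths use a coarser grid and therefore usually incur a larger reconstruction error, which is why a mixed scheme can keep more bits for the layers that are most sensitive.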
