TL;DR
NestQuant is a resource-efficient post-training quantization method for IoT devices. It enables on-device switching between quantized models with low storage and switching overhead, using integer weight nesting and adaptive weight decomposition.
Contribution
It proposes an integer-nesting quantization technique that allows on-device model switching without retraining and without storing multiple models, reducing both storage consumption and switching overhead.
Findings
Achieves high accuracy with nested quantized models on ImageNet.
Reduces switching overheads by approximately 78%.
Enables resource-adaptive model deployment on IoT devices.
Abstract
Deploying quantized deep neural network (DNN) models with resource adaptation capabilities on ubiquitous Internet of Things (IoT) devices to provide high-quality AI services can leverage the benefits of compression and meet multi-scenario resource requirements. However, existing dynamic/mixed precision quantization requires retraining or special hardware, whereas post-training quantization (PTQ) has two limitations for resource adaptation: (i) state-of-the-art PTQ methods provide only one fixed-bitwidth model, which makes it challenging to adapt to the dynamic resources of IoT devices; (ii) deploying multiple PTQ models with diverse bitwidths consumes substantial storage and incurs high switching overheads. To this end, this paper introduces a resource-friendly post-training integer-nesting quantization, i.e., NestQuant, for on-device quantized model switching on IoT devices. The proposed…
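The abstract is cut off before it describes the method, so the following is only a minimal sketch of what nesting one integer model inside another could look like, not the authors' algorithm. It assumes symmetric INT8 quantization and a plain floor split into a 4-bit higher part and a 4-bit lower part; this split stands in for the adaptive weight decomposition named in the summary, whose details are not given here. All function names (`quantize_int8`, `decompose`, `recompose`) and the 8/4 bit choice are hypothetical.

```python
import numpy as np

def quantize_int8(w, scale):
    """Symmetric uniform quantization of float weights to signed INT8."""
    return np.clip(np.round(w / scale), -128, 127).astype(np.int8)

def decompose(w_int8, low_bits=4):
    """Split INT8 weights into a signed higher-bit part and an unsigned lower-bit part.

    Assumption: plain floor split, used here as a stand-in for the
    decomposition rule that the truncated abstract does not describe.
    """
    w = w_int8.astype(np.int16)
    step = 1 << low_bits
    w_high = np.floor_divide(w, step)   # in [-8, 7] when low_bits == 4
    w_low = w - w_high * step           # in [0, 15] when low_bits == 4
    return w_high.astype(np.int8), w_low.astype(np.uint8)

def recompose(w_high, w_low, low_bits=4):
    """Rebuild the full INT8 weights from the two stored parts."""
    step = 1 << low_bits
    return (w_high.astype(np.int16) * step + w_low).astype(np.int8)

# Toy weights standing in for one layer of a trained model.
rng = np.random.default_rng(0)
w_fp = rng.normal(0.0, 0.05, size=1000).astype(np.float32)
scale = float(np.abs(w_fp).max()) / 127.0

w_q = quantize_int8(w_fp, scale)
w_high, w_low = decompose(w_q, low_bits=4)

# Low-resource mode: run with the higher-bit part only, using a coarser scale.
w_part_bit = w_high.astype(np.float32) * (scale * 16)

# High-resource mode: load the lower-bit part and rebuild the full INT8 weights.
w_full_bit = recompose(w_high, w_low, low_bits=4)

# Storage in bits: nested parts (4 + 4) versus separate INT8 and INT4 models (8 + 4).
nested_bits = w_q.size * (4 + 4)
separate_bits = w_q.size * (8 + 4)
```

Under these assumptions, switching precision only means loading or dropping `w_low`, and the nested layout needs 8 bits per weight where two independently stored INT8 and INT4 models would need 12. A floor split like this biases the part-bit weights downward, which is one reason a real method would want a more careful decomposition.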
Taxonomy
Methods: NesT
