Vertical Layering of Quantized Neural Networks for Heterogeneous Inference
Hai Wu, Ruifei He, Haoru Tan, Xiaojuan Qi, Kaibin Huang

TL;DR
This paper introduces a vertical-layered neural network weight representation enabling a single model to support multiple quantization precisions, reducing training and maintenance costs for heterogeneous device deployment.
Contribution
It proposes a novel vertical-layered weight representation and a "once" quantization-aware training (QAT) scheme that together encapsulate multiple quantized models into one, allowing on-demand precision adjustment.
Findings
Achieves performance comparable to dedicated quantized models.
Enables one-time training for multiple quantization levels.
Reduces costs in model training and maintenance for heterogeneous devices.
Abstract
Although considerable progress has been obtained in neural network quantization for efficient inference, existing methods are not scalable to heterogeneous devices as one dedicated model needs to be trained, transmitted, and stored for one specific hardware setting, incurring considerable costs in model training and maintenance. In this paper, we study a new vertical-layered representation of neural network weights for encapsulating all quantized models into a single one. With this representation, we can theoretically achieve any precision network for on-demand service while only needing to train and maintain one model. To this end, we propose a simple once quantization-aware training (QAT) scheme for obtaining high-performance vertical-layered models. Our design incorporates a cascade downsampling mechanism which allows us to obtain multiple quantized networks from one full precision…
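The abstract is truncated here, so the exact mechanism is not stated. As a rough illustration only, the sketch below assumes one plausible reading of "vertical layering" and "cascade downsampling": the stored weights are unsigned integer codes, and a lower-precision model is obtained by keeping the most significant bits of the next-higher-precision one. The function names `truncate_to` and `cascade` are hypothetical and do not come from the paper.

```python
def truncate_to(codes, src_bits, dst_bits):
    """Keep the dst_bits most significant bits of each src_bits-wide code.

    Assumption for illustration: codes are unsigned integers in
    [0, 2**src_bits - 1]; the paper may use a different scheme.
    """
    if not 0 < dst_bits <= src_bits:
        raise ValueError("dst_bits must be in (0, src_bits]")
    shift = src_bits - dst_bits
    return [c >> shift for c in codes]


def cascade(codes, src_bits, targets):
    """Derive each lower precision from the previous one, highest first."""
    models = {src_bits: list(codes)}
    cur_bits, cur = src_bits, list(codes)
    for bits in sorted(targets, reverse=True):
        cur = truncate_to(cur, cur_bits, bits)
        cur_bits = bits
        models[bits] = cur
    return models


# One stored 8-bit weight vector serves 8-, 4-, and 2-bit requests.
codes8 = [0, 37, 128, 200, 255]
models = cascade(codes8, 8, [4, 2])
print(models[4])  # [0, 2, 8, 12, 15]
print(models[2])  # [0, 0, 2, 3, 3]
```

Under this assumption, a device needing only 4-bit weights would receive just the top four bit-planes of the single stored model, which is the sense in which one model could serve several precisions.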
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Machine Learning and Data Classification
