TL;DR
This paper introduces a dual-precision deep neural network that enables on-line switching between precision modes without re-training, balancing accuracy and complexity during inference.
Contribution
It proposes a novel dual-precision DNN architecture with a two-phase training process for simultaneous optimization of both precision modes.
Findings
Supports on-line precision switching without re-training
Optimizes both low- and high-precision modes effectively
Enhances inference flexibility and efficiency
Abstract
On-line precision scalability of deep neural networks (DNNs) is a critical feature for supporting the trade-off between accuracy and complexity during DNN inference. In this paper, we propose a dual-precision DNN that includes two different precision modes in a single model, thereby supporting on-line precision switching without re-training. The proposed two-phase training process optimizes both low- and high-precision modes.
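To make the idea of two precision modes in a single model concrete, here is a minimal NumPy sketch. It is not the paper's method: the abstract does not specify the bit widths, the quantizer, or how the two modes share weights. The sketch assumes an 8-bit high-precision mode, a 4-bit low-precision mode obtained by keeping the top four bits of each stored 8-bit code, and symmetric uniform quantization. The names `quantize_int8` and `DualPrecisionLinear` are hypothetical.

```python
import numpy as np


def quantize_int8(w):
    """Symmetric uniform quantization of float weights to int8 codes.

    Assumption for illustration only; the paper's quantizer is not
    described in the abstract.
    """
    max_abs = float(np.max(np.abs(w)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return codes, scale


class DualPrecisionLinear:
    """Hypothetical linear layer storing one set of int8 weight codes.

    "high" mode uses all 8 bits; "low" mode uses only the top 4 bits,
    so switching modes needs no re-training and no second weight copy.
    """

    def __init__(self, w):
        self.codes, self.scale = quantize_int8(w)
        self.mode = "high"

    def set_mode(self, mode):
        if mode not in ("high", "low"):
            raise ValueError("mode must be 'high' or 'low'")
        self.mode = mode

    def weights(self):
        if self.mode == "high":
            return self.codes.astype(np.float32) * self.scale
        # Arithmetic right shift keeps the sign and the top 4 bits,
        # giving codes in [-8, 7]; the step size grows by 2**4 = 16.
        low_codes = self.codes >> 4
        return low_codes.astype(np.float32) * (self.scale * 16.0)

    def forward(self, x):
        return x @ self.weights().T


w = np.array([[0.5, -1.0], [0.25, 1.0]])
layer = DualPrecisionLinear(w)
x = np.array([[1.0, 2.0]])

y_high = layer.forward(x)   # 8-bit weights
layer.set_mode("low")       # on-line switch, same stored codes
y_low = layer.forward(x)    # 4-bit weights
```

In this sketch the low-precision weights are a coarser view of the same stored codes, so the switch is a cheap run-time operation. A training procedure that optimizes both modes, such as the two-phase process the abstract mentions, would be needed to keep the low-precision mode accurate; that part is not shown here.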
