Ternary Residual Networks
Abhisek Kundu, Kunal Banerjee, Naveen Mellempudi, Dheevatsa Mudigere, Dipankar Das, Bharat Kaul, Pradeep Dubey

TL;DR
This paper introduces ternary residual networks that add low-precision residual edges to sub-8-bit deep neural networks, significantly improving accuracy and efficiency, and enabling dynamic model adjustments without retraining.
Contribution
It proposes a novel residual network approach with ternary weights, guided by perturbation theory, to mitigate accuracy loss in ultra-low-precision DNNs, achieving high accuracy with reduced computation.
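The core idea can be sketched in plain Python. A weight tensor is approximated by ternary values times a scale, and then whatever is left over (the residual) is ternarized again and added as an extra low-precision edge. This is a minimal sketch under stated assumptions. The threshold rule (0.7 × mean |w|) and the per-tensor scale `alpha` follow a common Ternary Weight Networks-style heuristic and are not taken from this paper. All function names are hypothetical.

```python
# Hypothetical helpers illustrating ternary residual approximation.
# Assumption: TWN-style ternarization (threshold = 0.7 * mean|w|), which
# may differ from the scheme used in the paper.


def ternarize(w, delta_scale=0.7):
    """Approximate w by alpha * t with t in {-1, 0, +1}."""
    delta = delta_scale * sum(abs(x) for x in w) / len(w)
    t = [1 if x > delta else -1 if x < -delta else 0 for x in w]
    kept = [abs(x) for x, ti in zip(w, t) if ti != 0]
    alpha = sum(kept) / len(kept) if kept else 0.0
    return alpha, t


def ternary_residual_terms(w, num_terms):
    """Ternarize w, then repeatedly ternarize what is left (the residual)."""
    terms = []
    residual = list(w)
    for _ in range(num_terms):
        alpha, t = ternarize(residual)
        terms.append((alpha, t))
        residual = [r - alpha * ti for r, ti in zip(residual, t)]
    return terms


def reconstruct(terms, use_terms):
    """Sum the first `use_terms` ternary terms (fewer terms = cheaper model)."""
    n = len(terms[0][1])
    approx = [0.0] * n
    for alpha, t in terms[:use_terms]:
        approx = [a + alpha * ti for a, ti in zip(approx, t)]
    return approx


def rel_error(w, approx):
    """Relative L2 error ||w - approx|| / ||w||."""
    num = sum((x - y) ** 2 for x, y in zip(w, approx)) ** 0.5
    den = sum(x * x for x in w) ** 0.5
    return num / den


w = [0.9, -0.4, 0.05, -1.2, 0.3, 0.7, -0.05, 0.2]
terms = ternary_residual_terms(w, num_terms=2)
e1 = rel_error(w, reconstruct(terms, use_terms=1))  # base ternary edge only
e2 = rel_error(w, reconstruct(terms, use_terms=2))  # plus one residual edge
print(f"relative error: 1 term {e1:.3f}, 2 terms {e2:.3f}")
```

With this scale choice, each added ternary term can only lower the approximation error. Keeping all terms gives the most accurate model. Dropping trailing terms at inference, without retraining, gives a cheaper one, which is one plausible reading of the on-the-fly accuracy-performance trade-off.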
Findings
Achieves a ~1% accuracy drop from the FP-32 baseline with the 8-2 model on ResNet-101.
Reduces model size by ~1.6x and computations by ~26x.
Enables on-the-fly accuracy-performance trade-offs.
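Ternary weights help explain the drop in computation. When weights lie in {-1, 0, +1} with a shared scale, a dot product against 8-bit activations needs only integer additions and subtractions, plus one rescale at the end. The sketch below assumes symmetric per-tensor 8-bit activation quantization with a fixed, power-of-two scale. That is an illustrative choice, not the paper's calibration. The function names are hypothetical, and the sketch does not attempt to reproduce the ~26x figure.

```python
# Hypothetical sketch: int8 activations times ternary weights.
# Assumption: symmetric per-tensor quantization with a fixed scale.


def quantize_int8(x, scale):
    """Map floats to integers in [-127, 127] via round(x / scale)."""
    return [max(-127, min(127, round(v / scale))) for v in x]


def ternary_dot(a_q, t, alpha, a_scale):
    """Dot product of int8 activations with ternary weights alpha * t.

    The loop does only integer additions and subtractions (zero weights
    are skipped); the two float scales are applied once at the end.
    """
    acc = 0
    for a, ti in zip(a_q, t):
        if ti == 1:
            acc += a
        elif ti == -1:
            acc -= a
    return acc * alpha * a_scale


x = [0.5, -1.0, 0.25, 0.75]    # float activations
a_scale = 2 ** -7              # assumes activations roughly in [-1, 1]
a_q = quantize_int8(x, a_scale)
t, alpha = [1, 0, -1, 1], 0.8  # one ternary weight vector and its scale

approx = ternary_dot(a_q, t, alpha, a_scale)
exact = sum(v * alpha * ti for v, ti in zip(x, t))
print(a_q, approx, exact)
```

Zero weights skip work entirely, so sparser ternary tensors are cheaper still. Each ternary residual edge adds one more add/subtract pass of this kind, which is the compute side of the accuracy trade-off.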
Abstract
Sub-8-bit representations of DNNs incur some discernible loss of accuracy despite rigorous (re)training at low precision. Such loss of accuracy essentially makes them equivalent to a much shallower counterpart, diminishing the power of being deep networks. To address this accuracy drop, we introduce the notion of residual networks, where we add more low-precision edges to sensitive branches of the sub-8-bit network to compensate for the lost accuracy. Further, we present a perturbation theory to identify such sensitive edges. Aided by such an elegant trade-off between accuracy and compute, the 8-2 model (8-bit activations, ternary weights), enhanced by ternary residual edges, turns out to be sophisticated enough to achieve very high accuracy (~1% drop from our FP-32 baseline), despite ~1.6x reduction in model size, ~26x reduction in number…
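The abstract says a perturbation theory identifies which edges are sensitive enough to deserve residual edges, but the theory is not given in this summary. As a stand-in, the sketch ranks layers by the relative weight perturbation that ternarization introduces, ||W - αT|| / ||W||, and spends a fixed budget of residual edges on the layers with the largest values. This proxy, the layer names, the toy weights, and the `budget` parameter are all hypothetical. It is a simple heuristic, not the paper's perturbation analysis.

```python
# Hypothetical sensitivity proxy for choosing where to add residual edges.
# Assumption: larger relative ternarization error => more sensitive layer.


def ternarize(w, delta_scale=0.7):
    """TWN-style rule: threshold 0.7 * mean|w|, scale = mean of kept |w|."""
    delta = delta_scale * sum(abs(x) for x in w) / len(w)
    t = [1 if x > delta else -1 if x < -delta else 0 for x in w]
    kept = [abs(x) for x, ti in zip(w, t) if ti != 0]
    alpha = sum(kept) / len(kept) if kept else 0.0
    return alpha, t


def relative_perturbation(w):
    """||w - alpha*t|| / ||w||: how far ternarization moves this layer's weights."""
    alpha, t = ternarize(w)
    num = sum((x - alpha * ti) ** 2 for x, ti in zip(w, t)) ** 0.5
    den = sum(x * x for x in w) ** 0.5
    return num / den if den else 0.0


def pick_layers_for_residuals(layers, budget):
    """Return names of the `budget` layers with the largest relative perturbation."""
    scores = {name: relative_perturbation(w) for name, w in layers.items()}
    return sorted(scores, key=scores.get, reverse=True)[:budget]


layers = {
    "conv1": [0.9, -0.9, 0.8, -0.8],        # similar magnitudes: ternarizes well
    "res2a": [1.5, -0.1, 0.05, -0.7, 0.3],  # spread-out magnitudes: larger error
    "fc": [0.01, 2.0, -0.02, 0.03],         # one dominant weight: small error
}
print(pick_layers_for_residuals(layers, budget=1))
```

A fuller treatment would also weigh how a weight perturbation propagates to the network output, since layers with equal weight error need not affect accuracy equally.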
Taxonomy
Topics: Advanced Neural Network Applications · Adversarial Robustness in Machine Learning · Domain Adaptation and Few-Shot Learning
