# Ultra-low Precision Multiplication-free Training for Deep Neural Networks

**Authors:** Chang Liu, Rui Zhang, Xishan Zhang, Yifan Hao, Zidong Du, Xing Hu, Ling Li, Qi Guo

arXiv: 2302.14458 · 2023-03-01

## TL;DR

This paper introduces an ultra-low-precision, multiplication-free training method for deep neural networks. It removes the FP32 multiplications from linear layers, sharply reducing training energy while keeping accuracy close to that of full-precision training.

## Contribution

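It proposes a multiplication-free training scheme that pairs Adaptive Layer-wise Scaling PoT Quantization (ALS-POTQ) with a Multiplication-Free MAC (MF-MAC). Together they replace every FP32 multiplication in both forward and backward propagation with INT4 additions and 1-bit XOR operations, saving up to 95.8% of linear-layer energy with minimal accuracy loss.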
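A power-of-two (PoT) encoding shows why additions and XORs can stand in for multiplications. As a sketch, assume each operand is stored as a sign bit $s$ and an integer exponent $e$, with a shared layer-wise power-of-two scale $2^{k}$. This encoding and the symbols $s$, $e$, $k$ are illustrative assumptions; the summary does not give the paper's exact format.

```latex
x = (-1)^{s_x}\, 2^{e_x + k_x}, \qquad w = (-1)^{s_w}\, 2^{e_w + k_w}
\;\Longrightarrow\;
x\,w = (-1)^{s_x \oplus s_w}\, 2^{(e_x + e_w) + (k_x + k_w)}
```

Because $s_x, s_w \in \{0, 1\}$, the sign of the product depends only on the parity of $s_x + s_w$, which is exactly $s_x \oplus s_w$. The magnitude needs only the integer sum $e_x + e_w$. The scale term $k_x + k_w$ is the same for every product between the two tensors, so it can be applied once to the accumulated result.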

## Key findings

- Reduces energy consumption by up to 95.8% in linear layers.
- Keeps accuracy degradation below 1% for CNN models on ImageNet and for a Transformer model on WMT En-De.
- Outperforms existing energy-efficient training methods in both efficiency and accuracy.

## Abstract

Training deep neural networks (DNNs) demands immense energy, which restricts the development of deep learning and increases carbon emissions. Thus, the study of energy-efficient training for DNNs is essential. In training, the linear layers consume the most energy because of the intense use of energy-consuming full-precision (FP32) multiplication in multiply-accumulate (MAC) operations. Energy-efficient works try to decrease the precision of multiplication or replace the multiplication with energy-efficient operations such as addition or bitwise shift, to reduce the energy consumption of FP32 multiplications. However, existing energy-efficient works cannot replace all of the FP32 multiplications during both forward and backward propagation with low-precision energy-efficient operations. In this work, we propose an Adaptive Layer-wise Scaling PoT Quantization (ALS-POTQ) method and a Multiplication-Free MAC (MF-MAC) to replace all of the FP32 multiplications with INT4 additions and 1-bit XOR operations. In addition, we propose Weight Bias Correction and Parameterized Ratio Clipping techniques to stabilize training and improve accuracy. In our training scheme, none of the above methods introduces extra multiplications, so we reduce up to 95.8% of the energy consumption in linear layers during training. Experimentally, we achieve an accuracy degradation of less than 1% for CNN models on ImageNet and a Transformer model on the WMT En-De task. In summary, we significantly outperform the existing methods in both energy efficiency and accuracy.
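The pure-Python sketch below illustrates the two components named in the abstract, ALS-POTQ and MF-MAC. It makes several assumptions:

- each value is encoded as a sign bit plus a signed 4-bit exponent in [-8, 7];
- each tensor has a power-of-two scale chosen so its largest magnitude maps to the top exponent;
- values are rounded to the nearest exponent in log2 space;
- values that underflow are flushed to zero.

These choices, and every name below (`PoT`, `layer_scale_exp`, `quantize`, `dequantize`, `mf_dot`), are hypothetical stand-ins. They are not the paper's adaptive scaling rule or hardware design.

```python
import math
from dataclasses import dataclass

# Assumed signed 4-bit exponent range; the paper's exact format is not given here.
EXP_MIN, EXP_MAX = -8, 7


@dataclass(frozen=True)
class PoT:
    """Hypothetical PoT code: value = (-1)**sign * 2**(exp + k), k = shared layer scale."""
    sign: int   # 1-bit sign: 0 = positive, 1 = negative
    exp: int    # exponent code in [EXP_MIN, EXP_MAX]
    zero: bool  # True when the value was zero or underflowed


def layer_scale_exp(values):
    """Hypothetical layer-wise scale: map the largest magnitude onto EXP_MAX."""
    m = max(abs(v) for v in values)
    if m == 0:
        return 0
    return round(math.log2(m)) - EXP_MAX


def quantize(v, k):
    """Round |v| to the nearest power of two (in log2 space) relative to scale 2**k."""
    if v == 0:
        return PoT(0, 0, True)
    e = round(math.log2(abs(v))) - k
    if e < EXP_MIN:
        return PoT(0, 0, True)  # underflow is flushed to zero
    return PoT(1 if v < 0 else 0, min(e, EXP_MAX), False)  # overflow is clamped


def dequantize(q, k):
    """Recover the float value of a code (used only for checking)."""
    if q.zero:
        return 0.0
    mag = math.ldexp(1.0, q.exp + k)  # 2**(exp + k), an exponent shift
    return -mag if q.sign else mag


def mf_dot(xs, ws, kx, kw):
    """Multiplication-free dot product of two PoT-coded vectors."""
    acc = 0.0
    for a, b in zip(xs, ws):
        if a.zero or b.zero:
            continue
        s = a.sign ^ b.sign        # 1-bit XOR replaces the sign multiply
        e = a.exp + b.exp          # small integer add replaces the magnitude multiply
        term = math.ldexp(1.0, e)  # 2**e: a shift, not a multiply
        acc = acc - term if s else acc + term
    return math.ldexp(acc, kx + kw)  # apply both layer scales once, as one shift


if __name__ == "__main__":
    x = [0.5, -2.0, 4.0]
    w = [1.0, 0.25, -8.0]
    kx, kw = layer_scale_exp(x), layer_scale_exp(w)
    qx = [quantize(v, kx) for v in x]
    qw = [quantize(v, kw) for v in w]
    print(mf_dot(qx, qw, kx, kw))  # -32.0, exact because all inputs are powers of two
```

The float accumulator is only a convenience for checking results. The sketch does not model how the paper accumulates partial sums or handles gradients, nor its Weight Bias Correction or Parameterized Ratio Clipping. Inputs that are not powers of two are approximated; for example, 3.0 rounds to 4.0.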

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2302.14458/full.md

## Figures

26 figures with captions in the complete paper: https://tomesphere.com/paper/2302.14458/full.md

## References

40 references — full list in the complete paper: https://tomesphere.com/paper/2302.14458/full.md

---
Source: https://tomesphere.com/paper/2302.14458