Loss-aware Binarization of Deep Networks
Lu Hou, Quanming Yao, James T. Kwok

TL;DR
This paper introduces a loss-aware binarization method for deep neural networks that directly minimizes the loss with respect to the binarized weights, yielding binarized models that are more accurate and more robust than those produced by previous schemes.
Contribution
It proposes a proximal Newton algorithm with a diagonal Hessian approximation for loss-aware binarization, improving on earlier schemes that rely on simple matrix approximation and ignore the loss.
Findings
Outperforms existing binarization schemes in accuracy.
Is more robust on wide and deep networks.
Obtains second-order information efficiently from the second moments already computed by the Adam optimizer.
Abstract
Deep neural network models, though very powerful and highly successful, are computationally expensive in terms of space and time. Recently, there have been a number of attempts at binarizing the network weights and activations. This greatly reduces the network size, and replaces the underlying multiplications with additions or even XNOR bit operations. However, existing binarization schemes are based on simple matrix approximation and ignore the effect of binarization on the loss. In this paper, we propose a proximal Newton algorithm with diagonal Hessian approximation that directly minimizes the loss w.r.t. the binarized weights. The underlying proximal step has an efficient closed-form solution, and the second-order information can be efficiently obtained from the second moments already computed by the Adam optimizer. Experiments on both feedforward and recurrent networks show that the…
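The abstract says the proximal step has a closed-form solution but does not state it. The sketch below is an illustration under stated assumptions, not the paper's exact procedure. It assumes the binarized weights take the form alpha * b, with alpha > 0 and b in {-1, +1}, and that the proximal step reduces to a least-squares fit weighted by a positive diagonal curvature estimate d. The function names weighted_binarize and adam_diag_curvature are hypothetical, and using sqrt(v_hat) + eps as the curvature proxy is one plausible reading of "second moments already computed by the Adam optimizer".

```python
import numpy as np

def weighted_binarize(w, d):
    """Minimize sum_i d_i * (alpha * b_i - w_i)**2 over alpha > 0, b_i in {-1, +1}.

    Assumes every d_i > 0. Expanding the square shows that the best b_i is
    sign(w_i), and that alpha is then the d-weighted mean of |w_i|.
    """
    w = np.asarray(w, dtype=float)
    d = np.asarray(d, dtype=float)
    b = np.where(w >= 0, 1.0, -1.0)            # sign, with 0 mapped to +1
    alpha = np.sum(d * np.abs(w)) / np.sum(d)  # curvature-weighted scale
    return alpha, b

def adam_diag_curvature(v_hat, eps=1e-8):
    """Hypothetical curvature proxy built from Adam's second-moment estimate."""
    return np.sqrt(np.asarray(v_hat, dtype=float)) + eps

# Example: a weight with larger curvature pulls the scale toward its magnitude.
alpha, b = weighted_binarize([0.5, -1.0, 2.0], [1.0, 1.0, 2.0])
```

With uniform d, alpha reduces to the plain mean of |w|, which is the kind of loss-agnostic matrix approximation the abstract contrasts against.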
Taxonomy
Topics: Face and Expression Recognition · Brain Tumor Detection and Classification · Neural Networks and Applications
Methods: Adam
