Rotate the ReLU to implicitly sparsify deep networks
Nancy Nayak, Sheetal Kalyani

TL;DR
This paper introduces a novel rotated ReLU activation that learns to implicitly sparsify deep neural networks by eliminating unnecessary filters, leading to more compact, efficient, and sometimes better-performing models.
Contribution
The paper proposes a learned rotation of ReLU activation functions that enables implicit sparsification and automatic filter pruning in deep networks.
Findings
Rotated ReLU effectively eliminates unnecessary filters.
Models with rotated ReLU show memory and computation savings.
In some cases, rotated ReLU improves on the baseline's performance.
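To make the memory and computation savings concrete, here is a rough sketch of how removing filters shrinks a pair of stacked convolutions. The layer sizes (64, 128, 128 channels, 3x3 kernels) and the number of pruned filters are made-up illustrative values, not figures from the paper, and biases are ignored.

```python
def conv_params(c_in, c_out, k):
    """Weight count of a k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def two_layer_params(c_in, c_mid, c_out, k=3):
    """Total weights of two stacked convolutions: c_in -> c_mid -> c_out."""
    return conv_params(c_in, c_mid, k) + conv_params(c_mid, c_out, k)


# Hypothetical sizes: 64 -> 128 -> 128 channels, then 32 of the middle
# layer's filters are eliminated (assumed count, for illustration only).
before = two_layer_params(64, 128, 128)
after = two_layer_params(64, 128 - 32, 128)
saved_fraction = (before - after) / before
```

Dropping a filter removes its own weights and also the matching input channel of the next layer, which is why both terms shrink when `c_mid` is reduced.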
Abstract
In the era of Deep Neural Network based solutions for a variety of real-life tasks, having a compact and energy-efficient deployable model has become fairly important. Most existing deep architectures use the Rectified Linear Unit (ReLU) activation. In this paper, we propose a novel idea of rotating the ReLU activation to give one more degree of freedom to the architecture. We show that this activation, in which the rotation is learned via training, results in the elimination of those parameters/filters in the network that are not important for the task. In other words, rotated ReLU seems to be doing implicit sparsification. The slopes of the rotated ReLU activations act as coarse feature extractors, and unnecessary features can be eliminated before retraining. Our studies indicate that features always choose to pass through fewer filters in architectures such as ResNet and…
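The idea in the abstract can be illustrated with a minimal sketch. It assumes, hypothetically, that each channel's activation is a ReLU whose output is scaled by one learnable slope, and that channels whose learned slope is close to zero are candidates for removal. The function names, the threshold, and the example slopes are all assumptions made for illustration; the paper's exact formulation and pruning criterion may differ.

```python
def rotated_relu(x, slope):
    """Assumed form of a rotated ReLU: the positive part of x scaled by a slope."""
    return slope * max(0.0, x)


def apply_per_channel(feature_maps, slopes):
    """Apply one slope per channel; feature_maps is a list of per-channel value lists."""
    return [
        [rotated_relu(v, s) for v in channel]
        for channel, s in zip(feature_maps, slopes)
    ]


def prunable_channels(slopes, threshold=1e-2):
    """Indices of channels whose slope magnitude is below a (hypothetical) threshold."""
    return [i for i, s in enumerate(slopes) if abs(s) < threshold]


# Illustrative values only: three channels with slopes as if already trained.
feature_maps = [[-1.0, 2.0], [0.5, 3.0], [1.0, -4.0]]
slopes = [1.2, 0.001, -0.7]

activations = apply_per_channel(feature_maps, slopes)
to_prune = prunable_channels(slopes)
```

In this sketch the second channel contributes almost nothing to the output because its slope is near zero, so the filter that produces it could be dropped before retraining. A negative slope, as on the third channel, flips the sign of the activation rather than suppressing it.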
Taxonomy
Topics: Advanced Neural Network Applications · Human Pose and Action Recognition · CCD and CMOS Imaging Sensors
Methods: Residual Connection · Max Pooling · Average Pooling · Batch Normalization · Residual Block · 1x1 Convolution · Global Average Pooling · Bottleneck Residual Block · Kaiming Initialization · Convolution
