Rotate the ReLU to implicitly sparsify deep networks

Nancy Nayak; Sheetal Kalyani

arXiv:2206.00488 · cs.LG · June 2, 2022

TL;DR

This paper introduces a novel rotated ReLU activation that learns to implicitly sparsify deep neural networks by eliminating unnecessary filters, leading to more compact, efficient, and sometimes better-performing models.

Contribution

The paper proposes a learned rotation of ReLU activation functions that enables implicit sparsification and automatic filter pruning in deep networks.

Findings

1. Rotated ReLU effectively eliminates unnecessary filters.

2. Models with rotated ReLU show memory and computation savings.

3. In some cases, rotated ReLU improves baseline performance.

Abstract

In the era of Deep Neural Network based solutions for a variety of real-life tasks, having a compact and energy-efficient deployable model has become fairly important. Most existing deep architectures use the Rectified Linear Unit (ReLU) activation. In this paper, we propose a novel idea of rotating the ReLU activation to give one more degree of freedom to the architecture. We show that this activation, in which the rotation is learned via training, results in the elimination of those parameters/filters in the network that are not important for the task. In other words, rotated ReLU seems to be doing implicit sparsification. The slopes of the rotated ReLU activations act as coarse feature extractors, and unnecessary features can be eliminated before retraining. Our studies indicate that features always choose to pass through fewer filters in architectures such as ResNet and…
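As a rough illustration of the idea in the abstract, the sketch below treats the rotation as a learnable per-channel slope applied to a standard ReLU, then flags channels whose learned slope is close to zero as candidates for removal. The slope-times-ReLU form, the function names `rotated_relu` and `prunable_channels`, the threshold value, and the example slopes are all assumptions made for illustration; the listing here does not give the paper's exact formulation or pruning rule.

```python
import numpy as np


def rotated_relu(x, slope):
    """Assumed form of a rotated ReLU: a per-channel slope times ReLU(x).

    x:     array of shape (batch, channels)
    slope: array of shape (channels,), learned during training in practice
    """
    return slope * np.maximum(x, 0.0)


def prunable_channels(slope, threshold=1e-2):
    """Indices of channels whose slope magnitude is below a
    hypothetical threshold, i.e. channels that pass almost no signal."""
    return np.flatnonzero(np.abs(slope) < threshold)


# Toy pre-activations for 2 samples and 3 channels.
x = np.array([[-1.0, 2.0, 3.0],
              [4.0, -5.0, 6.0]])

# Hypothetical slopes after training: channel 1 has collapsed towards zero.
slope = np.array([1.5, 0.001, -0.8])

y = rotated_relu(x, slope)
dead = prunable_channels(slope)

print(y)
print("channels that could be pruned:", dead.tolist())
```

In this toy setup a near-zero slope makes the whole channel contribute almost nothing downstream, which is one way to read the claim that the slopes act as coarse feature extractors. Under that reading, the corresponding filter could be dropped before retraining.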

Taxonomy

Topics: Advanced Neural Network Applications · Human Pose and Action Recognition · CCD and CMOS Imaging Sensors

Methods: Residual Connection · Max Pooling · Average Pooling · Batch Normalization · Residual Block · 1x1 Convolution · Global Average Pooling · Bottleneck Residual Block · Kaiming Initialization · Convolution