Generalized Activation via Multivariate Projection

Jiayun Li; Yuxiao Cheng; Yiwen Lu; Zhuofan Xia; Yilin Mo; Gao Huang

arXiv:2309.17194 · cs.LG · January 30, 2024

TL;DR

This paper introduces a new multivariate activation function called the Multivariate Projection Unit (MPU), extending ReLU by using convex cone projections, which enhances the expressive power of neural networks and improves performance.

Contribution

It proposes a novel multivariate activation function based on convex cone projections, generalizing ReLU, and demonstrates its superior expressive power and empirical effectiveness.

Findings

01. MPU outperforms ReLU in expressive power.

02. Experimental results show MPU's effectiveness across architectures.

03. MPU provides a natural extension of ReLU via convex cone projections.
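
To make the projection view in the third finding concrete, the identities below restate standard convex-analysis facts. The symbols $\Pi$ and $\mathcal{K}$ are notation assumed here for illustration and are not taken from the paper.

```latex
% ReLU is the Euclidean projection of a scalar onto the nonnegative half-line.
\mathrm{ReLU}(x) \;=\; \max(x, 0)
  \;=\; \operatorname*{arg\,min}_{y \ge 0} \, (y - x)^2
  \;=\; \Pi_{\mathbb{R}_+}(x)

% Replacing the half-line with a general closed convex cone K in R^n gives a
% multi-input, multi-output map of the same form.
\Pi_{\mathcal{K}}(v) \;=\; \operatorname*{arg\,min}_{y \in \mathcal{K}} \, \lVert y - v \rVert_2^2
```

Applied elementwise to a vector in $\mathbb{R}^n$, ReLU is the projection onto the nonnegative orthant $\mathbb{R}^n_+$, which is itself a convex cone, so the second expression contains ReLU as a special case.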

Abstract

Activation functions are essential to introduce nonlinearity into neural networks, with the Rectified Linear Unit (ReLU) often favored for its simplicity and effectiveness. Motivated by the structural similarity between a shallow Feedforward Neural Network (FNN) and a single iteration of the Projected Gradient Descent (PGD) algorithm, a standard approach for solving constrained optimization problems, we consider ReLU as a projection from $\mathbb{R}$ onto the nonnegative half-line $\mathbb{R}_+$. Building on this interpretation, we extend ReLU by substituting it with a generalized projection operator onto a convex cone, such as the Second-Order Cone (SOC) projection, thereby naturally extending it to a Multivariate Projection Unit (MPU), an activation function with multiple inputs and multiple outputs. We further provide mathematical proof establishing that FNNs activated by SOC projections outperform those…
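
The SOC projection mentioned in the abstract has a well-known closed form, sketched below in plain Python. This is a minimal illustration of the textbook projection onto the cone {(x, t) : ||x||_2 <= t}, not the authors' implementation. The function names `relu` and `project_soc`, and the convention that the last coordinate plays the role of t, are assumptions made for this example.

```python
import math
from typing import List, Sequence


def relu(x: float) -> float:
    """Euclidean projection of a scalar onto the nonnegative half-line."""
    return max(x, 0.0)


def project_soc(v: Sequence[float]) -> List[float]:
    """Project v = (x_1, ..., x_{n-1}, t) onto the cone {(x, t): ||x||_2 <= t}.

    Assumed convention (not from the paper): the last entry of v is t.
    """
    *x, t = v
    norm = math.sqrt(sum(xi * xi for xi in x))
    if norm <= t:
        # The point already lies in the cone, so it is its own projection.
        return list(v)
    if norm <= -t:
        # The point lies in the polar cone, so it projects to the origin.
        return [0.0] * len(v)
    # Otherwise the projection lands on the boundary of the cone.
    scale = (norm + t) / (2.0 * norm)
    return [scale * xi for xi in x] + [(norm + t) / 2.0]


if __name__ == "__main__":
    print(project_soc([3.0, 4.0, 0.0]))   # [1.5, 2.0, 2.5]
    print(project_soc([3.0, 4.0, -5.0]))  # [0.0, 0.0, 0.0]
    print(project_soc([-2.0]), relu(-2.0))  # [0.0] 0.0
```

With a single input the cone degenerates to the nonnegative half-line and `project_soc` agrees with `relu`. With three inputs the map couples all coordinates, which is the multi-input, multi-output behavior the abstract describes.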

Peer Reviews

Decision · Submitted to ICLR 2024

Reviewer 01 · Rating 6 (marginally above the acceptance threshold) · Confidence 3

Strengths

Overall, I liked the paper. It was very fun reading. In particular, the following are the main strengths:
- Very well written paper.
- Clear motivation in what they want to achieve (MIMO activation) and in how they achieve it (based on projections/proximal functions).
- Supporting theory.
- Empirical results suggesting the benefits of their approach.

Weaknesses

However, there are some very substantial weaknesses:
- *[Removed due to answers by authors:]* Theory is basic and simple. But more importantly: it does not really, at the essence, establish why the function should give better results. Sure, you can set the weights just correctly so that a layer is like an iteration. But, so what? The weights are actually learned, and our goal is not to really learn an iteration...
- *[Removed due to answers by authors:]* I am a bit skeptical about the motivat

Reviewer 02 · Rating 6 (marginally above the acceptance threshold) · Confidence 4

Strengths

1. The idea of choosing the activation function to be the projection onto a convex cone is very interesting.
2. The paper provides some theoretical proofs.
3. Some experiments demonstrate the effectiveness of the proposed MIMO activation function.

Weaknesses

1. The organization and writing of the paper need to be improved.
2. The proposed Theorem 1 is not so rigorous.
3. The experiments are not enough.

Reviewer 03 · Rating 8 (accept, good paper) · Confidence 4

Strengths

To the best of my knowledge, the MPU is novel, and it is an interesting variation of a standard ReLU with a good underlying motivation. The paper is well written, in particular the visualizations in Fig. 1 immediately show the basic idea of the paper. I am less convinced about the empirical evaluation (see below), so the practical value of the MPU is not clear.

Weaknesses

I have a few general comments on the manuscript, making this (at the moment) a borderline paper for acceptance. I think most of these questions are addressable and I would be happy to increase my score. EXPOSITION: I have found the exposition of the paper a bit strange, because there is a very long motivation for the MPU (both Section 2.1 and Section 3 serve as a motivation), but very little analysis of the MPU itself. For example: (i) there is no explicit definition of the MPU or the MPU layer

Code & Models

Repositories

ljy9912/mimo_nn (PyTorch, official)

Taxonomy

Topics: Neural Networks and Applications · Model Reduction and Neural Networks · Machine Learning and ELM