Generalized Activation via Multivariate Projection
Jiayun Li, Yuxiao Cheng, Yiwen Lu, Zhuofan Xia, Yilin Mo, Gao Huang

TL;DR
This paper introduces a new multivariate activation function, the Multivariate Projection Unit (MPU), which extends ReLU by replacing it with a projection onto a convex cone. This increases the expressive power of neural networks and improves empirical performance.
Contribution
It proposes a novel multivariate activation function based on convex cone projections, generalizing ReLU, and demonstrates its superior expressive power and empirical effectiveness.
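The premise that ReLU is already a projection can be checked numerically. Below is a minimal sketch, assuming NumPy; the helper names `relu` and `is_projection_onto_orthant` are hypothetical and not from the paper. It verifies that ReLU(x) satisfies the KKT optimality conditions of the problem "find the nearest point to x in the nonnegative orthant".

```python
import numpy as np

def relu(x: np.ndarray) -> np.ndarray:
    """ReLU: elementwise max(x, 0)."""
    return np.maximum(x, 0.0)

def is_projection_onto_orthant(x: np.ndarray, y: np.ndarray, tol: float = 1e-12) -> bool:
    """Check the KKT conditions that characterize y = argmin_{z >= 0} 0.5 * ||z - x||^2:
    primal feasibility (y >= 0), dual feasibility (multiplier y - x >= 0),
    and complementary slackness (y * (y - x) == 0), all elementwise."""
    return bool(
        np.all(y >= -tol)
        and np.all(y - x >= -tol)
        and np.all(np.abs(y * (y - x)) <= tol)
    )

rng = np.random.default_rng(0)
x = rng.standard_normal(1000)
print(is_projection_onto_orthant(x, relu(x)))  # True: ReLU(x) is the projection of x onto R_+^n
```

Because the check is elementwise, it also shows that applying ReLU to a vector is the same as projecting the whole vector onto the orthant, which is the single-input special case that the MPU generalizes.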
Findings
MPU outperforms ReLU in expressive power.
Experimental results show MPU's effectiveness across architectures.
MPU provides a natural extension of ReLU via convex cone projections.
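To illustrate how a convex cone projection generalizes ReLU, here is a minimal sketch of the Euclidean projection onto the second-order cone K = {(t, u) : ‖u‖₂ ≤ t}, using the standard closed-form formula. The function name `soc_project` and the convention that the first entry plays the role of t are assumptions for illustration; how the paper groups units into cones inside an MPU layer is not specified in this summary.

```python
import numpy as np

def soc_project(z) -> np.ndarray:
    """Euclidean projection of z = (t, u) onto the second-order cone
    K = {(t, u) : ||u||_2 <= t}, where t = z[0] and u = z[1:]."""
    z = np.asarray(z, dtype=float)
    t, u = z[0], z[1:]
    norm_u = np.linalg.norm(u)
    if norm_u <= t:            # z already lies in K
        return z.copy()
    if norm_u <= -t:           # z lies in the polar cone -K: project to the apex
        return np.zeros_like(z)
    # Otherwise, project onto the boundary ray spanned by (1, u / ||u||).
    scale = 0.5 * (t + norm_u)
    return np.concatenate(([scale], scale * u / norm_u))

# With no u-coordinates, K is the half-line t >= 0 and the projection is ReLU.
print(soc_project([-1.5]).tolist(), soc_project([2.0]).tolist())  # [0.0] [2.0]
# In higher dimensions every output depends on every input (a MIMO activation).
print(soc_project([0.0, 3.0, 4.0]).tolist())                      # [2.5, 1.5, 2.0]
```

The one-dimensional case recovers ReLU exactly, while in higher dimensions the output couples all inputs, which is what makes the activation multiple-input, multiple-output.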
Abstract
Activation functions are essential to introduce nonlinearity into neural networks, with the Rectified Linear Unit (ReLU) often favored for its simplicity and effectiveness. Motivated by the structural similarity between a shallow Feedforward Neural Network (FNN) and a single iteration of the Projected Gradient Descent (PGD) algorithm, a standard approach for solving constrained optimization problems, we consider ReLU as a projection from ℝ onto the nonnegative half-line ℝ₊. Building on this interpretation, we extend ReLU by substituting it with a generalized projection operator onto a convex cone, such as the Second-Order Cone (SOC) projection, thereby naturally extending it to a Multivariate Projection Unit (MPU), an activation function with multiple inputs and multiple outputs. We further provide a mathematical proof establishing that FNNs activated by SOC projections outperform those…
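The motivating observation, that one PGD iteration has the same form as a shallow FNN layer, can be made concrete on a toy problem. This is a minimal sketch, assuming a nonnegativity-constrained convex quadratic objective 0.5·xᵀQx − bᵀx (our choice of example, not taken from the paper). Each PGD step x ← P(x − η(Qx − b)) equals ReLU(Wx + c) with W = I − ηQ and c = ηb, so unrolling the iterations gives a stack of ReLU layers with tied weights. The function `pgd_as_layers` and the step size η = 1/λ_max(Q) are illustrative assumptions.

```python
import numpy as np

def relu(x: np.ndarray) -> np.ndarray:
    """Projection onto the nonnegative orthant."""
    return np.maximum(x, 0.0)

def pgd_as_layers(Q: np.ndarray, b: np.ndarray, num_layers: int = 2000) -> np.ndarray:
    """Minimize 0.5 x^T Q x - b^T x subject to x >= 0 by projected gradient descent,
    written as repeated layers x <- relu(W @ x + c)."""
    eta = 1.0 / np.linalg.eigvalsh(Q).max()  # step size 1/L, L = largest eigenvalue of Q
    W = np.eye(len(b)) - eta * Q             # "weight matrix" shared by every layer
    c = eta * b                              # "bias" shared by every layer
    x = np.zeros_like(b)
    for _ in range(num_layers):
        x = relu(W @ x + c)                  # one PGD step == one affine map + ReLU
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 5))
Q = A.T @ A + np.eye(5)  # symmetric positive definite, so the problem is strongly convex
b = rng.standard_normal(5)
x = pgd_as_layers(Q, b)
grad = Q @ x - b
# KKT conditions for the constraint x >= 0: x >= 0, grad >= 0, x * grad = 0.
print(x.min() >= 0, grad.min() >= -1e-8, np.abs(x * grad).max() < 1e-8)
```

Replacing `relu` with a projection onto another convex cone (for example, the second-order cone) turns the same loop into PGD over that cone. This is the substitution the abstract uses to motivate the MPU.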
Peer Reviews
Decision: Submitted to ICLR 2024
Overall, I liked the paper. It was a lot of fun to read. In particular, the main strengths are:
- A very well-written paper.
- Clear motivation for what they want to achieve (MIMO activation) and for how they achieve it (based on projections/proximal functions).
- Supporting theory.
- Empirical results suggesting the benefits of their approach.
However, there are some very substantial weaknesses:
- *[Removed due to answers by authors:]* The theory is basic and simple. More importantly, it does not really establish, at its essence, why the function should give better results. Sure, you can set the weights just right so that a layer is like an iteration. But so what? The weights are actually learned, and our goal is not really to learn an iteration...
- *[Removed due to answers by authors:]* I am a bit skeptical about the motivat
1. The idea of choosing the activation function to be the projection onto a convex cone is very interesting.
2. The paper provides some theoretical proofs.
3. Some experiments demonstrate the effectiveness of the proposed MIMO activation function.
1. The organization and writing of the paper need to be improved.
2. The proposed Theorem 1 is not very rigorous.
3. The experiments are insufficient.
To the best of my knowledge, the MPU is novel, and it is an interesting variation of a standard ReLU with a good underlying motivation. The paper is well written, in particular the visualizations in Fig. 1 immediately show the basic idea of the paper. I am less convinced about the empirical evaluation (see below), so the practical value of the MPU is not clear.
I have a few general comments on the manuscript, which (at the moment) make this a borderline paper for acceptance. I think most of these questions are addressable, and I would be happy to increase my score. EXPOSITION: I found the exposition of the paper a bit strange, because there is a very long motivation for the MPU (both Section 2.1 and Section 3 serve as motivation), but very little analysis of the MPU itself. For example: (i) there is no explicit definition of the MPU or the MPU layer
Taxonomy
Topics: Neural Networks and Applications · Model Reduction and Neural Networks · Machine Learning and ELM
