Linear Oscillation: A Novel Activation Function for Vision Transformer

Juyoung Yun

arXiv:2308.13670·cs.LG·December 4, 2023

Open Access

TL;DR

The paper introduces Linear Oscillation (LoC), a novel activation function blending linearity with oscillations, which improves neural network performance, especially in Vision Transformers, by fostering robust learning through controlled confusion.

Contribution

It proposes the LoC activation function that combines linear and oscillatory behaviors, demonstrating its superiority over traditional functions in vision transformer models.

Findings

01 LoC outperforms ReLU and Sigmoid in various architectures.

02 Integration of LoC enhances Vision Transformer performance.

03 Controlled confusion via LoC promotes more nuanced learning.

Abstract

Activation functions are the linchpins of deep learning, profoundly influencing both the representational capacity and training dynamics of neural networks. They not only shape the nature of representations but also affect convergence rates and generalization potential. Appreciating this critical role, we present the Linear Oscillation (LoC) activation function, defined as $f(x) = x \times \sin(\alpha x + \beta)$. Distinct from conventional activation functions, which primarily introduce non-linearity, LoC seamlessly blends linear trajectories with oscillatory deviations. The nomenclature "Linear Oscillation" is a nod to its unique attribute of infusing linear activations with harmonious oscillations, capturing the essence of the "Importance of Confusion". This concept of "controlled confusion" within network activations is posited to foster more robust learning, particularly…
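The definition in the abstract can be sketched in a few lines of Python. This is a minimal scalar illustration, not the author's implementation: the function names `loc` and `loc_grad` are hypothetical, and the default values `alpha = 1.0` and `beta = 0.0` are assumptions, since the summary above does not state which values the paper uses. The derivative follows from the product rule applied to the stated formula.

```python
import math


def loc(x: float, alpha: float = 1.0, beta: float = 0.0) -> float:
    """Linear Oscillation activation: f(x) = x * sin(alpha * x + beta).

    The defaults for alpha and beta are illustrative assumptions.
    """
    return x * math.sin(alpha * x + beta)


def loc_grad(x: float, alpha: float = 1.0, beta: float = 0.0) -> float:
    """Derivative of loc with respect to x, by the product rule:
    f'(x) = sin(alpha * x + beta) + alpha * x * cos(alpha * x + beta).
    """
    phase = alpha * x + beta
    return math.sin(phase) + alpha * x * math.cos(phase)


if __name__ == "__main__":
    # Print the activation and its slope at a few sample inputs.
    for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
        print(f"x={x:+.1f}  f(x)={loc(x):+.4f}  f'(x)={loc_grad(x):+.4f}")
```

Two properties follow directly from the formula. The output is always zero at `x = 0`, whatever `alpha` and `beta` are. With `alpha = 0` and `beta = pi/2` the sine factor is constantly 1, so the function reduces to the identity, which is one way to read the "linear" half of the name.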

Taxonomy

Topics: Neural Networks and Applications · CCD and CMOS Imaging Sensors · Infrared Target Detection Methodologies

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Softmax · Dense Connections · Layer Normalization · Dropout · Byte Pair Encoding · Adam · Position-Wise Feed-Forward Layer