Attention as Activation
Yimian Dai, Stefan Oehmcke, Fabian Gieseke, Yiquan Wu, and Kobus Barnard

TL;DR
This paper introduces attentional activation (ATAC) units, which unify activation functions and attention mechanisms and improve neural network performance at the cost of a modest number of additional parameters.
Contribution
The paper proposes a novel attentional activation unit that combines activation and attention, and demonstrates its effectiveness across multiple datasets and network architectures.
Findings
ATAC units improve network performance over traditional activations.
Networks with ATAC units outperform competitors with similar parameter counts.
Experiments on CIFAR-10, CIFAR-100, and ImageNet confirm the effectiveness of ATAC units.
Abstract
Activation functions and attention mechanisms are typically treated as having different purposes and have evolved differently. However, both concepts can be formulated as a non-linear gating function. Inspired by their similarity, we propose a novel type of activation unit called the attentional activation (ATAC) unit as a unification of activation functions and attention mechanisms. In particular, we propose a local channel attention module for simultaneous non-linear activation and element-wise feature refinement, which locally aggregates point-wise cross-channel feature contexts. By replacing the well-known rectified linear units with such ATAC units in convolutional networks, we can construct fully attentional networks that perform significantly better with a modest number of additional parameters. We conducted detailed ablation studies on the ATAC units using several host networks…
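The abstract frames both ReLU and attention as non-linear gates and describes a local channel attention that aggregates point-wise cross-channel context. The plain-Python sketch below illustrates that idea under assumptions this summary does not state: the gate is a bottleneck of two point-wise (1×1) convolutions with a ReLU in between, followed by a sigmoid; batch normalization and biases are omitted; and the names `relu_as_gate`, `atac_pixel`, `atac`, `extra_params`, and the reduction ratio `r` are hypothetical. It is an illustration of the gating view, not the authors' implementation.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def relu_as_gate(x: float) -> float:
    # ReLU written as x times a hard, non-learned gate (1 if x > 0 else 0).
    gate = 1.0 if x > 0 else 0.0
    return x * gate


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def matvec(w: Matrix, v: Vector) -> Vector:
    # A 1x1 (point-wise) convolution at a single pixel is a matrix-vector product.
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in w]


def atac_pixel(x: Vector, w1: Matrix, w2: Matrix) -> Vector:
    """Hypothetical ATAC-style gate at one spatial position (assumed design).

    x  : the C channel values at this pixel
    w1 : (C // r) x C weights of the first point-wise conv (channel reduction)
    w2 : C x (C // r) weights of the second point-wise conv (channel expansion)
    """
    hidden = [max(0.0, h) for h in matvec(w1, x)]   # reduce channels, then ReLU
    gate = [sigmoid(g) for g in matvec(w2, hidden)]  # expand channels, sigmoid gate
    return [xi * gi for xi, gi in zip(x, gate)]      # element-wise refinement


def atac(feature_map: List[List[Vector]], w1: Matrix, w2: Matrix) -> List[List[Vector]]:
    # feature_map[i][j] is the channel vector at pixel (i, j); the gate mixes
    # channels only at the same pixel (point-wise context), never across pixels.
    return [[atac_pixel(px, w1, w2) for px in row] for row in feature_map]


def extra_params(channels: int, reduction: int) -> int:
    # Weights of the two 1x1 convs only (biases and normalization ignored).
    hidden = channels // reduction
    return 2 * channels * hidden


# Usage: apply the gate to a random 3x3 feature map with C = 8 channels, r = 4.
random.seed(0)
C, r = 8, 4
w1 = [[random.gauss(0, 0.5) for _ in range(C)] for _ in range(C // r)]
w2 = [[random.gauss(0, 0.5) for _ in range(C // r)] for _ in range(C)]
fmap = [[[random.gauss(0, 1) for _ in range(C)] for _ in range(3)] for _ in range(3)]
out = atac(fmap, w1, w2)
```

Unlike ReLU, whose gate is fixed at 0 or 1, the sigmoid gate here is learned and lies in (0, 1), so each output keeps the sign of its input and is never larger in magnitude. Under these assumptions, with 64 channels and r = 4 the two 1×1 convolutions add 2 · 64 · 16 = 2048 weights, compared with 3 · 3 · 64 · 64 = 36,864 for a single 3×3 convolution of the same width, which is one way to see why the overhead can stay modest.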
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Adversarial Robustness in Machine Learning
Methods: Max Pooling · Sigmoid Activation · Dense Connections · Average Pooling
