Activator: GLU Activation Function as the Core Component of a Vision Transformer

Abdullah Nazhat Abdullah; Tarkan Aydin

arXiv:2405.15953 · cs.CV · November 27, 2025


TL;DR

This paper proposes replacing the conventional attention mechanism in vision transformers with an architecture built on gated linear unit (GLU) activations, aiming to reduce computational cost while maintaining competitive performance.

Contribution

It introduces a novel transformer architecture built around GLU activation functions, offering a more efficient alternative to standard attention-based models for vision tasks.
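For readers unfamiliar with the gating unit, the sketch below implements the standard GLU definition, GLU(x) = (xW + b) ⊗ σ(xV + c), on plain Python lists: one affine "value" path is multiplied element by element by a sigmoid-squashed "gate" path. It is a minimal illustration of that textbook definition under assumed toy weights, not the Activator code from the official repository; the helper names `matvec` and `glu` are hypothetical.

```python
import math

# Hypothetical helpers: an illustrative sketch of the textbook GLU,
# not the Activator repository code.
Vector = list[float]
Matrix = list[list[float]]  # stored as d_in rows, each of length d_out


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def matvec(x: Vector, W: Matrix, b: Vector) -> Vector:
    """Affine map x @ W + b, with W given as d_in rows of length d_out."""
    return [sum(x[i] * W[i][j] for i in range(len(x))) + b[j] for j in range(len(b))]


def glu(x: Vector, W: Matrix, b: Vector, V: Matrix, c: Vector) -> Vector:
    """GLU(x) = (x W + b) * sigmoid(x V + c), gated element by element."""
    value = matvec(x, W, b)  # value path
    gate = matvec(x, V, c)   # gate path, squashed into (0, 1) by the sigmoid
    return [v * sigmoid(g) for v, g in zip(value, gate)]


identity = [[1.0, 0.0], [0.0, 1.0]]
zeros = [[0.0, 0.0], [0.0, 0.0]]
print(glu([1.0, 2.0], identity, [0.0, 0.0], zeros, [0.0, 0.0]))
```

With an all-zero gate projection every gate equals σ(0) = 0.5, so the call above halves the value path and prints `[0.5, 1.0]`. Large negative gate pre-activations drive an output toward zero, which is how the unit learns to suppress features without a separate attention step.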

Findings

01 The GLU-based architecture reduces computational complexity.

02 Performance is competitive with baseline models.

03 Supports more efficient vision transformer designs.
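The complexity finding can be illustrated with a rough multiply-accumulate (MAC) count. The sketch below assumes n tokens of width d and counts only the two n × n products at the core of softmax attention (QKᵀ and the weighted sum over V), ignoring the Q/K/V projections and the softmax itself. It compares them with a token-wise GLU whose value and gate projections map d inputs to h hidden units. These are generic back-of-the-envelope formulas, not measurements from the paper, and the function names are hypothetical.

```python
# Back-of-the-envelope MAC counts; illustrative, not measurements from the paper.


def attention_core_macs(n_tokens: int, d: int) -> int:
    """MACs for Q K^T (n*n*d) plus weights @ V (n*n*d); projections and softmax excluded."""
    return 2 * n_tokens * n_tokens * d


def glu_macs(n_tokens: int, d_in: int, d_hidden: int) -> int:
    """MACs for the value and gate projections (d_in*d_hidden each, per token);
    the element-wise gating is excluded."""
    return 2 * n_tokens * d_in * d_hidden


for n in (196, 392, 784):
    print(n, attention_core_macs(n, 384), glu_macs(n, 384, 384))
```

Doubling the token count quadruples the attention term but only doubles the GLU term, which is the scaling argument behind replacing attention. For short sequences the projections can still dominate (at n = 196 and d = h = 384, the GLU count above is the larger of the two), so any practical saving depends on the model configuration.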

Abstract

The transformer architecture has driven many successes across deep learning, in particular the recent advances in natural language processing (NLP) culminating in large language models (LLMs). Building on that success, the transformer architecture has attracted widespread interest from computer vision (CV) researchers and practitioners, enabling many advances in vision-related tasks and opening the door to multitask and multimodal deep learning architectures that share the same principle of operation. One drawback of these architectures is their reliance on the scaled dot-product attention mechanism with the softmax activation function, which is computationally expensive and requires large compute capabilities for both training and inference. This paper investigates substituting the MLP and attention mechanism usually adopted for transformer…
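The operation the abstract identifies as the bottleneck, scaled dot-product attention with a softmax, computes softmax(QKᵀ/√d)V. The minimal pure-Python version below assumes Q, K and V are already-projected lists of token vectors and uses a single head; every query is scored against every key, which is where the n × n cost comes from. It is an illustrative sketch, not code from the paper or its repository.

```python
import math

Vector = list[float]


def softmax(row: Vector) -> Vector:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(Q: list[Vector], K: list[Vector], V: list[Vector]) -> list[Vector]:
    """Single-head softmax(Q K^T / sqrt(d)) V over lists of token vectors."""
    d = len(Q[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for q in Q:
        # One score per key; over all queries this builds the n x n score matrix.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in K]
        weights = softmax(scores)
        # Output row = attention-weighted sum of the value rows.
        out.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))])
    return out


Q = [[0.0, 0.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[2.0, 0.0], [0.0, 4.0]]
print(scaled_dot_product_attention(Q, K, V))
```

With an all-zero query both scores are equal, the softmax assigns weight 0.5 to each key, and the output is the mean of the value rows, `[[1.0, 2.0]]`. The exponentials and the per-pair dot products are the costs a GLU-based design aims to avoid.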


Code & Models

Repositories

Abdullah-88/Activator (PyTorch, official)

Models

🤗 Abdullah-Nazhat/Activator (model)


Taxonomy

Topics: CCD and CMOS Imaging Sensors

Methods: Softmax