Masked Gated Linear Unit

Yukito Tajima; Nakamasa Inoue; Yusuke Sekikawa; Ikuro Sato; Rio Yokota

arXiv:2506.23225·cs.LG·July 1, 2025

TL;DR

This paper introduces Masked Gated Linear Units (MGLUs), an efficient and hardware-friendly variant of GLUs that reduces memory usage and increases inference speed in large language models, while maintaining or improving accuracy.

Contribution

The paper proposes MGLUs, which combine the Mixture of Element-wise Gating (MoEG) architecture with the FlashMGLU kernel to improve memory efficiency and inference speed in LLMs relative to standard GLUs.

Findings

01 FlashMGLU gives up to a 19.7× inference-time speed-up over a naive PyTorch MGLU

02 47% more memory-efficient than standard GLUs

03 SwiMGLU matches or surpasses SwiGLU accuracy

Abstract

Gated Linear Units (GLUs) have become essential components in the feed-forward networks of state-of-the-art Large Language Models (LLMs). However, they require twice as many memory reads as feed-forward layers without gating, because they use separate weight matrices for the gate and value streams. To address this bottleneck, we introduce Masked Gated Linear Units (MGLUs), a novel family of GLUs with an efficient kernel implementation. The core contributions of MGLUs include: (1) the Mixture of Element-wise Gating (MoEG) architecture, which learns multiple binary masks, each determining gate or value assignments at the element level on a single shared weight matrix, resulting in reduced memory transfer, and (2) FlashMGLU, a hardware-friendly kernel that yields up to a 19.7× inference-time speed-up over a naive PyTorch MGLU and is 47% more memory-efficient and 34% faster…
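
To make the element-wise gating idea more concrete, here is a minimal NumPy sketch. It is not the authors' implementation. The abstract does not give the exact formula, so the form used here (each binary mask selects the gate entries of the shared matrix, its complement selects the value entries, and the per-mask outputs are summed) is an assumption, as are all names (`swiglu`, `masked_glu`, `w_shared`) and the toy sizes. Fixed random masks stand in for the learned ones.

```python
import numpy as np

def swish(z):
    # Swish / SiLU activation: z * sigmoid(z)
    return z / (1.0 + np.exp(-z))

def swiglu(x, w_gate, w_value):
    # Standard SwiGLU projection: two separate weight matrices are read.
    return swish(x @ w_gate) * (x @ w_value)

def masked_glu(x, w, masks):
    # Assumed MoEG-style form: each binary mask splits the single shared
    # matrix `w` element-wise into a gate part (mask == 1) and a value
    # part (mask == 0); the contributions of all masks are summed.
    out = np.zeros((x.shape[0], w.shape[1]))
    for m in masks:
        gate = x @ (w * m)
        value = x @ (w * (1 - m))
        out += swish(gate) * value
    return out

rng = np.random.default_rng(0)
d_model, d_hidden, n_masks = 8, 16, 4  # hypothetical toy sizes

x = rng.standard_normal((2, d_model))
w_gate = rng.standard_normal((d_model, d_hidden))
w_value = rng.standard_normal((d_model, d_hidden))
w_shared = rng.standard_normal((d_model, d_hidden))
# Fixed random masks stand in for the learned binary masks.
masks = rng.integers(0, 2, size=(n_masks, d_model, d_hidden))

y_glu = swiglu(x, w_gate, w_value)
y_mglu = masked_glu(x, w_shared, masks)

# Weight elements each variant must read (masks not counted).
glu_weight_elems = w_gate.size + w_value.size
mglu_weight_elems = w_shared.size
```

Both variants produce an output of the same shape, but the masked variant reads one weight matrix where SwiGLU reads two. This naive version materializes a masked copy of the weights for every mask, so it does not itself realize any memory-traffic savings; that gap is what a fused kernel such as FlashMGLU is described as addressing.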

Taxonomy

Topics: Advanced Neural Network Applications · Parallel Computing and Optimization Techniques · Machine Learning in Materials Science