$\lambda$-GELU: Learning Gating Hardness for Controlled ReLU-ization in Deep Networks

Cristian Pérez-Corral; Alberto Fernández-Hernández; Jose I. Mestre; Manuel F. Dolz; Enrique S. Quintana-Ortí

arXiv:2603.21991·cs.LG·April 6, 2026

TL;DR

This paper introduces a parameterized GELU variant with a controllable hardness parameter $\lambda$, enabling a smooth transition towards ReLU-like models that are compatible with deployment, compression, and analysis toolchains built for piecewise-linear networks.

Contribution

It proposes a novel hardness-parameterized GELU formulation with a stable learning scheme, allowing controlled ReLU-ization in deep networks.

Findings

01

Layerwise hardness profiles are structured and robust across models.

02

Progressive hardening of gates enables ReLU substitution with minimal disruption.

03

The hardness parameter effectively bridges smooth GELU training and ReLU-compatible models.

Abstract

The Gaussian Error Linear Unit (GELU) is a widely used smooth alternative to the Rectified Linear Unit (ReLU), yet many deployment, compression, and analysis toolchains are most naturally expressed for piecewise-linear (ReLU-type) networks. We study a hardness-parameterized formulation of GELU, $f(x;\lambda) = x\,\Phi(\lambda x)$, where $\Phi$ is the Gaussian CDF and $\lambda \in [1, \infty)$ controls gate sharpness, with the goal of turning smooth gated training into a controlled path toward ReLU-compatible models. Learning $\lambda$ is non-trivial: naive updates yield unstable dynamics and effective gradient attenuation, so we introduce a constrained reparameterization and an optimizer-aware update scheme. Empirically, across a diverse set of model-dataset pairs spanning MLPs, CNNs, and Transformers, we observe structured layerwise hardness profiles and assess their robustness under…
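The formula in the abstract can be illustrated with a minimal Python sketch of the forward function only. It assumes $\Phi$ is the standard normal CDF, computed here through the error function. The names `std_normal_cdf`, `lambda_gelu`, and `max_gap_to_relu` are illustrative and not taken from the paper, and the paper's constrained reparameterization and optimizer-aware update are not reproduced, since the abstract does not specify them.

```python
import math


def std_normal_cdf(z: float) -> float:
    """Standard Gaussian CDF Phi(z), written via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def lambda_gelu(x: float, lam: float = 1.0) -> float:
    """f(x; lambda) = x * Phi(lambda * x); lambda = 1 gives ordinary GELU."""
    if lam < 1.0:
        # The abstract restricts lambda to [1, inf); enforced here as an assumption.
        raise ValueError("lambda is assumed to lie in [1, inf)")
    return x * std_normal_cdf(lam * x)


def relu(x: float) -> float:
    return max(x, 0.0)


def max_gap_to_relu(lam: float, lo: float = -4.0, hi: float = 4.0,
                    steps: int = 801) -> float:
    """Largest |f(x; lambda) - ReLU(x)| over an evenly spaced grid on [lo, hi]."""
    xs = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    return max(abs(lambda_gelu(x, lam) - relu(x)) for x in xs)


if __name__ == "__main__":
    # Larger lambda sharpens the gate, so the gap to ReLU shrinks.
    for lam in (1.0, 10.0, 100.0):
        print(f"lambda={lam:6.1f}  max gap to ReLU = {max_gap_to_relu(lam):.5f}")
```

For $x > 0$, substituting $u = \lambda x$ gives $|f(x;\lambda) - \mathrm{ReLU}(x)| = u\,\Phi(-u)/\lambda$, and the case $x < 0$ is symmetric, so the worst-case deviation from ReLU decays as $1/\lambda$. This is one way to read $\lambda$ as a "hardness" knob.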
