# Moderate Adaptive Linear Units (MoLU)

**Authors:** Hankyul Koh, Joon-hyuk Ko, Wonho Jhe

arXiv: 2302.13696 · 2025-07-16

## TL;DR

MoLU is a smooth activation function for deep neural networks, f(x) = x × (1 + tanh(x))/2. The authors report faster convergence and higher accuracy than GeLU, SiLU, and Mish, and propose it for architectures including LLMs, Neural ODEs, PINNs, and CNNs.

## Contribution

Introduces MoLU, an infinitely differentiable activation function with a simple closed form, which the authors report improves training efficiency and accuracy across diverse neural network models.

## Key findings

- MoLU outperforms GeLU, SiLU, and Mish in convergence speed.
- MoLU achieves higher final accuracy across tested architectures.
- MoLU's smoothness is expected to mitigate vanishing and exploding gradient issues (stated by the authors as an expectation, not a measured result).
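
To make the gradient claim concrete, the first derivative can be worked out from the definition f(x) = x × (1 + tanh(x))/2 given in the abstract. The derivation below follows from the product rule alone; it is offered as a reading aid and does not appear in this summary of the paper.

$$
f'(x) = \frac{1 + \tanh(x)}{2} + \frac{x}{2}\,\operatorname{sech}^2(x),
\qquad
f'(0) = \tfrac{1}{2},
\qquad
\lim_{x \to +\infty} f'(x) = 1,
\qquad
\lim_{x \to -\infty} f'(x) = 0 .
$$

The derivative approaches the identity's slope for large positive inputs and decays to zero for large negative inputs. This alone does not establish the vanishing or exploding gradient claim, which depends on the full network and training setup.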

## Abstract

We propose the Moderate Adaptive Linear Unit (MoLU), a novel activation function for deep neural networks, defined analytically as $f(x) = x \times \frac{1 + \tanh(x)}{2}$. MoLU combines mathematical elegance with empirical effectiveness, exhibiting superior performance in terms of prediction accuracy, convergence speed, and computational efficiency. Due to its C-infinity smoothness, i.e. infinite differentiability and analyticity, MoLU is expected to mitigate issues such as vanishing or exploding gradients, making it suitable for a broad range of architectures and applications, including large language models (LLMs), Neural Ordinary Differential Equations (Neural ODEs), Physics-Informed Neural Networks (PINNs), and Convolutional Neural Networks (CNNs). Empirical evaluations show that MoLU consistently achieves faster convergence and improved final accuracy relative to widely used activation functions such as GeLU, SiLU, and Mish. These properties position MoLU as a promising and robust candidate for general-purpose activation across diverse deep learning paradigms.
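
The definition in the abstract is short enough to sketch directly. The snippet below is a minimal scalar illustration in plain Python, not the authors' implementation; the function names `molu`, `molu_grad`, and `swish` are chosen here for illustration. It also checks the identity (1 + tanh(x))/2 = sigmoid(2x), which means the formula as written coincides with x × sigmoid(2x).

```python
import math


def molu(x: float) -> float:
    """MoLU as defined in the abstract: x * (1 + tanh(x)) / 2."""
    return x * (1.0 + math.tanh(x)) / 2.0


def molu_grad(x: float) -> float:
    """Derivative obtained by the product rule (derived here, not quoted)."""
    t = math.tanh(x)
    return (1.0 + t) / 2.0 + x * (1.0 - t * t) / 2.0


def swish(x: float, beta: float) -> float:
    """x * sigmoid(beta * x), included only for comparison."""
    return x / (1.0 + math.exp(-beta * x))


if __name__ == "__main__":
    for x in (-3.0, -1.0, 0.0, 1.0, 3.0):
        # The last two columns should agree up to floating-point rounding.
        print(f"x={x:+.1f}  molu={molu(x):+.6f}  "
              f"grad={molu_grad(x):+.6f}  x*sigmoid(2x)={swish(x, 2.0):+.6f}")
```

In a real framework the same expression would normally be written with the library's vectorized `tanh` so that automatic differentiation supplies the gradient; the hand-written `molu_grad` is only there to make the derivative explicit.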

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2302.13696/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/2302.13696/full.md

## References

6 references — full list in the complete paper: https://tomesphere.com/paper/2302.13696/full.md

---
Source: https://tomesphere.com/paper/2302.13696