DAPA: Distribution Aware Piecewise Activation Functions for On-Device Transformer Inference and Training

Maoyang Xiang; Bo Wang

arXiv:2603.19338 · cs.LG · March 23, 2026

Open Access

TL;DR

This paper introduces DAPA, a novel distribution-aware piecewise activation function designed to enhance on-device Transformer inference and training by reducing resource consumption and increasing speed.

Contribution

DAPA is a differentiable, hardware-friendly activation function that adapts to the distribution of pre-activation data. The reported HLS implementation speeds up GELU computation by 16x and reduces DSP utilization by 16x for Transformer models.

Findings

01

DAPA speeds up GELU computation by 16x.

02

DAPA reduces DSP utilization by 16x.

03

DAPA maintains or improves model performance.

Abstract

Non-linear activation functions play a pivotal role in on-device inference and training: they consume substantial hardware resources and significantly affect system performance and energy efficiency. In this work, we propose Distribution-Aware Piecewise Activation (DAPA), a differentiable and hardware-friendly activation function for Transformer architectures that exploits the distribution of pre-activation data. DAPA employs a non-uniform piecewise approximation that allocates finer segments to high-probability regions of the distribution, improving generalizability over prior piecewise linear methods. The resulting approximation is further quantized using Distribution-Weighted Mean Square Error to reduce latency and resource utilization for hardware deployment. Our HLS implementation demonstrates that DAPA speeds up GELU computation by $16\times$ and…
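The idea of allocating finer segments to high-probability regions can be illustrated with a small NumPy sketch. It is not the authors' algorithm. It assumes standard-normal pre-activations, uses quantile-based breakpoints as one simple stand-in for a non-uniform segmentation, and approximates a distribution-weighted MSE by the mean squared error over samples drawn from that distribution. The names `make_pwl`, `weighted_mse`, the range [-4, 4], and the choice of 8 segments are all hypothetical.

```python
import math
import numpy as np


def gelu(x):
    """Exact GELU: x * Phi(x), with Phi the standard normal CDF."""
    erf = np.vectorize(math.erf)
    return 0.5 * x * (1.0 + erf(x / math.sqrt(2.0)))


def make_pwl(breakpoints):
    """Build a piecewise linear GELU approximation on the given breakpoints.

    Outside [breakpoints[0], breakpoints[-1]] it falls back to GELU's
    asymptotes: 0 on the left and the identity on the right.
    """
    xs = np.asarray(breakpoints, dtype=np.float64)
    ys = gelu(xs)

    def approx(x):
        x = np.asarray(x, dtype=np.float64)
        y = np.interp(x, xs, ys)          # linear interpolation between knots
        y = np.where(x < xs[0], 0.0, y)   # left tail: GELU(x) -> 0
        y = np.where(x > xs[-1], x, y)    # right tail: GELU(x) -> x
        return y

    return approx


def weighted_mse(approx, samples):
    """Sample-based estimate of the MSE weighted by the input distribution."""
    return float(np.mean((approx(samples) - gelu(samples)) ** 2))


# Assumption: pre-activations follow a standard normal distribution.
rng = np.random.default_rng(0)
samples = rng.standard_normal(20_000)

lo, hi, n_segments = -4.0, 4.0, 8

# Baseline: equal-width segments.
uniform_bp = np.linspace(lo, hi, n_segments + 1)

# Distribution-aware stand-in: interior breakpoints at equal-probability
# quantiles, so dense regions of the distribution get narrower segments.
inner = np.quantile(samples, np.linspace(0.0, 1.0, n_segments + 1)[1:-1])
quantile_bp = np.concatenate(([lo], inner, [hi]))

mse_uniform = weighted_mse(make_pwl(uniform_bp), samples)
mse_quantile = weighted_mse(make_pwl(quantile_bp), samples)
print(f"uniform segments:  weighted MSE = {mse_uniform:.3e}")
print(f"quantile segments: weighted MSE = {mse_quantile:.3e}")
```

With the same number of segments, the quantile-based breakpoints concentrate resolution near zero, where both the sample density and GELU's curvature are highest, so the weighted error should come out lower than with equal-width segments. A hardware-oriented version would additionally quantize the slopes and intercepts, which this sketch leaves out.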

Taxonomy

Topics: Advanced Neural Network Applications · Advanced Memory and Neural Computing · CCD and CMOS Imaging Sensors