SoftmAP: Software-Hardware Co-design for Integer-Only Softmax on Associative Processors
Mariam Rakka, Jinhao Li, Guohao Dai, Ahmed Eltawil, Mohammed E. Fouda, and Fadi Kurdahi

TL;DR
SoftmAP is a novel software-hardware co-design approach that enables efficient, integer-only Softmax computation on associative processors, significantly reducing energy and delay for deploying large language models on resource-limited devices.
Contribution
It introduces a new integer-only Softmax implementation optimized for in-memory compute hardware, addressing non-linear operator bottlenecks in LLM deployment.
Findings
Achieves up to three orders of magnitude (about 1000x) improvement in energy-delay product over A100 and RTX3090 GPUs.
Enables resource-efficient deployment of LLMs without performance loss.
Demonstrates practical viability of integer-only Softmax on associative processors.
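For readers unfamiliar with the metric, the energy-delay product (EDP) is the energy consumed multiplied by the time taken, so gains in the two factors multiply: a 10x energy saving combined with a 100x speedup gives a 1000x EDP improvement. The sketch below only illustrates that arithmetic. The function names and the numbers are hypothetical placeholders, not measurements from the paper.

```python
def energy_delay_product(energy_joules, delay_seconds):
    """EDP = energy * delay; lower is better."""
    return energy_joules * delay_seconds


def edp_improvement(baseline, candidate):
    """Ratio of baseline EDP to candidate EDP.

    Each argument is an (energy_joules, delay_seconds) pair.
    A result above 1 means the candidate has the lower EDP.
    """
    return energy_delay_product(*baseline) / energy_delay_product(*candidate)


# Placeholder values chosen only to show the arithmetic (NOT from the paper).
baseline = (2.0, 4.0)   # hypothetical baseline: 2 J, 4 s
candidate = (1.0, 0.5)  # hypothetical candidate: 1 J, 0.5 s
ratio = edp_improvement(baseline, candidate)
print(f"EDP improvement: {ratio:.1f}x")
```

Because EDP penalizes both slow and power-hungry designs, it is a common single-number summary when comparing accelerators against GPUs.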
Abstract
Recent research efforts focus on reducing the computational and memory overheads of Large Language Models (LLMs) to make them feasible on resource-constrained devices. Despite advancements in compression techniques, non-linear operators like Softmax and Layernorm remain bottlenecks due to their sensitivity to quantization. We propose SoftmAP, a software-hardware co-design methodology that implements an integer-only low-precision Softmax using In-Memory Compute (IMC) hardware. Our method achieves up to three orders of magnitude improvement in the energy-delay product compared to A100 and RTX3090 GPUs, making LLMs more deployable without compromising performance.
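To make "integer-only Softmax" concrete, here is a minimal sketch of one well-known way to compute Softmax with integer arithmetic at run time. It assumes the second-order polynomial approximation of the exponential used in I-BERT-style integer kernels; the abstract does not say which approximation SoftmAP uses, so this should not be read as the authors' implementation. The function name `int_softmax`, the `scale` parameter, and the 8-bit output width are all illustrative assumptions.

```python
import math


def int_softmax(q, scale, out_bits=8):
    """Integer-only softmax sketch (assumed I-BERT-style approximation).

    q        : list of ints; the real-valued input is q[i] * scale
    scale    : quantization step (float, used only for offline constants)
    out_bits : output is an integer in [0, 2**out_bits], i.e. prob * 2**out_bits
    """
    # Offline constants: floats are used once, at calibration time.
    a, b, c = 0.3585, 1.353, 0.344          # exp(p) ~ a*(p + b)**2 + c on (-ln2, 0]
    q_ln2 = int(math.floor(math.log(2) / scale))
    if q_ln2 < 1:
        raise ValueError("scale must be smaller than ln(2)")
    q_b = int(math.floor(b / scale))
    q_c = int(math.floor(c / (a * scale * scale)))

    # Run-time path: integer add, multiply, shift, and divide only.
    q_max = max(q)
    exps = []
    for qi in q:
        d = qi - q_max                      # d <= 0 for numerical stability
        z = (-d) // q_ln2                   # exp(x) = 2**(-z) * exp(p)
        p = d + z * q_ln2                   # remainder, in (-q_ln2, 0]
        poly = (p + q_b) ** 2 + q_c         # proportional to exp(p * scale)
        exps.append(poly >> z)              # apply the 2**(-z) factor as a shift
    total = sum(exps)
    return [(e << out_bits) // total for e in exps]


# Example: real inputs [2, 1, 0, -1] quantized with a step of 0.05.
probs_q = int_softmax([40, 20, 0, -20], scale=0.05)
print([v / 256 for v in probs_q])
```

The common factor `a * scale**2` cancels in the final division, which is why the run-time loop never needs it. Accuracy depends on how coarse `scale` is, since `q_ln2`, `q_b`, and `q_c` are rounded down to integers.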
Taxonomy
Topics: Embedded Systems Design Techniques · Parallel Computing and Optimization Techniques · Interconnection Networks and Systems
Methods: Softmax · Focus
