LASER: Attention with Exponential Transformation

Sai Surya Duvvuri; Inderjit S. Dhillon

arXiv:2411.03493 · cs.LG · July 15, 2025 · 2 cites

TL;DR

This paper introduces LASER, a new attention mechanism that enhances gradient flow in transformers, leading to improved performance across language, vision, and speech tasks, with minimal implementation changes.

Contribution

LASER provides a theoretically justified modification to attention that increases gradient signals, improving learning efficiency and performance in large-scale models.

Findings

01 Up to 1.44% improvement on downstream tasks

02 Enhanced generalization across vision, speech, and text

03 Achieved better finetuning results with minimal changes

Abstract

Transformers have had tremendous impact on several sequence-related tasks, largely due to their ability to retrieve from any part of the sequence via softmax-based dot-product attention. This mechanism plays a crucial role in the Transformer's performance. We analyze the gradients backpropagated through the softmax operation in the attention mechanism and observe that these gradients can often be small. This poor gradient signal backpropagation can lead to inefficient learning of parameters preceding the attention operations. To this end, we introduce a new attention mechanism called LASER, which we analytically show to admit a larger gradient signal. We show that LASER attention can be implemented by making small modifications to existing attention implementations. We conduct experiments on autoregressive large language models (LLMs) with up to 7.7 billion parameters with an average…
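
The two claims in the abstract (softmax gradients can be small, and LASER needs only a small change to attention) can be illustrated with a toy sketch. Everything below is an assumption for illustration rather than the authors' code: it handles one query and one scalar value per key, the helper names are hypothetical, and the LASER form used, log(softmax(scores) · exp(values)), is taken from the published paper and is not spelled out in this excerpt. The maximum value is subtracted before exponentiation to avoid overflow.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_jacobian(probs):
    """Jacobian of softmax: J[i][j] = p_i * (delta_ij - p_j)."""
    n = len(probs)
    return [
        [probs[i] * ((1.0 if i == j else 0.0) - probs[j]) for j in range(n)]
        for i in range(n)
    ]


def max_abs_entry(matrix):
    """Largest absolute entry, used here as a crude gradient-size proxy."""
    return max(abs(x) for row in matrix for x in row)


def standard_attention_row(scores, values):
    """Standard attention output for one query: sum_j p_j * v_j."""
    probs = softmax(scores)
    return sum(p * v for p, v in zip(probs, values))


def laser_attention_row(scores, values):
    """Assumed LASER form for one query: log(sum_j p_j * exp(v_j)).

    Subtracting max(values) before exponentiating keeps exp() from
    overflowing; the maximum is added back after the log.
    """
    probs = softmax(scores)
    m = max(values)
    return m + math.log(sum(p * math.exp(v - m) for p, v in zip(probs, values)))


# Near-uniform attention: Jacobian entries are comparatively large.
jac_balanced = max_abs_entry(softmax_jacobian(softmax([0.1, 0.0, -0.1])))
# Saturated attention (one score dominates): Jacobian entries shrink toward 0.
jac_saturated = max_abs_entry(softmax_jacobian(softmax([12.0, 0.0, -3.0])))

# Large values that would overflow a naive exp() stay finite here.
laser_out = laser_attention_row([0.0, 0.0], [1000.0, 999.0])
standard_out = standard_attention_row([0.0, 0.0], [1000.0, 999.0])

print(jac_balanced, jac_saturated)
print(standard_out, laser_out)
```

In this sketch the only difference between the two attention functions is the exponential applied to the values and the logarithm applied to the result, which is consistent with the abstract's statement that existing implementations need only small modifications. By Jensen's inequality the assumed LASER output is never below the standard weighted average.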

Taxonomy

Topics: 3D Surveying and Cultural Heritage · Image Processing and 3D Reconstruction

Methods: Attention Is All You Need · Position-Wise Feed-Forward Layer · Linear Layer · Byte Pair Encoding · Dropout · Absolute Position Encodings · Label Smoothing · Transformer · Dense Connections