An alternative formulation of attention pooling function in translation

Eddie Conti

arXiv:2409.00068·cs.CL·September 4, 2024

TL;DR

This paper proposes a new formulation of the attention pooling function in translation models. The attention scores matrix is projected onto the space of band matrices with fixed bandwidth. This yields a close approximation of the original scores and offers insight into how language structure is captured.

Contribution

It introduces an alternative attention scoring function based on band matrix projections, addressing limitations of traditional attention mechanisms in translation.

Findings

1. The new attention formula closely approximates the original scores.
2. Parameter analysis reveals insights into language processing.
3. The approach guarantees a well-posed, unique solution for attention scores.
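The uniqueness claim in finding 3 can be illustrated with a standard fact about projections. The summary does not state two details, so this sketch assumes both: the projection is taken in the Frobenius norm, and the bandwidth $k$ is symmetric ($|i - j| \le k$). Under those assumptions, the set of band matrices and the projection of an attention scores matrix $H$ onto it are:

```latex
\mathcal{B}_k = \{\, B \in \mathbb{R}^{n \times m} : B_{ij} = 0 \ \text{whenever}\ |i - j| > k \,\},
\qquad
\Pi_k(H) = \operatorname*{arg\,min}_{B \in \mathcal{B}_k} \lVert H - B \rVert_F,
\qquad
\bigl(\Pi_k(H)\bigr)_{ij} =
\begin{cases}
H_{ij}, & |i - j| \le k,\\
0, & |i - j| > k.
\end{cases}
```

$\mathcal{B}_k$ is a linear subspace, so it is closed and convex. The minimizer therefore exists and is unique, and $\lVert H \rVert_F^2 = \lVert \Pi_k(H) \rVert_F^2 + \lVert H - \Pi_k(H) \rVert_F^2$.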

Abstract

The aim of this paper is to present an alternative formulation of the attention scoring function in translation tasks. Generally speaking, language is deeply structured, and this is reflected in the attention scoring matrix. We exploit this property to define the attention pooling function. In the first chapters, we introduce the attention mechanism in mathematical terms and explain its limitations and alternative formulations. Next, we focus on the experimental section that led to the alternative formulation. Essentially, we guide queries and keys to interact in a specific manner, encoding the distinct roles of attention heads and directing values on where to seek context. In mathematical terms, we can think of this formula as projecting the attention scores matrix, say $H$, onto the space of band matrices with fixed bandwidth. This convex subspace is…
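The projection described in the abstract can be sketched in NumPy. This is a minimal sketch, not the author's implementation, and it rests on two assumptions: the projection is in the Frobenius norm, and the bandwidth `k` is symmetric ($|i - j| \le k$). Under those assumptions the projection reduces to zeroing every entry outside the band. The helper names `attention_scores` and `band_projection` are hypothetical, and so are the random toy inputs. The abstract is truncated before it says how the projected matrix feeds into attention pooling, so the sketch stops at the projection and its approximation error.

```python
import numpy as np


def attention_scores(Q: np.ndarray, K: np.ndarray) -> np.ndarray:
    """Scaled dot-product scores H = Q K^T / sqrt(d) (standard, not paper-specific)."""
    d = Q.shape[-1]
    return Q @ K.T / np.sqrt(d)


def band_projection(H: np.ndarray, k: int) -> np.ndarray:
    """Frobenius-norm projection of H onto band matrices with bandwidth k.

    Assumption: symmetric band, so entries with |i - j| <= k are kept unchanged
    and all other entries are set to zero.
    """
    n, m = H.shape
    rows = np.arange(n)[:, None]
    cols = np.arange(m)[None, :]
    inside_band = np.abs(rows - cols) <= k
    return np.where(inside_band, H, 0.0)


# Toy example with random queries and keys (hypothetical data, not from the paper).
rng = np.random.default_rng(0)
n, d, k = 6, 4, 1
Q = rng.standard_normal((n, d))
K = rng.standard_normal((n, d))

H = attention_scores(Q, K)
P = band_projection(H, k)

# Relative approximation error ||H - P||_F / ||H||_F; always between 0 and 1
# because H - P is orthogonal to P in the Frobenius inner product.
rel_err = np.linalg.norm(H - P) / np.linalg.norm(H)
print(f"bandwidth {k}: relative Frobenius error = {rel_err:.3f}")
```

Increasing `k` can only lower this error, since each band space contains the previous one. For a square $n \times n$ matrix with `k >= n - 1`, the projection returns `H` unchanged. This summary does not say how the paper chooses the bandwidth.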

Taxonomy

Topics: Cognitive Computing and Networks · Robotics and Automated Systems