SAS: Simulated Attention Score
Chuanyang Zheng, Jiankai Sun, Yihang Gao, Yuehao Wang, Peihao Wang, Jing Xiong, Liliang Ren, Hao Cheng, Janardhan Kulkarni, Yelong Shen, Atlas Wang, Mac Schwager, Anderson Schneider, Xiaodong Liu, Jianfeng Gao

TL;DR
This paper introduces Simulated Attention Score (SAS), a method that simulates a larger number of attention heads and a larger hidden dimension per head in Transformers while keeping the model size compact, which improves performance.
Contribution
The paper proposes SAS, which simulates a larger number of attention heads and a larger per-head feature dimension by projecting low-dimensional head representations into a higher-dimensional space, together with PEAA, which keeps the parameter cost under control.
Findings
SAS improves attention performance across multiple datasets.
Simulating a larger number of attention heads enhances model expressiveness.
Parameter-efficient design maintains model size while boosting accuracy.
Abstract
The attention mechanism is a core component of the Transformer architecture. Various methods have been developed to compute attention scores, including multi-head attention (MHA), multi-query attention, grouped-query attention, and others. We further analyze MHA and observe that its performance improves as the number of attention heads increases, provided the hidden size per head remains sufficiently large. Therefore, increasing both the head count and the hidden size per head with minimal parameter overhead can lead to significant performance gains at a low cost. Motivated by this insight, we introduce Simulated Attention Score (SAS), which maintains a compact model size while simulating a larger number of attention heads and a larger hidden feature dimension per head. This is achieved by projecting a low-dimensional head representation into a higher-dimensional space, effectively increasing…
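The head-simulation idea described in the abstract can be illustrated with a minimal NumPy sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the projection is a single linear map across the head axis, and the names expand_heads, attention_scores, w_q, and w_k are hypothetical. The actual SAS projection and the PEAA aggregation step may differ.

```python
import numpy as np

def expand_heads(x, w):
    """Mix the head axis of x (heads, tokens, dim) with w (sim_heads, heads).

    Assumption: a plain linear map over heads stands in for the paper's
    projection into a higher-dimensional head space.
    """
    return np.einsum("gh,htd->gtd", w, x)

def attention_scores(q, k):
    """Row-wise softmax of scaled dot-product logits, one matrix per head."""
    d = q.shape[-1]
    logits = q @ k.transpose(0, 2, 1) / np.sqrt(d)
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    return weights / weights.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
heads, sim_heads, tokens, dim = 4, 12, 5, 8  # illustrative sizes only

# Queries and keys as a standard 4-head attention layer would produce them.
q = rng.standard_normal((heads, tokens, dim))
k = rng.standard_normal((heads, tokens, dim))

# Hypothetical head-mixing weights: the only extra parameters in this sketch.
w_q = rng.standard_normal((sim_heads, heads)) / np.sqrt(heads)
w_k = rng.standard_normal((sim_heads, heads)) / np.sqrt(heads)

# Attention scores for 12 simulated heads built from 4 real ones.
scores = attention_scores(expand_heads(q, w_q), expand_heads(k, w_k))
extra_params = w_q.size + w_k.size
```

In this sketch the extra parameter count depends only on the real and simulated head counts, not on the token count or the per-head dimension, which is one way to read the abstract's claim of simulating more heads at minimal parameter overhead.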
Taxonomy
Topics: Big Data and Digital Economy · Information Retrieval and Search Behavior · Personal Information Management and User Behavior
