CAB: Comprehensive Attention Benchmarking on Long Sequence Modeling
Jun Zhang, Shuyang Jiang, Jiangtao Feng, Lin Zheng, Lingpeng Kong

TL;DR
This paper introduces the Comprehensive Attention Benchmark (CAB), a detailed evaluation framework for various efficient attention mechanisms across multiple tasks and architectures, addressing gaps in existing benchmarks.
Contribution
It proposes a fine-grained taxonomy of attention patterns, collects diverse real-world tasks, and benchmarks nine attention architectures, providing insights into their performance and fundamental challenges.
Findings
Efficient attention methods vary in performance across different attention patterns.
Some architectures show consistent benefits in long-context language modeling.
Fundamental issues, such as efficiency length and generalization, are identified.
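Efficiency length is one of the issues named above. One way to make it concrete is to read it as the shortest sequence length at which an efficient attention becomes cheaper than vanilla attention. Under that reading, it can be found with a crossover search over measured or modeled costs. The sketch below is an illustration under that assumption, not CAB's measurement procedure. The helper `efficiency_length` is hypothetical, and the cost functions are made-up toy models (a quadratic vanilla cost and a linear cost with a fixed overhead), not figures from the paper.

```python
from typing import Callable, Iterable, Optional

CostFn = Callable[[int], float]


def efficiency_length(vanilla_cost: CostFn, efficient_cost: CostFn,
                      lengths: Iterable[int]) -> Optional[int]:
    """Return the first length at which the efficient variant is strictly cheaper.

    Hypothetical helper: costs may be measured runtimes or memory, or an
    analytic model; `lengths` should be given in increasing order.
    """
    for n in lengths:
        if efficient_cost(n) < vanilla_cost(n):
            return n
    return None  # no crossover within the lengths tried


# Made-up cost models for illustration only (not CAB measurements):
# vanilla attention grows quadratically, the efficient variant linearly plus overhead.
def toy_vanilla(n: int) -> float:
    return float(n * n)


def toy_efficient(n: int) -> float:
    return 64.0 * n + 10_000.0


if __name__ == "__main__":
    print(efficiency_length(toy_vanilla, toy_efficient, range(1, 4097)))  # 137
    print(efficiency_length(toy_vanilla, toy_efficient, [2**k for k in range(6, 15)]))  # 256
```

With real measurements, the answer depends on hardware, implementation, and the grid of lengths searched. A coarse grid such as powers of two only brackets the true crossover.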
Abstract
The Transformer has achieved remarkable success in language, image, and speech processing. Recently, various efficient attention architectures have been proposed to improve the transformer's efficiency while largely preserving its efficacy, especially in modeling long sequences. A widely used benchmark for testing these efficient methods' capability on long-range modeling is Long Range Arena (LRA). However, LRA focuses only on standard bidirectional (or noncausal) self-attention and completely ignores cross attention and unidirectional (or causal) attention, which are equally important for downstream applications. In this paper, we propose the Comprehensive Attention Benchmark (CAB) under a fine-grained attention taxonomy with four distinguishable attention patterns, namely noncausal self, causal self, noncausal cross, and causal cross attention. CAB collects seven real-world tasks from…
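The four attention patterns differ in which keys each query may see. As an illustrative sketch (not code from CAB), they can be written as boolean masks:

- Self attention takes queries and keys from one sequence.
- Cross attention lets target-side queries attend to a separate source sequence.
- Causal variants forbid looking ahead.

For causal cross attention, the sketch assumes that target position i is aligned with source position i, so it may only see source positions up to i. That alignment is an assumption made here for illustration. The pattern names and the helpers `attention_mask` and `masked_softmax` are hypothetical, and the code assumes at least one key.

```python
import math
from typing import List

Mask = List[List[bool]]  # mask[i][j] is True when query i may attend to key j

PATTERNS = {"noncausal_self", "causal_self", "noncausal_cross", "causal_cross"}


def attention_mask(pattern: str, num_queries: int, num_keys: int) -> Mask:
    """Boolean mask for one of the four attention patterns (illustrative sketch)."""
    if pattern not in PATTERNS:
        raise ValueError(f"unknown pattern: {pattern}")
    if pattern.endswith("_self") and num_queries != num_keys:
        raise ValueError("self attention reads queries and keys from one sequence")
    causal = pattern.startswith("causal")
    # Causal patterns block keys after the query position (assumed aligned for
    # causal cross attention); noncausal patterns block nothing.
    return [[(j <= i) if causal else True for j in range(num_keys)]
            for i in range(num_queries)]


def masked_softmax(scores: List[List[float]], mask: Mask) -> List[List[float]]:
    """Row-wise softmax of raw attention scores; masked-out keys get weight 0."""
    weights = []
    for row_scores, row_mask in zip(scores, mask):
        # Subtract the largest allowed score for numerical stability.
        peak = max(s for s, ok in zip(row_scores, row_mask) if ok)
        exps = [math.exp(s - peak) if ok else 0.0 for s, ok in zip(row_scores, row_mask)]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


if __name__ == "__main__":
    mask = attention_mask("causal_self", 3, 3)
    zeros = [[0.0] * 3 for _ in range(3)]
    for row in masked_softmax(zeros, mask):
        print([round(w, 3) for w in row])
    # [1.0, 0.0, 0.0]
    # [0.5, 0.5, 0.0]
    # [0.333, 0.333, 0.333]
```

In a real model the mask is usually applied to the query-key scores before the softmax, often by setting blocked scores to negative infinity. Giving blocked keys zero weight, as done here, has the same effect.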
Taxonomy
Topics: Topic Modeling · Advanced Neural Network Applications · Speech Recognition and Synthesis
