CAB: Comprehensive Attention Benchmarking on Long Sequence Modeling

Jun Zhang; Shuyang Jiang; Jiangtao Feng; Lin Zheng; Lingpeng Kong

arXiv:2210.07661 · cs.LG · January 14, 2025


TL;DR

This paper introduces the Comprehensive Attention Benchmark (CAB), an evaluation framework that tests efficient attention mechanisms across multiple tasks and backbone architectures. It addresses a gap in existing benchmarks such as Long Range Arena (LRA), which covers only noncausal self-attention.

Contribution

It proposes a fine-grained taxonomy of four attention patterns, collects seven real-world tasks, and benchmarks nine efficient attention architectures, providing insights into their performance and the fundamental challenges they face.

Findings

01. Efficient attention methods vary in performance across different attention patterns.

02. Some architectures show consistent benefits in long-context language modeling.

03. Fundamental problems of efficient attention are identified, such as efficiency length relative to vanilla attention and generalization.
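One way to make "efficiency length" concrete is to read it as the shortest sequence length from which an efficient attention stays cheaper than vanilla attention at every longer benchmarked length. That reading, the hypothetical helper `efficiency_length`, and the toy FLOP-style cost model below are our assumptions for illustration, not CAB's measurement procedure.

```python
from typing import Optional, Sequence


def efficiency_length(
    lengths: Sequence[int],
    vanilla_cost: Sequence[float],
    efficient_cost: Sequence[float],
) -> Optional[int]:
    # Hypothetical helper (assumed definition, not CAB's code): return the
    # shortest benchmarked length L such that the efficient attention is cheaper
    # than vanilla attention at L and at every longer benchmarked length.
    # Returns None if it is not cheaper even at the longest length.
    result = None
    # Walk from the longest length down and stop at the first length where
    # the efficient attention is not cheaper.
    for n, v, e in sorted(zip(lengths, vanilla_cost, efficient_cost), reverse=True):
        if e < v:
            result = n
        else:
            break
    return result


# Toy cost model (an assumption for illustration only):
# vanilla attention ~ n^2 * d, a linear-attention variant ~ n * d^2.
d = 64
lengths = [2 ** k for k in range(4, 13)]  # 16 .. 4096
vanilla = [n * n * d for n in lengths]
linear = [n * d * d for n in lengths]
print(efficiency_length(lengths, vanilla, linear))  # 128
```

Under this toy model the two costs are equal at n = d = 64, so the helper reports the next benchmarked length, 128. Real wall-clock crossovers also depend on kernels, hardware, and constant overheads, so in practice they have to be measured rather than derived.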

Abstract

Transformer has achieved remarkable success in language, image, and speech processing. Recently, various efficient attention architectures have been proposed to improve transformer's efficiency while largely preserving its efficacy, especially in modeling long sequences. A widely-used benchmark to test these efficient methods' capability on long-range modeling is Long Range Arena (LRA). However, LRA only focuses on the standard bidirectional (or noncausal) self attention, and completely ignores cross attentions and unidirectional (or causal) attentions, which are equally important to downstream applications. In this paper, we propose Comprehensive Attention Benchmark (CAB) under a fine-grained attention taxonomy with four distinguishable attention patterns, namely, noncausal self, causal self, noncausal cross, and causal cross attentions. CAB collects seven real-world tasks from…
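The four attention patterns in the taxonomy differ only in where keys and values come from (the same sequence for self attention, a separate context for cross attention) and in which positions each query may see (all of them, or only earlier ones when causal). The following minimal sketch expresses this with masks in plain Python. The function names are hypothetical. Identity projections stand in for learned query/key/value weights. Modeling causal cross attention by aligning query step i with context position i is our simplifying assumption, not necessarily how CAB's tasks define it.

```python
import math
from typing import List, Optional

Matrix = List[List[float]]
PATTERNS = ("noncausal_self", "causal_self", "noncausal_cross", "causal_cross")


def pattern_mask(n_q: int, n_k: int, causal: bool) -> List[List[bool]]:
    # mask[i][j] is True when query i may attend to key j.
    # Noncausal: every query sees every key.
    # Causal: query i sees keys 0..i only. For cross attention this
    # assumes (hypothetically) that query step i aligns with key position i.
    return [[(not causal) or j <= i for j in range(n_k)] for i in range(n_q)]


def masked_attention(q: Matrix, k: Matrix, v: Matrix, mask: List[List[bool]]) -> Matrix:
    # Scaled dot-product attention; disallowed positions get a score of -inf,
    # so their softmax weight is exactly 0.
    d = len(q[0])
    out = []
    for i, qi in enumerate(q):
        scores = [
            sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) if mask[i][j] else -math.inf
            for j, kj in enumerate(k)
        ]
        m = max(scores)  # key 0 is always visible, so m is finite
        w = [math.exp(s - m) for s in scores]
        z = sum(w)
        out.append([sum(w[j] * v[j][c] for j in range(len(v))) / z for c in range(len(v[0]))])
    return out


def attend(x: Matrix, context: Optional[Matrix], pattern: str) -> Matrix:
    # Self patterns draw keys/values from x itself; cross patterns draw them
    # from a separate context sequence. Identity projections replace the
    # learned W_Q, W_K, W_V matrices to keep the sketch short.
    if pattern not in PATTERNS:
        raise ValueError(f"unknown pattern: {pattern}")
    kv = x if pattern.endswith("_self") else context
    if kv is None:
        raise ValueError("cross attention needs a context sequence")
    mask = pattern_mask(len(x), len(kv), causal=pattern.startswith("causal"))
    return masked_attention(x, kv, kv, mask)
```

For example, with `x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]`, `attend(x, None, "causal_self")[0]` equals `x[0]`, because the first query can only attend to itself. A noncausal self pattern would instead mix all three rows.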


Code & Models

Repositories

shark-nlp/cab (PyTorch, official)

Videos

CAB: Comprehensive Attention Benchmarking on Long Sequence Modeling (SlidesLive)

Taxonomy

Topics: Topic Modeling · Advanced Neural Network Applications · Speech Recognition and Synthesis

Methods: Test