Circuit Complexity Bounds for RoPE-based Transformer Architecture

Bo Chen; Xiaoyu Li; Yingyu Liang; Jiangxuan Long; Zhenmei Shi; Zhao Song

arXiv:2411.07602 · cs.LG · December 3, 2024

TL;DR

This paper establishes a circuit complexity bound for RoPE-based Transformer architectures, revealing fundamental limitations in their expressivity despite empirical success, and providing insights for future research.

Contribution

It provides the first circuit complexity bound for RoPE-based Transformers, showing their limitations in solving certain computational problems under specific complexity class assumptions.

Findings

01. RoPE-based Transformers cannot solve certain problems unless complexity classes collapse.

02. Theoretical limitations contrast with empirical success of RoPE embeddings.

03. Results guide future work on the expressivity of RoPE-based architectures.

Abstract

Characterizing the expressive power of the Transformer architecture is critical to understanding its capacity limits and scaling laws. Recent works have provided circuit complexity bounds for Transformer-like architectures. Meanwhile, Rotary Position Embedding (RoPE) has emerged as a crucial technique in modern large language models: it captures positional information better than traditional position embeddings and shows great promise in applications, particularly in long-context scenarios. Empirical evidence also suggests that RoPE-based Transformer architectures generalize better than conventional Transformer models. In this work, we establish a circuit complexity bound for Transformers with RoPE attention. Our key contribution is that we show that unless…
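For readers unfamiliar with the mechanism the abstract refers to, the following is a minimal sketch of a rotary position embedding, not code from the paper. It assumes the commonly used formulation in which consecutive coordinate pairs are rotated by an angle proportional to the token position, with frequency base 10000; the function names `rope_rotate` and `dot` and the example vectors are hypothetical, and the paper's exact parametrization may differ.

```python
import math


def rope_rotate(vec, position, base=10000.0):
    """Rotate consecutive coordinate pairs of `vec` by position-dependent angles.

    Assumption: the i-th pair uses frequency base ** (-2 * i / d), the common
    RoPE convention; this is an illustration, not the paper's definition.
    """
    d = len(vec)
    if d % 2 != 0:
        raise ValueError("this sketch needs an even embedding dimension")
    out = []
    for i in range(d // 2):
        theta = position * base ** (-2.0 * i / d)  # rotation angle for pair i
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[2 * i], vec[2 * i + 1]
        out.extend([x * c - y * s, x * s + y * c])  # 2-D rotation of (x, y)
    return out


def dot(u, v):
    """Plain inner product, standing in for an unnormalized attention logit."""
    return sum(a * b for a, b in zip(u, v))


# Hypothetical query and key vectors.
q = [1.0, 2.0, 3.0, 4.0]
k = [0.5, -1.0, 2.0, 0.0]

# Both position pairs have the same offset (3), so the logits should agree
# up to floating-point error.
score_a = dot(rope_rotate(q, 5), rope_rotate(k, 2))
score_b = dot(rope_rotate(q, 13), rope_rotate(k, 10))
```

In this sketch the inner product of a rotated query and a rotated key depends only on the difference between their positions, which is the relative-position property usually cited as the motivation for RoPE.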

Taxonomy

Topics: Low-power high-performance VLSI design · Semiconductor materials and devices · Advancements in Semiconductor Devices and Circuit Design

Methods: Attention Is All You Need · Linear Layer · Dense Connections · Label Smoothing · Absolute Position Encodings · Layer Normalization · Position-Wise Feed-Forward Layer · Adam · Multi-Head Attention · Residual Connection