RingFormer: Rethinking Recurrent Transformer with Adaptive Level Signals

Jaemu Heo; Eldor Fozilov; Hyunmin Song; Taehwan Kim

arXiv:2502.13181·cs.LG·February 20, 2025

TL;DR

RingFormer introduces a recurrent, ring-like Transformer architecture with adaptive level signals, significantly reducing parameters while maintaining high performance in sequence and image tasks.

Contribution

The paper proposes RingFormer, a recurrent Transformer model that reuses a single Transformer layer with adaptive level signals, cutting the parameter count while keeping accuracy comparable to a standard Transformer.

Findings

01. Achieves performance comparable to standard Transformers in translation and image classification.

02. Reduces model parameters significantly compared to traditional Transformer architectures.

03. Demonstrates the effectiveness of circular recurrence with adaptive signals in sequence modeling.
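
To make the parameter-reduction finding concrete, here is a back-of-the-envelope sketch that counts weights for a stack of independent encoder layers versus one shared layer reused at every level with a small per-level signal. The dimensions, the low-rank form of the level signal, and all function names are illustrative assumptions, not figures or definitions taken from the paper.

```python
def layer_params(d_model: int, d_ff: int) -> int:
    """Weights and biases in one standard Transformer encoder layer."""
    # Four d_model x d_model projections (query, key, value, output), each with a bias.
    attention = 4 * (d_model * d_model + d_model)
    # Position-wise feed-forward: d_model -> d_ff -> d_model, each with a bias.
    feed_forward = (d_model * d_ff + d_ff) + (d_ff * d_model + d_model)
    # Two LayerNorms, each with a scale and a shift vector.
    layer_norms = 2 * 2 * d_model
    return attention + feed_forward + layer_norms


def stacked_params(n_layers: int, d_model: int, d_ff: int) -> int:
    """Baseline: every layer has its own weights."""
    return n_layers * layer_params(d_model, d_ff)


def shared_params(n_levels: int, d_model: int, d_ff: int, rank: int) -> int:
    """One shared layer plus a per-level signal.

    ASSUMPTION: each level's signal is modeled as a low-rank pair of
    matrices (d_model x rank and rank x d_model). This is a hypothetical
    stand-in for whatever the paper's level signals actually are.
    """
    level_signals = n_levels * 2 * d_model * rank
    return layer_params(d_model, d_ff) + level_signals


if __name__ == "__main__":
    # Hypothetical configuration chosen only for illustration.
    baseline = stacked_params(n_layers=6, d_model=512, d_ff=2048)
    shared = shared_params(n_levels=6, d_model=512, d_ff=2048, rank=32)
    print(f"stacked: {baseline:,}  shared: {shared:,}  ratio: {shared / baseline:.3f}")
```

Under these assumed sizes the shared variant keeps roughly 18% of the stacked layer weights. Embeddings and output heads are left out of the count, so the saving for a full model would be smaller.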

Abstract

Transformers have achieved great success in processing sequential data such as text. Their architecture, consisting of several attention and feedforward blocks, can model relations between elements of a sequence in parallel, which makes them efficient to train and effective at sequence modeling. Despite this strong performance, their parameter count is considerably larger than that of other architectures such as RNN- and CNN-based models. Several approaches have therefore explored parameter sharing and recurrence in Transformer models to address their computational demands. However, such methods struggle to match the performance of the original Transformer model. To address this challenge, we propose RingFormer, which employs one Transformer layer that processes input…

Taxonomy

Topics: Neural Networks and Reservoir Computing · Neural Networks and Applications

Methods: Attention Is All You Need · Linear Layer · Byte Pair Encoding · Layer Normalization · Residual Connection · Dense Connections · Multi-Head Attention · Position-Wise Feed-Forward Layer · Adam · Softmax