RingFormer: Rethinking Recurrent Transformer with Adaptive Level Signals
Jaemu Heo, Eldor Fozilov, Hyunmin Song, Taehwan Kim

TL;DR
RingFormer is a recurrent, ring-like Transformer architecture with adaptive level signals that significantly reduces parameter count while maintaining high performance on translation and image classification.
Contribution
The paper proposes RingFormer, a novel recurrent Transformer model that employs adaptive level signals and parameter sharing to reduce complexity without sacrificing accuracy.
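The idea of reusing one layer across depth, with a per-level signal telling the layer where it is in the loop, can be illustrated with a toy sketch. Everything here is an assumption made for illustration: `shared_block` is a simple residual stand-in rather than a real attention layer, and `level_signals` are fixed random vectors, whereas the paper describes its signals as adaptive and this summary does not show how they are computed.

```python
import math
import random

def make_matrix(rows, cols, rng):
    """Small random matrix stored as a list of rows."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def shared_block(x, weight):
    """Stand-in for one Transformer layer: residual + tanh(linear).
    A real layer would contain attention and a feed-forward sublayer."""
    out = []
    for token in x:
        projected = [sum(w * t for w, t in zip(row, token)) for row in weight]
        out.append([t + math.tanh(p) for t, p in zip(token, projected)])
    return out

def ring_forward(x, weight, level_signals):
    """Apply the same block once per level, adding that level's signal first."""
    for signal in level_signals:
        x = [[t + s for t, s in zip(token, signal)] for token in x]
        x = shared_block(x, weight)
    return x

rng = random.Random(0)
d_model, seq_len, num_levels = 4, 3, 6                  # hypothetical toy sizes
weight = make_matrix(d_model, d_model, rng)             # one set of weights, reused at every level
level_signals = make_matrix(num_levels, d_model, rng)   # one vector per level (assumption)
x = make_matrix(seq_len, d_model, rng)

y = ring_forward(x, weight, level_signals)
print(len(y), len(y[0]))
```

The output keeps the input's shape, so the block can be applied any number of times; only `level_signals` grows with depth, while `weight` stays fixed.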
Findings
Achieves comparable performance to standard Transformers in translation and image classification.
Reduces model parameters significantly compared to traditional Transformer architectures.
Demonstrates effectiveness of circular recurrence with adaptive signals in sequence modeling.
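To give a rough sense of why sharing one layer reduces parameters, the arithmetic below compares a stack of independent layers with a single reused layer. The sizes are illustrative and not taken from the paper, biases and layer normalization are ignored, and the extra parameters for level signals are left out because this summary does not give their size.

```python
def layer_params(d_model, d_ff):
    """Weight count of one standard Transformer layer, ignoring biases and layer norm."""
    attention = 4 * d_model * d_model   # query, key, value, and output projections
    feed_forward = 2 * d_model * d_ff   # two linear maps
    return attention + feed_forward

d_model, d_ff, num_levels = 512, 2048, 6   # illustrative sizes, not from the paper

stacked = num_levels * layer_params(d_model, d_ff)   # independent weights per layer
shared = layer_params(d_model, d_ff)                 # one layer reused at every level
print(stacked, shared, stacked / shared)
# prints: 18874368 3145728 6.0
```

Under these assumptions the saving scales with the number of levels; the real ratio is smaller once level-signal parameters, embeddings, and output layers are counted.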
Abstract
Transformers have achieved great success in processing sequential data such as text. Their architecture, consisting of several attention and feedforward blocks, can model relations between the elements of a sequence in parallel, which makes them efficient to train and effective in sequence modeling. Despite this strong performance, their parameter count is considerably larger than that of other architectures such as RNN- and CNN-based models. Several approaches have therefore explored parameter sharing and recurrence in Transformer models to address their computational demands. However, such methods struggle to maintain high performance compared to the original Transformer model. To address this challenge, we propose RingFormer, which employs one Transformer layer that processes input…
Taxonomy
Topics: Neural Networks and Reservoir Computing · Neural Networks and Applications
Methods: Attention Is All You Need · Linear Layer · Byte Pair Encoding · Layer Normalization · Residual Connection · Dense Connections · Multi-Head Attention · Position-Wise Feed-Forward Layer · Adam · Softmax
