RNNs are not Transformers (Yet): The Key Bottleneck on In-context Retrieval

Kaiyue Wen; Xingyu Dang; Kaifeng Lyu

arXiv:2402.18510·cs.LG·December 10, 2024·1 cites

Open Access · 1 Repo

TL;DR

This paper analyzes the limitations of RNNs compared to Transformers in solving algorithmic problems, showing that RNNs lack the retrieval capacity needed for certain tasks but can be enhanced to match Transformers' capabilities.

Contribution

The paper provides a theoretical analysis of RNNs' limitations and demonstrates how retrieval techniques and minimal modifications can bridge the gap with Transformers.

Findings

01. RNNs cannot solve tasks requiring perfect context retrieval.

02. Transformers can solve associative recall and graph problems easily.

03. Enhancing RNNs with retrieval methods closes the performance gap.

Abstract

This paper investigates the gap in representation powers of Recurrent Neural Networks (RNNs) and Transformers in the context of solving algorithmic problems. We focus on understanding whether RNNs, known for their memory efficiency in handling long sequences, can match the performance of Transformers, particularly when enhanced with Chain-of-Thought (CoT) prompting. Our theoretical analysis reveals that CoT improves RNNs but is insufficient to close the gap with Transformers. A key bottleneck lies in the inability of RNNs to perfectly retrieve information from the context, even with CoT: for several tasks that explicitly or implicitly require this capability, such as associative recall and determining if a graph is a tree, we prove that RNNs are not expressive enough to solve the tasks while Transformers can solve them with ease. Conversely, we prove that adopting techniques to enhance…
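
To make the two tasks named in the abstract concrete, here is a minimal Python sketch of what each one asks for. These are plain reference definitions of the tasks, not the paper's constructions and not code from the linked repository; the function names, the input encodings (a list of key-value pairs, an edge list over nodes 0..n-1), and the convention that an empty graph is not a tree are all assumptions made for illustration.

```python
from collections import deque


def associative_recall(pairs, query):
    """Return the value paired with `query` in the context, or None if absent.

    `pairs` is a hypothetical encoding of the context as (key, value) tuples.
    """
    table = {}
    for key, value in pairs:
        table[key] = value  # a later pair overwrites an earlier one
    return table.get(query)


def is_tree(num_nodes, edges):
    """Check whether an undirected graph on nodes 0..num_nodes-1 is a tree.

    A graph is a tree when it is connected and has exactly num_nodes - 1 edges.
    Assumption: the empty graph (num_nodes == 0) is treated as not a tree.
    """
    if num_nodes == 0 or len(edges) != num_nodes - 1:
        return False
    adjacency = {node: [] for node in range(num_nodes)}
    for u, v in edges:
        adjacency[u].append(v)
        adjacency[v].append(u)
    # Breadth-first search from node 0 to test connectivity.
    seen = {0}
    queue = deque([0])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return len(seen) == num_nodes
```

Both functions keep the whole context available while answering (the dictionary in the first, the adjacency lists in the second). That is the kind of exact lookup into earlier input that the abstract identifies as the bottleneck for RNNs.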

Code & Models

Repositories

dangxingyu/rnn-icrag
PyTorch · Official

Taxonomy

Topics: Topic Modeling

Methods: Attention Is All You Need · Linear Layer · Dropout · Layer Normalization · Byte Pair Encoding · Multi-Head Attention · Dense Connections · Label Smoothing · Adam · Focus