RRWKV: Capturing Long-range Dependencies in RWKV

Leilei Wang

arXiv:2306.05176 · cs.CL · September 16, 2024 · 2 cites

Open Access

TL;DR

This paper introduces RRWKV, an architecture that enhances RWKV by incorporating retrospective capabilities to better capture long-range dependencies in NLP tasks while maintaining efficiency.

Contribution

The paper proposes RRWKV, a novel extension of RWKV that effectively captures long-range dependencies without increasing computational complexity.

Findings

01. RRWKV outperforms RWKV on long-range dependency tasks.

02. RRWKV keeps linear complexity, as RWKV does.

03. RRWKV achieves accuracy comparable to or better than that of Transformer models.

Abstract

Owing to the impressive dot-product attention, Transformers have been the dominant architecture in various natural language processing (NLP) tasks. Recently, the Receptance Weighted Key Value (RWKV) architecture has adopted a non-transformer design to eliminate the drawback of dot-product attention, whose memory and computational complexity scale quadratically with sequence length. Although RWKV exploits a linear tensor-product attention mechanism and achieves parallelized computation by deploying the time-sequential mode, it fails to capture long-range dependencies because it is limited in looking back at previous information, whereas the standard transformer obtains full information through direct interactions. Therefore, the paper devises the Retrospected Receptance Weighted Key Value (RRWKV) architecture by incorporating the retrospecting ability…
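The linear-complexity claim for RWKV in the abstract can be illustrated with a minimal sketch of the WKV operator as it is defined in the original RWKV work, not in this paper. It is reduced here to a single channel and written without the numerical-stability rescaling that practical implementations use. The function names `wkv_recurrent` and `wkv_direct` are hypothetical, and `w` (decay) and `u` (current-token bonus) follow the RWKV notation. The sketch does not implement the retrospecting mechanism of RRWKV, which the truncated abstract above does not specify.

```python
import math

def wkv_recurrent(k, v, w, u):
    """Single-channel WKV in recurrent form: O(T) time, constant-size state."""
    num, den = 0.0, 0.0  # decayed running sums over all earlier tokens
    out = []
    decay = math.exp(-w)
    for kt, vt in zip(k, v):
        bonus = math.exp(u + kt)  # extra weight given to the current token
        out.append((num + bonus * vt) / (den + bonus))
        # Fold the current token into the state and decay older tokens.
        num = decay * num + math.exp(kt) * vt
        den = decay * den + math.exp(kt)
    return out

def wkv_direct(k, v, w, u):
    """Same quantity, summing explicitly over earlier positions: O(T^2) time."""
    out = []
    for t in range(len(k)):
        num = math.exp(u + k[t]) * v[t]
        den = math.exp(u + k[t])
        for i in range(t):
            weight = math.exp(-(t - 1 - i) * w + k[i])
            num += weight * v[i]
            den += weight
        out.append(num / den)
    return out

if __name__ == "__main__":
    k = [0.1, -0.3, 0.5, 0.2]
    v = [1.0, 2.0, -1.0, 0.5]
    print(wkv_recurrent(k, v, w=0.5, u=0.3))
    print(wkv_direct(k, v, w=0.5, u=0.3))
```

Both functions return the same values. The recurrent form reaches earlier tokens only through two exponentially decayed running sums, which is one way to read the abstract's remark that RWKV is limited in looking back at previous information.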

Taxonomy

Topics: Topic Modeling · Computational Physics and Python Applications · Advanced Graph Neural Networks