RRWKV: Capturing Long-range Dependencies in RWKV
Leilei Wang

TL;DR
This paper introduces RRWKV, an architecture that enhances RWKV by incorporating retrospective capabilities to better capture long-range dependencies in NLP tasks while maintaining efficiency.
Contribution
The paper proposes RRWKV, a novel extension of RWKV that effectively captures long-range dependencies without increasing computational complexity.
Findings
RRWKV outperforms RWKV in long-range dependency tasks.
RRWKV maintains linear complexity similar to RWKV.
RRWKV achieves comparable or better accuracy than transformer models.
Abstract
Owing to their impressive dot-product attention, Transformers have been the dominant architectures in various natural language processing (NLP) tasks. Recently, the Receptance Weighted Key Value (RWKV) architecture has followed a non-transformer design to eliminate the drawbacks of dot-product attention, whose memory and computational complexity scale quadratically with sequence length. Although RWKV exploits a linear tensor-product attention mechanism and achieves parallelized computation by deploying the time-sequential mode, it fails to capture long-range dependencies because of its limited ability to look back at previous information, compared with the full information obtained by direct interactions in the standard transformer. Therefore, the paper devises the Retrospected Receptance Weighted Key Value (RRWKV) architecture by incorporating the retrospecting ability…
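The linear-versus-quadratic contrast in the abstract can be illustrated with a minimal sketch. It assumes the single-channel WKV formulation from the original RWKV paper, with a decay `w` and a current-token bonus `u`; the function names and the scalar (one-channel) setup are illustrative assumptions, and the sketch does not attempt the RRWKV retrospecting mechanism, which the truncated abstract does not specify. It also omits the numerical-stability tricks a real implementation would need.

```python
import math

def wkv_recurrent(k, v, w, u):
    """Assumed RWKV-style WKV for one channel: O(T) time, O(1) running state."""
    a = 0.0  # running decayed sum of exp(k_i) * v_i over past positions
    b = 0.0  # running decayed sum of exp(k_i) over past positions
    out = []
    for k_t, v_t in zip(k, v):
        bonus = math.exp(u + k_t)  # extra weight given to the current token
        out.append((a + bonus * v_t) / (b + bonus))
        decay = math.exp(-w)       # older positions fade by this factor per step
        a = decay * a + math.exp(k_t) * v_t
        b = decay * b + math.exp(k_t)
    return out

def wkv_direct(k, v, w, u):
    """Same quantity, summing explicitly over every earlier position: O(T^2)."""
    out = []
    for t in range(len(k)):
        num = math.exp(u + k[t]) * v[t]
        den = math.exp(u + k[t])
        for i in range(t):
            weight = math.exp(-(t - 1 - i) * w + k[i])
            num += weight * v[i]
            den += weight
        out.append(num / den)
    return out

# Hypothetical toy inputs, chosen only to exercise both functions.
keys = [0.1, -0.3, 0.5, 0.2]
values = [1.0, 2.0, -1.0, 0.5]
recurrent = wkv_recurrent(keys, values, w=0.5, u=0.3)
direct = wkv_direct(keys, values, w=0.5, u=0.3)
```

Both functions return the same values; the recurrent form reaches them with a fixed-size state, which is why past information reaches the current step only through a decayed summary rather than through direct pairwise interaction.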
Taxonomy
Topics: Topic Modeling · Computational Physics and Python Applications · Advanced Graph Neural Networks
