Enhancing Transformer RNNs with Multiple Temporal Perspectives
Razvan-Gabriel Dumitru, Darius Peteleaza, Mihai Surdeanu

TL;DR
This paper proposes a novel method of integrating multiple temporal perspectives into RNN architectures, specifically enhancing the RWKV model, to improve sequential data understanding with minimal additional parameters and computational overhead.
Contribution
It introduces a new approach for RNNs that maintains diverse temporal views, improving context interpretation without extensive retraining or large parameter increases.
Findings
Enhanced performance across multiple benchmarks.
Minimal parameter increase (as low as 0.04%).
Maintains linear inference complexity.
Abstract
We introduce the concept of multiple temporal perspectives, a novel approach applicable to Recurrent Neural Network (RNN) architectures for enhancing their understanding of sequential data. This method involves maintaining diverse temporal views of previously encountered text, significantly enriching the language models' capacity to interpret context. To show the efficacy of this approach, we incorporate it into the Receptance Weighted Key Value (RWKV) architecture, addressing its inherent challenge of retaining all historical information within a single hidden state. Notably, this improvement is achieved with a minimal increase in the number of parameters, as little as 0.04% of the original number of parameters. Further, the additional parameters necessary for the multiple temporal perspectives are fine-tuned with minimal computational overhead, avoiding the need for a full…
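As an illustration only, here is a minimal NumPy sketch of one way multiple temporal perspectives could be layered onto an RWKV-4-style WKV recurrence. It assumes, hypothetically, that each perspective keeps its own recurrent state with a perspective-specific decay (the base decay plus a learned offset) and that the perspective outputs are blended with softmax weights. The names `wkv_single`, `wkv_multi_perspective`, `decay_offsets` and `mix_logits` are invented for this example, and this is not presented as the authors' formulation. The recurrence is the naive, numerically unstabilized form, so it is only suitable for small inputs.

```python
import numpy as np

def wkv_single(k, v, w, u):
    """Naive RWKV-4-style WKV recurrence for a single decay vector.

    k, v: arrays of shape (T, C); w: positive per-channel decay (C,);
    u: per-channel bonus for the current token (C,).
    Uses an O(C) state (a, b), so the cost is O(T * C).
    """
    T, C = k.shape
    a = np.zeros(C)  # decayed running sum of exp(k_i) * v_i
    b = np.zeros(C)  # decayed running sum of exp(k_i)
    decay = np.exp(-w)
    out = np.empty((T, C))
    for t in range(T):
        bonus = np.exp(u + k[t])
        out[t] = (a + bonus * v[t]) / (b + bonus)
        a = decay * a + np.exp(k[t]) * v[t]
        b = decay * b + np.exp(k[t])
    return out

def wkv_multi_perspective(k, v, w, u, decay_offsets, mix_logits):
    """Hypothetical multi-perspective variant (an assumption, not the paper's method).

    Perspective p runs its own recurrence with decay w + decay_offsets[p]
    (clipped to stay positive); outputs are mixed with softmax(mix_logits).
    decay_offsets: (P, C); mix_logits: (P,).
    """
    weights = np.exp(mix_logits - mix_logits.max())
    weights /= weights.sum()
    outs = np.stack([wkv_single(k, v, np.maximum(w + off, 1e-4), u)
                     for off in decay_offsets])  # shape (P, T, C)
    return np.einsum("p,ptc->tc", weights, outs)

rng = np.random.default_rng(0)
T, C, P = 8, 4, 3
k = rng.normal(size=(T, C))
v = rng.normal(size=(T, C))
w = np.full(C, 0.5)
u = np.zeros(C)
out = wkv_multi_perspective(k, v, w, u, rng.normal(scale=0.3, size=(P, C)), np.zeros(P))
print(out.shape)  # (8, 4)
```

Each step updates a fixed-size state per perspective, so the cost is O(P·T·C), still linear in the sequence length T. In this hypothetical parameterization the only added parameters are `decay_offsets` (P×C values) and `mix_logits` (P values), which is tiny compared with the weights of a full model.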
Taxonomy
Topics: Industrial Technology and Control Systems · Neural Networks and Applications
