TL;DR
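This paper introduces bRSM, a sparse predictive autoencoder with recurrent connections that is trained using only information local in time. It learns long-range dependencies in sequence tasks with a reduced memory footprint and is reported to outperform traditional RNNs and LSTMs on sequence modeling and language prediction tasks.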
Contribution
The paper presents a novel recurrent autoencoder architecture with sparse activations and a boosting rule, enabling high-performance sequence learning with lower memory requirements.
Findings
Outperforms LSTM on high-Markov-order sequence tasks
Achieves significant improvement in language modeling perplexity
Learns sequences faster and more completely than traditional RNNs
Abstract
In sequence learning tasks such as language modelling, Recurrent Neural Networks must learn relationships between input features separated by time. State-of-the-art models such as LSTM and Transformer are trained by backpropagation of losses into prior hidden states and inputs held in memory. This allows gradients to flow from present to past and effectively learn with perfect hindsight, but at a significant memory cost. In this paper we show that it is possible to train high-performance recurrent networks using information that is local in time, and thereby achieve a significantly reduced memory footprint. We describe a predictive autoencoder called bRSM featuring recurrent connections, sparse activations, and a boosting rule for improved cell utilization. The architecture demonstrates near-optimal performance on a non-deterministic (stochastic) partially-observable sequence learning…
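The abstract mentions sparse activations combined with a boosting rule for improved cell utilization, but the summary does not give the rule itself. The sketch below illustrates one common way such a mechanism can work, and it is an assumption rather than the paper's method: a k-winners-take-all step keeps only the k strongest cells, and an exponential boost based on each cell's recent win rate (its "duty cycle") favors cells that have rarely fired. The function names, the exponential form, and the parameters `boost_strength` and `decay` are all hypothetical, and inputs are assumed to be non-negative.

```python
import math


def boosted_k_winners(pre_activations, duty_cycles, k, boost_strength=1.0):
    """Keep the k cells with the highest boosted scores and zero the rest.

    Hypothetical illustration; assumes non-negative pre-activations.
    """
    n = len(pre_activations)
    target = k / n  # win rate each cell would have if usage were uniform
    # Cells that won less often than the target get a multiplier above 1,
    # cells that won more often get a multiplier below 1.
    boosted = [
        a * math.exp(boost_strength * (target - d))
        for a, d in zip(pre_activations, duty_cycles)
    ]
    winners = set(sorted(range(n), key=lambda i: boosted[i], reverse=True)[:k])
    # Boosting only affects the competition; winners emit their original value.
    sparse = [a if i in winners else 0.0 for i, a in enumerate(pre_activations)]
    return sparse, winners


def update_duty_cycles(duty_cycles, winners, decay=0.9):
    """Exponential moving average of how often each cell has won."""
    return [
        decay * d + (1.0 - decay) * (1.0 if i in winners else 0.0)
        for i, d in enumerate(duty_cycles)
    ]


if __name__ == "__main__":
    duty = [1.0, 0.0, 0.0, 0.0]  # cell 0 has been winning every step
    x = [0.9, 0.8, 0.1, 0.2]
    sparse, winners = boosted_k_winners(x, duty, k=2, boost_strength=5.0)
    print(sparse, sorted(winners))
    print(update_duty_cycles(duty, winners))
```

In the usage example, cell 0 has the largest raw activation but loses the competition because its duty cycle is far above the target, so less-used cells get a chance to respond. This is the sense in which boosting can improve cell utilization.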
Taxonomy
Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Sigmoid Activation · Tanh Activation · Residual Connection · Byte Pair Encoding · Dense Connections · Label Smoothing
