# Machine Learning on Sequential Data Using a Recurrent Weighted Average

**Authors:** Jared Ostmeyer, Lindsay Cowell

arXiv: 1703.01253 · 2019-02-18

## TL;DR

This paper introduces the Recurrent Weighted Average (RWA), an RNN architecture whose state at each step is built from a weighted average over all past processing steps rather than only the previous one. It outperforms a standard LSTM on almost every sequence task tested.

## Contribution

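The paper proposes the RWA model, which computes a weighted average over every past processing step. Because this average is maintained as a running average, the model's computational cost scales like that of any other RNN architecture. In effect, the RWA reformulates the attention mechanism as a stand-alone recurrent model.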
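
In symbols (our notation for illustration, not necessarily the paper's): each past step $i \le t$ contributes a value $z_i$ with weight $e^{a_i}$, where $z_i$ and the attention score $a_i$ are functions of the input $x_i$ and the previous state $h_{i-1}$, and $f$ is an activation such as $\tanh$, all applied elementwise per hidden unit. The weighted average can be rewritten as a ratio of two running sums:

$$
h_t = f\!\left(\frac{\sum_{i=1}^{t} z_i\, e^{a_i}}{\sum_{i=1}^{t} e^{a_i}}\right) = f\!\left(\frac{n_t}{d_t}\right),
\qquad
n_t = n_{t-1} + z_t\, e^{a_t}, \quad d_t = d_{t-1} + e^{a_t}, \quad n_0 = d_0 = 0.
$$

Since $n_t$ and $d_t$ depend only on $n_{t-1}$, $d_{t-1}$ and the current step, step $t$ costs the same no matter how many steps came before it.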

## Key findings

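- RWA outperforms a standard LSTM on almost every task tested: the variable copy problem, the adding problem, artificial-grammar classification, classification of sequences by length, and sequential (pixel-by-pixel) MNIST
- Long-range dependencies are handled by letting every past step contribute directly to the current state
- Computational cost scales like that of other RNN architectures, because the weighted average is computed as a running average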
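
To illustrate the efficiency point, here is a minimal NumPy sketch that compares recomputing the weighted average from scratch at every step (work grows with $t$) against updating two running sums (constant work per step). The arrays `z` and `a` are random placeholders standing in for per-step values and attention scores, not outputs of a trained model, and both function names are hypothetical.

```python
import numpy as np

def naive_weighted_averages(z, a):
    """Recompute the weighted average over steps 1..t from scratch at every step t."""
    out = []
    for t in range(1, len(z) + 1):
        w = np.exp(a[:t])                                   # weights e^{a_i} for i <= t
        out.append((w * z[:t]).sum(axis=0) / w.sum(axis=0))
    return np.stack(out)

def running_weighted_averages(z, a):
    """Same averages from a running numerator and denominator (O(1) work per step)."""
    n = np.zeros(z.shape[1])   # running sum of z_i * e^{a_i}
    d = np.zeros(z.shape[1])   # running sum of e^{a_i}
    out = []
    for z_t, a_t in zip(z, a):
        w_t = np.exp(a_t)
        n = n + z_t * w_t
        d = d + w_t
        out.append(n / d)
    return np.stack(out)

rng = np.random.default_rng(0)
z = rng.normal(size=(50, 4))   # placeholder values z_t: 50 steps, 4 hidden units
a = rng.normal(size=(50, 4))   # placeholder attention scores a_t
print(np.allclose(naive_weighted_averages(z, a), running_weighted_averages(z, a)))  # True
```

The running version keeps only two vectors of state, so its per-step cost does not grow with sequence length.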

## Abstract

Recurrent Neural Networks (RNN) are a type of statistical model designed to handle sequential data. The model reads a sequence one symbol at a time. Each symbol is processed based on information collected from the previous symbols. With existing RNN architectures, each symbol is processed using only information from the previous processing step. To overcome this limitation, we propose a new kind of RNN model that computes a recurrent weighted average (RWA) over every past processing step. Because the RWA can be computed as a running average, the computational overhead scales like that of any other RNN architecture. The approach essentially reformulates the attention mechanism into a stand-alone model. The performance of the RWA model is assessed on the variable copy problem, the adding problem, classification of artificial grammar, classification of sequences by length, and classification of the MNIST images (where the pixels are read sequentially one at a time). On almost every task, the RWA model is found to outperform a standard LSTM model.
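
A fuller sketch of one RWA layer's forward pass, under stated assumptions: the parametrization below ($z_t = (W_u x_t + b_u) \odot \tanh(W_g[x_t; h_{t-1}] + b_g)$, scores $a_t = W_a[x_t; h_{t-1}]$, $f = \tanh$, initial state $h_0 = 0$) is our approximation and may differ in detail from the full paper. Because raw sums of $e^{a_t}$ can overflow on long sequences, the sketch also keeps a running maximum of the scores and rescales both sums (a standard log-sum-exp style trick) so every exponent is at most zero. `rwa_forward` and `init_params` are hypothetical names.

```python
import numpy as np

def init_params(n_in, n_units, rng):
    """Hypothetical random initialization for one RWA layer."""
    s = 1.0 / np.sqrt(n_in + n_units)
    return {
        "W_u": rng.normal(scale=s, size=(n_units, n_in)),
        "b_u": np.zeros(n_units),
        "W_g": rng.normal(scale=s, size=(n_units, n_in + n_units)),
        "b_g": np.zeros(n_units),
        "W_a": rng.normal(scale=s, size=(n_units, n_in + n_units)),
    }

def rwa_forward(x, p):
    """Run the assumed RWA cell over x of shape (T, n_in); returns states of shape (T, n_units)."""
    n_units = p["b_u"].shape[0]
    h = np.zeros(n_units)              # assumed initial state h_0 = 0
    n = np.zeros(n_units)              # rescaled running numerator
    d = np.zeros(n_units)              # rescaled running denominator
    m = np.full(n_units, -np.inf)      # running max of scores seen so far
    states = []
    for x_t in x:
        xh = np.concatenate([x_t, h])                                    # [x_t; h_{t-1}]
        z = (p["W_u"] @ x_t + p["b_u"]) * np.tanh(p["W_g"] @ xh + p["b_g"])  # value to average
        a = p["W_a"] @ xh                                                # attention score
        m_new = np.maximum(m, a)
        scale = np.exp(m - m_new)      # shrink old sums to the new max (0 on the first step)
        w = np.exp(a - m_new)          # current weight, at most 1
        n = n * scale + z * w
        d = d * scale + w              # always >= 1, so the division below is safe
        m = m_new
        h = np.tanh(n / d)
        states.append(h)
    return np.stack(states)

rng = np.random.default_rng(0)
params = init_params(n_in=2, n_units=8, rng=rng)
x = rng.uniform(size=(1000, 2))        # e.g. a 1000-step sequence with 2 input features
h = rwa_forward(x, params)
print(h.shape)  # (1000, 8)
```

Each step reads and writes only `n`, `d`, `m` and `h`, so memory stays constant in sequence length, which is the property the abstract highlights.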

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1703.01253/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/1703.01253/full.md

## References

17 references — full list in the complete paper: https://tomesphere.com/paper/1703.01253/full.md

---
Source: https://tomesphere.com/paper/1703.01253