The Importance of the Current Input in Sequence Modeling
Christian Oliva, Luis F. Lago-Fernández

TL;DR
This paper demonstrates that adding a direct input-to-output connection in sequence models, especially LSTMs, consistently improves prediction accuracy and achieves state-of-the-art results in language modeling tasks.
Contribution
Introducing a simple direct connection between input and output in recurrent networks that enhances performance across various sequence modeling problems.
Findings
Consistent improvement in prediction accuracy with the direct connection.
Achieved new state-of-the-art perplexity in language modeling.
Model performance is robust across different architectures and training setups.
Abstract
The latest advances in sequence modeling are mainly based on deep learning approaches. The current state of the art involves variations of the standard LSTM architecture, combined with several tricks that improve the final prediction rates of the trained neural networks. However, in some cases, these adaptations might be too closely tuned to the particular problems being addressed. In this article, we show that a very simple idea, adding a direct connection between the input and the output that skips the recurrent module, leads to an increase in prediction accuracy in sequence modeling problems related to natural language processing. Experiments carried out on different problems show that the addition of this kind of connection to a recurrent network always improves the results, regardless of the architecture and training-specific details. When this idea is introduced into the…
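As a rough illustration of the idea described in the abstract, the sketch below adds a direct input-to-output term to a plain tanh recurrent network. It is not the authors' implementation: the paper focuses on LSTM variants, while this example swaps in a simple Elman-style cell to stay short, and it assumes the direct connection is added to the output logits. The names SkipRNN, W_xy, and use_skip are hypothetical, and the weights are random rather than trained.

```python
import math
import random


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rows, cols, rng):
    """Small random weights; a trained model would learn these."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def one_hot(i, n):
    return [1.0 if j == i else 0.0 for j in range(n)]


class SkipRNN:
    """Tanh RNN whose output can also see the current input directly.

    Assumption: the direct connection is modeled as an extra linear term
    W_xy @ x_t added to the output logits, bypassing the recurrent state.
    """

    def __init__(self, n_in, n_hidden, n_out, use_skip=True, seed=0):
        rng = random.Random(seed)
        self.W_xh = rand_matrix(n_hidden, n_in, rng)      # input -> hidden
        self.W_hh = rand_matrix(n_hidden, n_hidden, rng)  # hidden -> hidden
        self.W_hy = rand_matrix(n_out, n_hidden, rng)     # hidden -> output
        self.W_xy = rand_matrix(n_out, n_in, rng)         # input -> output (direct)
        self.n_hidden = n_hidden
        self.use_skip = use_skip

    def forward(self, xs):
        h = [0.0] * self.n_hidden
        outputs = []
        for x in xs:
            # Recurrent update: h_t = tanh(W_xh x_t + W_hh h_{t-1})
            pre = [a + b for a, b in zip(matvec(self.W_xh, x), matvec(self.W_hh, h))]
            h = [math.tanh(p) for p in pre]
            # Output from the recurrent state
            logits = matvec(self.W_hy, h)
            if self.use_skip:
                # Direct path: the current input reaches the output
                # without passing through the recurrent module.
                direct = matvec(self.W_xy, x)
                logits = [l + d for l, d in zip(logits, direct)]
            outputs.append(softmax(logits))
        return outputs


# Toy sequence of three one-hot tokens over a vocabulary of size 4.
xs = [one_hot(i, 4) for i in [0, 2, 1]]
probs = SkipRNN(4, 3, 4, use_skip=True).forward(xs)
probs_no_skip = SkipRNN(4, 3, 4, use_skip=False).forward(xs)
```

Both models share the same seed, so they differ only in whether the direct term is applied. Comparing probs with probs_no_skip shows how the current input shifts the predicted distribution at each step.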
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis
Methods: Tanh Activation · Sigmoid Activation · Long Short-Term Memory
