The Importance of the Current Input in Sequence Modeling

Christian Oliva; Luis F. Lago-Fernández

arXiv:2112.11776·cs.CL·December 23, 2021

Open Access

TL;DR

This paper shows that adding a direct input-to-output connection that bypasses the recurrent module consistently improves prediction accuracy in the sequence models tested, LSTMs in particular, and reports state-of-the-art perplexity in language modeling.

Contribution

A simple direct connection between input and output in recurrent networks, skipping the recurrent module, that improves performance on the sequence modeling problems tested.

Findings

1. The direct connection consistently improves prediction accuracy.

2. The resulting models reach a new state-of-the-art perplexity in language modeling.

3. The improvement holds across the architectures and training setups tested.
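
Perplexity, the metric named in the findings, is conventionally the exponential of the mean negative log-probability a model assigns to the true tokens. The helper below is a minimal sketch of that standard definition; the function name `perplexity` and its input format (a list of probabilities, one per true token) are illustrative assumptions and are not taken from the paper.

```python
import math


def perplexity(token_probs):
    """Return exp(mean negative log-probability) of the true tokens.

    token_probs: probabilities the model assigned to each correct token
    (hypothetical input format, chosen for illustration).
    """
    if not token_probs:
        raise ValueError("need at least one token probability")
    mean_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(mean_nll)
```

Lower is better: a model that assigns probability 1 to every correct token has perplexity 1, and a model that guesses uniformly over V tokens has perplexity V.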

Abstract

The latest advances in sequence modeling are mainly based on deep learning approaches. The current state of the art involves variations of the standard LSTM architecture, combined with several tricks that improve the final prediction rates of the trained neural networks. However, in some cases, these adaptations might be too finely tuned to the particular problems being addressed. In this article, we show that a very simple idea, adding a direct connection between the input and the output that skips the recurrent module, increases prediction accuracy in sequence modeling problems related to natural language processing. Experiments carried out on different problems show that adding this kind of connection to a recurrent network always improves the results, regardless of the architecture and training-specific details. When this idea is introduced into the…
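
The idea of a connection that skips the recurrent module can be sketched in a few lines. The example below is an assumption-laden illustration, not the authors' implementation: it uses a plain tanh (Elman-style) recurrent cell instead of an LSTM to stay short, omits biases, and assumes the direct path is a linear map `W_xy` added to the output logits. The class name `DirectInputRNN` and all weight names are hypothetical.

```python
import math
import random


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def add(*vecs):
    """Element-wise sum of equally sized vectors."""
    return [sum(vals) for vals in zip(*vecs)]


def softmax(z):
    """Numerically stable softmax."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class DirectInputRNN:
    """Tanh RNN with an optional input-to-output connection (illustrative only)."""

    def __init__(self, n_in, n_hidden, n_out, use_direct=True, seed=0):
        rng = random.Random(seed)
        self.W_xh = rand_matrix(n_hidden, n_in, rng)      # input -> hidden
        self.W_hh = rand_matrix(n_hidden, n_hidden, rng)  # hidden -> hidden
        self.W_hy = rand_matrix(n_out, n_hidden, rng)     # hidden -> output
        self.W_xy = rand_matrix(n_out, n_in, rng)         # assumed direct path
        self.n_hidden = n_hidden
        self.use_direct = use_direct

    def forward(self, xs):
        """Return one output distribution per input vector in xs."""
        h = [0.0] * self.n_hidden
        outputs = []
        for x in xs:
            # Recurrent update: the hidden state mixes the input with the past.
            h = [math.tanh(a) for a in add(matvec(self.W_xh, x), matvec(self.W_hh, h))]
            logits = matvec(self.W_hy, h)
            if self.use_direct:
                # Direct path: the current input reaches the output
                # without passing through the recurrent module.
                logits = add(logits, matvec(self.W_xy, x))
            outputs.append(softmax(logits))
        return outputs
```

For example, `DirectInputRNN(3, 4, 3).forward([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])` returns two probability vectors over three classes. Setting `use_direct=False` with the same seed gives the baseline without the direct path, so the two variants can be compared on identical weights.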


Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis

Methods: Tanh Activation · Sigmoid Activation · Long Short-Term Memory