Counting in Language with RNNs

Heng xin Fun; Sergiy V Bokhnyak; Francesco Saverio Zuppichini

arXiv:1810.12411 · cs.LG · November 1, 2018

TL;DR

This paper investigates why LSTMs outperform GRUs on language tasks and attributes the gap to counting: the LSTM can count using its cell state, which the paper demonstrates on simplified forms of language (context-free and context-sensitive languages).

Contribution

It analyzes how LSTMs count through their cell states during inference and offers this as a possible explanation for their advantage over GRUs in language modeling.

Findings

01. LSTMs can perform counting on simplified languages (context-free and context-sensitive)

02. GRUs are less capable of counting due to their structure

03. Counting ability correlates with language modeling performance

Abstract

In this paper we examine a possible reason for the LSTM outperforming the GRU on language modeling and, more specifically, machine translation. We hypothesize that this has to do with counting, a consistent theme across the literature on long-term dependence, counting, and language modeling for RNNs. Using simplified forms of language -- Context-Free and Context-Sensitive Languages -- we show exactly how the LSTM performs its counting based on its cell state during inference and why the GRU cannot perform as well.
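To make the counting idea concrete, here is a minimal sketch in plain Python. It is not the authors' experiment or code: the weights are hand-picked constants (`GATE_BIAS`, `CAND_SCALE`), the functions `lstm_count` and `gru_state` are hypothetical names, and the GRU is simplified (no reset gate, update gate fixed at 0.5). It only illustrates the structural point that an additive LSTM cell state can grow with the number of symbols seen, while a GRU-style hidden state is an interpolation between bounded values and therefore stays within [-1, 1].

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


# Hand-picked constants (assumptions for illustration, not trained weights).
GATE_BIAS = 20.0   # saturates the input and forget gates close to 1
CAND_SCALE = 20.0  # saturates the candidate tanh close to +1 or -1


def lstm_count(string):
    """Final cell state of a one-unit LSTM-style counter over 'a'/'b'."""
    c = 0.0
    for ch in string:
        x = 1.0 if ch == "a" else -1.0   # 'a' counts up, 'b' counts down
        i = sigmoid(GATE_BIAS)           # input gate, approximately 1
        f = sigmoid(GATE_BIAS)           # forget gate, approximately 1
        g = math.tanh(CAND_SCALE * x)    # candidate, approximately +1 / -1
        c = f * c + i * g                # additive cell-state update
    return c


def gru_state(string):
    """Final hidden state of a simplified one-unit GRU-style update."""
    h = 0.0
    for ch in string:
        x = 1.0 if ch == "a" else -1.0
        z = sigmoid(0.0)                     # update gate fixed at 0.5
        h_tilde = math.tanh(CAND_SCALE * x)  # bounded candidate state
        h = (1.0 - z) * h + z * h_tilde      # interpolation, stays bounded
    return h


if __name__ == "__main__":
    for n in (1, 5, 50):
        print(n, lstm_count("a" * n), gru_state("a" * n))
    # A balanced string a^n b^n brings the LSTM cell state back near zero.
    print(lstm_count("a" * 5 + "b" * 5))
```

With these settings the LSTM-style cell state after `n` copies of `a` is close to `n`, and it returns close to zero after a matching run of `b`, which is the behavior needed to recognize a language such as a^n b^n. The GRU-style state saturates near 1 regardless of `n`, so under this simplification it cannot distinguish long counts.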

Taxonomy

Topics: Topic Modeling · Machine Learning and Algorithms · Natural Language Processing Techniques

Methods: Sigmoid Activation · Tanh Activation · Gated Recurrent Unit · Long Short-Term Memory