Character-Aware Neural Language Models
Yoon Kim, Yacine Jernite, David Sontag, Alexander M. Rush

TL;DR
This paper introduces a character-aware neural language model that applies a CNN and a highway network over characters and still predicts at the word level. Its performance is comparable to or better than that of word-level models across multiple languages, with fewer parameters.
Contribution
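The paper presents a neural language model that takes only character-level inputs while making predictions at the word level. Its character-derived word representations capture both semantic and orthographic information, and it outperforms word-level and morpheme-level LSTM baselines on morphologically rich languages with fewer parameters.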
Findings
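Performs on par with the existing state of the art on the English Penn Treebank with 60% fewer parameters
Outperforms word-level/morpheme-level LSTM baselines on morphologically rich languages (Arabic, Czech, French, German, Spanish, Russian), again with fewer parameters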
Encodes semantic and orthographic information from characters
Abstract
We describe a simple neural language model that relies only on character-level inputs. Predictions are still made at the word-level. Our model employs a convolutional neural network (CNN) and a highway network over characters, whose output is given to a long short-term memory (LSTM) recurrent neural network language model (RNN-LM). On the English Penn Treebank the model is on par with the existing state-of-the-art despite having 60% fewer parameters. On languages with rich morphology (Arabic, Czech, French, German, Spanish, Russian), the model outperforms word-level/morpheme-level LSTM baselines, again with fewer parameters. The results suggest that on many languages, character inputs are sufficient for language modeling. Analysis of word representations obtained from the character composition part of the model reveals that the model is able to encode, from characters only, both…
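The character-composition step described in the abstract (a CNN over characters followed by a highway layer) can be sketched roughly as follows. This is a minimal pure-Python illustration, not the authors' implementation: the helper names (`char_cnn_features`, `highway`, `make_toy_model`, `word_vector`), the toy dimensions, the random initialization, the tanh/ReLU nonlinearities, the omission of convolution biases, and the zero-padding of short words are all assumptions made for the example. The LSTM that consumes the resulting word vectors is left out.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def char_cnn_features(char_vecs, filters):
    """Return one max-over-time pooled feature per convolution filter."""
    dim = len(char_vecs[0])
    features = []
    for kernel in filters:  # kernel: `width` rows, each of length `dim`
        width = len(kernel)
        # Assumption: zero-pad words shorter than the filter width.
        padded = char_vecs + [[0.0] * dim] * max(0, width - len(char_vecs))
        responses = [
            math.tanh(sum(dot(k_row, c_row)
                          for k_row, c_row in zip(kernel, padded[i:i + width])))
            for i in range(len(padded) - width + 1)
        ]
        features.append(max(responses))  # max over character positions
    return features


def highway(x, w_h, b_h, w_t, b_t):
    """Highway layer: y = t * relu(W_H x + b_H) + (1 - t) * x."""
    out = []
    for i in range(len(x)):
        h = max(0.0, dot(w_h[i], x) + b_h[i])   # candidate transform
        t = sigmoid(dot(w_t[i], x) + b_t[i])    # transform gate in (0, 1)
        out.append(t * h + (1.0 - t) * x[i])    # gated mix with the input
    return out


def make_toy_model(alphabet, char_dim=4, widths=(1, 2, 3), seed=0):
    """Hypothetical toy parameters: one filter per width, random weights."""
    rng = random.Random(seed)

    def rand():
        return rng.uniform(-0.5, 0.5)

    char_emb = {c: [rand() for _ in range(char_dim)] for c in alphabet}
    filters = [[[rand() for _ in range(char_dim)] for _ in range(w)]
               for w in widths]
    n = len(filters)
    w_h = [[rand() for _ in range(n)] for _ in range(n)]
    w_t = [[rand() for _ in range(n)] for _ in range(n)]
    b_h = [0.0] * n
    b_t = [-2.0] * n  # negative gate bias: initially favor carrying the input
    return char_emb, filters, (w_h, b_h, w_t, b_t)


def word_vector(word, model):
    """Compose a word representation from its characters only."""
    char_emb, filters, highway_params = model
    char_vecs = [char_emb[c] for c in word]
    return highway(char_cnn_features(char_vecs, filters), *highway_params)
```

For example, `word_vector("looked", make_toy_model("abcdefghijklmnopqrstuvwxyz"))` returns a three-element vector, one entry per filter. In a full model, a vector like this would replace the word-embedding lookup at each time step of the LSTM language model.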
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis
Methods: Sigmoid Activation · Highway Layer · Highway Network
