Finding Function in Form: Compositional Character Models for Open Vocabulary Word Representation
Wang Ling, Tiago Luís, Luís Marujo, Ramón Fernandez Astudillo, Silvio Amir, Chris Dyer, Alan W. Black, Isabel Trancoso

TL;DR
This paper presents a character-based compositional model that builds word vectors with bidirectional LSTMs. It achieves state-of-the-art results in language modeling and part-of-speech tagging, with the largest gains in morphologically rich languages.
Contribution
It introduces a novel character-level compositional approach to word embeddings that uses far fewer parameters than traditional word-based models while improving performance.
Findings
Achieves state-of-the-art results in language modeling and POS tagging.
Significantly benefits morphologically rich languages like Turkish.
Requires only a single vector per character type, reducing model complexity.
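The parameter saving can be made concrete with a short calculation. The sketch below compares a word lookup table, which grows with vocabulary size, against a character model whose size does not depend on the vocabulary. It assumes a standard LSTM without peephole connections and a linear projection from the two final LSTM states to the word vector. The function names and all sizes are hypothetical and are not the paper's configuration.

```python
def lookup_table_params(vocab_size: int, dim: int) -> int:
    """Word-based model: one independent dim-sized vector per word type."""
    return vocab_size * dim


def lstm_params(input_dim: int, hidden_dim: int) -> int:
    """Standard LSTM (assumed: no peepholes): 4 gates, each with
    input weights, recurrent weights and a bias vector."""
    return 4 * (hidden_dim * input_dim + hidden_dim * hidden_dim + hidden_dim)


def char_model_params(num_chars: int, char_dim: int, hidden_dim: int, word_dim: int) -> int:
    """Character-based model: a character table, a forward and a backward LSTM,
    and an (assumed) linear projection of both final states to a word vector."""
    char_table = num_chars * char_dim
    bilstm = 2 * lstm_params(char_dim, hidden_dim)
    projection = 2 * hidden_dim * word_dim + word_dim  # two matrices plus a bias
    return char_table + bilstm + projection


# Hypothetical sizes, chosen only for illustration.
print(lookup_table_params(100_000, 50))       # 5000000
print(char_model_params(100, 50, 150, 50))    # 261250
```

With these illustrative sizes, the character model needs about 261 thousand parameters against 5 million for the lookup table. Only the lookup table grows as more word types are added.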
Abstract
We introduce a model for constructing vector representations of words by composing characters using bidirectional LSTMs. Relative to traditional word representation models that have independent vectors for each word type, our model requires only a single vector per character type and a fixed set of parameters for the compositional model. Despite the compactness of this model and, more importantly, the arbitrary nature of the form-function relationship in language, our "composed" word representations yield state-of-the-art results in language modeling and part-of-speech tagging. Benefits over traditional baselines are particularly pronounced in morphologically rich languages (e.g., Turkish).
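The composition described in the abstract can be sketched in plain Python. Each character is looked up in a small table. A forward LSTM reads the characters left to right, and a backward LSTM reads them right to left. The two final hidden states are then combined into a single word vector. This sketch assumes the combination is a linear projection (`D_f @ h_f + D_b @ h_b + bias`). The class names and dimensions are hypothetical, the LSTM omits peepholes, and the weights are random and untrained, so it shows the data flow rather than the authors' implementation.

```python
import math
import random


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def rand_matrix(rng, rows, cols, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class LSTM:
    """Minimal LSTM (no peepholes); returns only the final hidden state."""

    def __init__(self, rng, input_dim, hidden_dim):
        self.hidden_dim = hidden_dim
        # One (W, U, b) triple per gate: input, forget, output, candidate.
        self.gates = {
            name: (rand_matrix(rng, hidden_dim, input_dim),
                   rand_matrix(rng, hidden_dim, hidden_dim),
                   [0.0] * hidden_dim)
            for name in ("i", "f", "o", "g")
        }

    def _pre(self, gate, x, h):
        W, U, b = self.gates[gate]
        return [a + c + d for a, c, d in zip(matvec(W, x), matvec(U, h), b)]

    def final_state(self, inputs):
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        for x in inputs:
            i = [sigmoid(z) for z in self._pre("i", x, h)]
            f = [sigmoid(z) for z in self._pre("f", x, h)]
            o = [sigmoid(z) for z in self._pre("o", x, h)]
            g = [math.tanh(z) for z in self._pre("g", x, h)]
            c = [fi * ci + ii * gi for fi, ci, ii, gi in zip(f, c, i, g)]
            h = [oi * math.tanh(ci) for oi, ci in zip(o, c)]
        return h


class CharToWord:
    """Hypothetical character-to-word composer: char table + BiLSTM + projection."""

    def __init__(self, alphabet, char_dim=8, hidden_dim=16, word_dim=10, seed=0):
        rng = random.Random(seed)
        # A single vector per character type.
        self.char_table = {ch: [rng.uniform(-0.1, 0.1) for _ in range(char_dim)]
                           for ch in alphabet}
        self.forward = LSTM(rng, char_dim, hidden_dim)
        self.backward = LSTM(rng, char_dim, hidden_dim)
        self.D_f = rand_matrix(rng, word_dim, hidden_dim)
        self.D_b = rand_matrix(rng, word_dim, hidden_dim)
        self.bias = [0.0] * word_dim

    def embed(self, word):
        chars = [self.char_table[ch] for ch in word]  # KeyError for unknown characters
        h_f = self.forward.final_state(chars)
        h_b = self.backward.final_state(list(reversed(chars)))
        return [a + b + c for a, b, c in
                zip(matvec(self.D_f, h_f), matvec(self.D_b, h_b), self.bias)]


model = CharToWord("abcdefghijklmnopqrstuvwxyz")
vec = model.embed("cats")  # any word over the alphabet gets a vector
print(len(vec))            # 10
```

Because vectors are computed from characters, a word never seen during training still receives a representation, provided its characters are in the table. Reading the word in both directions lets the final states reflect both prefixes and suffixes, which matters for the morphologically rich languages the abstract highlights.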
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
