Modeling Order in Neural Word Embeddings at Scale
Andrew Trask, David Gilmore, Matthew Russell

TL;DR
This paper introduces a neural language model that captures both word and character order, producing meaningful embeddings that outperform previous models on analogy tasks and can be trained efficiently at scale.
Contribution
It presents a novel neural model integrating word and character order, enabling large-scale training and improved semantic and syntactic representations.
Findings
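Achieved 85.8% on a recent word-analogy task, exceeding the best published syntactic word-analogy scores.
Enabled a skip-gram network with 160 billion parameters to be trained overnight on 3 multi-core CPUs, 14x larger than the previous largest neural network.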
Produced embeddings with meaningful substructure for NLP applications.
Abstract
Natural Language Processing (NLP) systems commonly leverage bag-of-words co-occurrence techniques to capture semantic and syntactic word relationships. The resulting word-level distributed representations often ignore morphological information, though character-level embeddings have proven valuable to NLP tasks. We propose a new neural language model incorporating both word order and character order in its embedding. The model produces several vector spaces with meaningful substructure, as evidenced by its performance of 85.8% on a recent word-analogy task, exceeding best published syntactic word-analogy scores by a 58% error margin. Furthermore, the model includes several parallel training methods, most notably allowing a skip-gram network with 160 billion parameters to be trained overnight on 3 multi-core CPUs, 14x larger than the previous largest neural network.
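As a rough illustration of what "incorporating word order and character order" can mean in practice, the sketch below builds skip-gram training examples that keep each context word's relative offset, and character n-grams that keep their start position. This is a minimal sketch under stated assumptions, not the authors' architecture: the function names `positional_skipgram_pairs` and `char_ngrams`, the window size, the n-gram length, and the `<`/`>` boundary markers are all hypothetical choices made for the example.

```python
from typing import List, Tuple


def positional_skipgram_pairs(
    tokens: List[str], window: int = 2
) -> List[Tuple[str, int, str]]:
    """Return (center, offset, context) triples.

    A bag-of-words skip-gram would discard `offset`; keeping it lets a
    model learn a separate representation for each relative position.
    (Hypothetical helper, not taken from the paper.)
    """
    pairs = []
    for i, center in enumerate(tokens):
        for offset in range(-window, window + 1):
            j = i + offset
            # Skip the center word itself and positions outside the sentence.
            if offset == 0 or j < 0 or j >= len(tokens):
                continue
            pairs.append((center, offset, tokens[j]))
    return pairs


def char_ngrams(word: str, n: int = 3) -> List[Tuple[int, str]]:
    """Return (start_index, ngram) pairs so character order is retained.

    The '<' and '>' boundary markers are an assumption of this sketch.
    """
    padded = f"<{word}>"
    return [(k, padded[k:k + n]) for k in range(len(padded) - n + 1)]


if __name__ == "__main__":
    for triple in positional_skipgram_pairs(["the", "cat", "sat"], window=1):
        print(triple)
    print(char_ngrams("cat"))
```

Each distinct offset (or character position) could then index its own embedding table, which is one simple way an order-aware model differs from a bag-of-words one; how the paper actually partitions and combines its embeddings is not described in this summary.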
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Software Engineering Research
