Swivel: Improving Embeddings by Noticing What's Missing

Noam Shazeer; Ryan Doherty; Colin Evans; Chris Waterson

arXiv:1602.02215 · cs.CL · February 9, 2016 · 56 cites

TL;DR

Swivel learns low-dimensional embeddings by approximately factorizing the point-wise mutual information matrix with stochastic gradient descent. It uses every cell of the co-occurrence matrix, including unobserved co-occurrences, and partitions the matrix into shards so the computation can run in parallel across many nodes.

Contribution

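The paper introduces Swivel, an embedding method whose piecewise loss gives special handling to unobserved co-occurrences, so that all the information in the co-occurrence matrix contributes to training. Vectorized multiplication over sub-matrices and sharding across nodes keep the full-matrix computation tractable.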

Findings

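01. Produces more accurate embeddings than methods that consider only observed co-occurrences

02. Scales to large corpora by partitioning the matrix into shards that are processed in parallel across many nodes

03. Makes use of unobserved co-occurrences through a piecewise loss that handles them specially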

Abstract

We present Submatrix-wise Vector Embedding Learner (Swivel), a method for generating low-dimensional feature embeddings from a feature co-occurrence matrix. Swivel performs approximate factorization of the point-wise mutual information matrix via stochastic gradient descent. It uses a piecewise loss with special handling for unobserved co-occurrences, and thus makes use of all the information in the matrix. While this requires computation proportional to the size of the entire matrix, we make use of vectorized multiplication to process thousands of rows and columns at once to compute millions of predicted values. Furthermore, we partition the matrix into shards in order to parallelize the computation across many nodes. This approach results in more accurate embeddings than can be achieved with methods that consider only observed co-occurrences, and can scale to much larger corpora than…
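The abstract names the ingredients (sub-matrix processing, a point-wise mutual information target, a piecewise loss) but not their exact form. The sketch below is one way they could fit together, not the authors' implementation. Several choices are assumptions made for illustration: shards are contiguous index ranges, unobserved cells use a count of 1 to obtain a finite PMI target, observed cells take a squared error, unobserved cells take a softplus penalty that only discourages over-prediction, and no per-cell confidence weighting is applied. All function names are hypothetical.

```python
import numpy as np

def shard_indices(n, shard_size):
    """Split range(n) into contiguous index blocks (assumed sharding scheme)."""
    return [np.arange(s, min(s + shard_size, n)) for s in range(0, n, shard_size)]

def pmi_block(counts, row_sums, col_sums, total):
    """PMI targets for one sub-matrix.

    Zero counts are replaced by 1 (an assumption) so that unobserved
    cells still receive a finite target value.
    """
    smoothed = np.where(counts > 0, counts, 1.0)
    return (np.log(smoothed) + np.log(total)
            - np.log(row_sums)[:, None] - np.log(col_sums)[None, :])

def piecewise_block_loss(row_emb, col_emb, counts, row_sums, col_sums, total):
    """Loss for one shard: squared error if observed, softplus otherwise."""
    # One matrix multiplication predicts every cell of the block at once.
    pred = row_emb @ col_emb.T
    diff = pred - pmi_block(counts, row_sums, col_sums, total)
    squared = 0.5 * diff ** 2
    softplus = np.logaddexp(0.0, diff)  # log(1 + exp(diff)), numerically stable
    return np.where(counts > 0, squared, softplus).sum()

def total_loss(row_emb, col_emb, counts, shard_size):
    """Sum the block loss over every (row shard, column shard) pair."""
    row_sums = counts.sum(axis=1)
    col_sums = counts.sum(axis=0)
    total = counts.sum()
    loss = 0.0
    for r in shard_indices(counts.shape[0], shard_size):
        for c in shard_indices(counts.shape[1], shard_size):
            loss += piecewise_block_loss(
                row_emb[r], col_emb[c], counts[np.ix_(r, c)],
                row_sums[r], col_sums[c], total)
    return loss

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    counts = rng.poisson(0.5, size=(6, 6)).astype(float)
    counts[np.arange(6), np.arange(6)] += 1.0  # every row and column is non-empty
    W = rng.normal(scale=0.1, size=(6, 4))
    C = rng.normal(scale=0.1, size=(6, 4))
    print(total_loss(W, C, counts, shard_size=2))
```

Because each shard's loss depends only on its own rows, columns, and the global marginals, the inner loop bodies are independent of one another; in this sketch they run sequentially, but they are the unit of work that could be distributed across nodes.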

Taxonomy

Topics: Advanced Graph Neural Networks · Domain Adaptation and Few-Shot Learning · Face and Expression Recognition