TL;DR
This paper introduces the minimum-distortion embedding (MDE) framework, a versatile approach for vector embedding that unifies many existing methods and provides scalable algorithms and software for large datasets.
Contribution
The paper formalizes the MDE problem, develops a scalable quasi-Newton algorithm for solving it, and releases PyMDE, an open-source Python package for flexible and efficient embeddings.
Findings
MDE encompasses many existing embedding techniques.
The proposed algorithm scales to datasets with millions of items.
PyMDE software enables rapid experimentation with various embeddings.
Abstract
We consider the vector embedding problem. We are given a finite set of items, with the goal of assigning a representative vector to each one, possibly under some constraints (such as the collection of vectors being standardized, i.e., having zero mean and unit covariance). We are given data indicating that some pairs of items are similar, and optionally, some other pairs are dissimilar. For pairs of similar items, we want the corresponding vectors to be near each other, and for dissimilar pairs, we want the corresponding vectors to not be near each other, measured in Euclidean distance. We formalize this by introducing distortion functions, defined for some pairs of the items. Our goal is to choose an embedding that minimizes the total distortion, subject to the constraints. We call this the minimum-distortion embedding (MDE) problem. The MDE framework is simple but general. It…
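To make the objective in the abstract concrete, the following is a minimal sketch in plain Python, not the paper's algorithm or the PyMDE API. It evaluates the total distortion of a fixed embedding and checks the standardization constraint (zero mean, unit covariance). The function names, the quadratic penalty for similar pairs, and the 1/(1 + d) penalty for dissimilar pairs are illustrative assumptions; the abstract only requires that distortion functions be defined on pairs and depend on Euclidean distance.

```python
import math

def total_distortion(X, similar, dissimilar=()):
    """Sum the per-pair distortions of an embedding X (a list of vectors).

    `similar` and `dissimilar` hold (i, j, weight) triples. Both penalty
    shapes below are assumptions chosen for illustration only.
    """
    def dist(i, j):
        return math.dist(X[i], X[j])  # Euclidean distance between vectors

    # Similar pairs: the penalty grows as the two vectors move apart.
    attract = sum(w * dist(i, j) ** 2 for i, j, w in similar)
    # Dissimilar pairs: the penalty shrinks as the two vectors move apart.
    repel = sum(w / (1.0 + dist(i, j)) for i, j, w in dissimilar)
    return attract + repel

def is_standardized(X, tol=1e-9):
    """Check that X has zero mean and (1/n) X^T X equal to the identity."""
    n, m = len(X), len(X[0])
    mean_ok = all(abs(sum(x[k] for x in X) / n) <= tol for k in range(m))
    cov_ok = all(
        abs(sum(x[a] * x[b] for x in X) / n - (1.0 if a == b else 0.0)) <= tol
        for a in range(m)
        for b in range(m)
    )
    return mean_ok and cov_ok

# Four items embedded in the plane; this embedding is standardized.
X = [(1.0, 1.0), (1.0, -1.0), (-1.0, 1.0), (-1.0, -1.0)]
similar = [(0, 1, 1.0)]      # items 0 and 1 are marked similar
dissimilar = [(0, 3, 1.0)]   # items 0 and 3 are marked dissimilar

E = total_distortion(X, similar, dissimilar)
print(is_standardized(X), E)
```

An MDE solver would search over embeddings satisfying the constraint for one that makes this quantity small; the sketch only scores a given embedding.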
