A New Parallel Algorithm for Sinkhorn Word-Movers Distance and Its Performance on PIUMA and Xeon CPU
Jesmin Jahan Tithi, Fabrizio Petrini

TL;DR
This paper introduces a new sparse parallel algorithm for Sinkhorn WMD that significantly improves performance on PIUMA and Xeon CPUs by transforming dense computations into sparse ones and optimizing for different architectures.
Contribution
The paper presents a novel sparse parallel algorithm for Sinkhorn WMD, transforming dense computations into sparse kernels and optimizing its implementation for the PIUMA and Xeon architectures.
Findings
Achieved near-peak performance on PIUMA and Xeon platforms.
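Transformed the dense EMD computation into sparse kernels using a fused SDDMM-SpMM (sampled dense-dense matrix multiplication followed by sparse-dense matrix multiplication).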
Enhanced efficiency of WMD computations in ML/NLP applications.
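The fused SDDMM-SpMM idea can be illustrated with a small NumPy sketch. It assumes that the target documents are stored as a sparse V x N matrix c in CSC arrays (`indptr`, `indices`, `data`), and that one Sinkhorn step computes v = c * 1/(K^T U) followed by X = P v. The helper name `fused_sddmm_spmm` and this exact update form are illustrative assumptions, not the paper's implementation. The sampled product (K^T U) is evaluated only at the nonzeros of c (SDDMM), and each sampled value is consumed immediately by the SpMM, so v is never materialized.

```python
import numpy as np

def fused_sddmm_spmm(K, U, P, indptr, indices, data):
    """Hypothetical fused Sinkhorn step, for illustration only.

    K: (q, V) dense kernel between the q query words and the V vocabulary words.
    U: (q, N) dense scaling matrix, one column per target document.
    P: (q, V) dense matrix (kernel rows scaled by 1 / r).
    indptr, indices, data: CSC arrays of the sparse (V, N) target matrix c.

    Returns X = P @ v with v = c * (1 / (K.T @ U)), where the product K.T @ U is
    only evaluated at the nonzeros of c and v is never stored.
    """
    q, N = U.shape
    X = np.zeros((q, N))
    for j in range(N):  # one target document (column of c) at a time
        for p in range(indptr[j], indptr[j + 1]):
            w = indices[p]                       # vocabulary word present in doc j
            s = K[:, w] @ U[:, j]                # SDDMM: sampled entry (K^T U)[w, j]
            X[:, j] += P[:, w] * (data[p] / s)   # SpMM: accumulate column j of P @ v
    return X
```

Because each column of c is processed independently, the outer loop over documents is the natural unit of parallelism in this sketch.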
Abstract
The Word Movers Distance (WMD) measures the semantic dissimilarity between two text documents by computing the cost of optimally moving all words of a source/query document to the most similar words of a target document. Computing WMD between two documents is costly because it requires solving an $O(V^3 \log V)$ optimization problem, where $V$ is the number of unique words in the document. Fortunately, WMD can be framed as an Earth Mover's Distance (EMD), for which the algorithmic complexity can be reduced to $O(V^2)$ by adding an entropy penalty to the optimization problem and solving it using the Sinkhorn-Knopp algorithm. Additionally, the computation can be made highly parallel by adopting a batching approach, i.e., computing the WMD of a single query document against multiple target documents at once. Sinkhorn WMD is a key kernel used in many ML/NLP applications and usually gets…
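A minimal dense NumPy sketch of batched Sinkhorn WMD (one query against N targets) is shown below. It assumes normalized bag-of-words histograms, Euclidean distances between word embeddings, and hypothetical parameter names (`lamb` for the entropy weight, `max_iter` for the iteration count). It follows the standard Sinkhorn-Knopp scaling updates u = r / (K v), v = c / (K^T u), and it is not necessarily the authors' exact code. The returned value is the transport cost of the entropy-regularized plan.

```python
import numpy as np

def sinkhorn_wmd(r, c, vecs, lamb=1.0, max_iter=20):
    """Illustrative entropic WMD of one query against N target documents.

    r:    (V,) normalized bag-of-words histogram of the query document.
    c:    (V, N) normalized histograms of the N targets, one per column.
    vecs: (V, d) word embeddings.
    Returns an (N,) array of approximate (entropy-regularized) WMD values.
    """
    sel = r > 0                        # keep only the q words present in the query
    r_q = r[sel][:, None]              # (q, 1)
    diff = vecs[sel][:, None, :] - vecs[None, :, :]
    M = np.sqrt((diff ** 2).sum(-1))   # (q, V) Euclidean word-to-word costs
    K = np.exp(-lamb * M)              # Gibbs kernel induced by the entropy penalty
    P = K / r_q                        # rows of K scaled by 1 / r
    x = np.ones((r_q.shape[0], c.shape[1])) / r_q.shape[0]
    for _ in range(max_iter):          # batched Sinkhorn-Knopp iterations
        u = 1.0 / x                    # u = r / (K v)
        v = c * (1.0 / (K.T @ u))      # v = c / (K^T u); dense, but only nonzeros of c matter
        x = P @ v
    u = 1.0 / x
    v = c * (1.0 / (K.T @ u))
    # Cost of the plan diag(u) K diag(v) for each target: sum_ij u_i K_ij M_ij v_j
    return (u * ((K * M) @ v)).sum(axis=0)
```

In this dense version, `K.T @ u` is computed for all V vocabulary words and all N documents even though only the entries where c is nonzero survive the elementwise product. Avoiding that wasted work is what a sparse SDDMM/SpMM formulation targets.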
Taxonomy
Topics: Topic Modeling · Algorithms and Data Compression · Natural Language Processing Techniques
