Accelerating Barnes-Hut t-SNE Algorithm by Efficient Parallelization on Multi-Core CPUs
Narendra Chaudhary, Alexander Pivovar, Pavel Yakovlev, Andrey Gorshkov, Sanchit Misra

TL;DR
This paper presents Acc-t-SNE, a highly optimized CPU implementation of Barnes-Hut t-SNE that significantly accelerates high-dimensional data visualization tasks through advanced parallelization and cache optimization techniques.
Contribution
The paper introduces Acc-t-SNE, a CPU implementation of Barnes-Hut t-SNE that combines cache optimizations, SIMD, parallelization of previously sequential steps, and improved parallelization of already multithreaded steps.
Findings
Up to 261x faster than scikit-learn
Up to 4x faster than daal4py implementation
Speedups measured on a 32-core Intel Icelake cloud instance
Abstract
t-SNE remains one of the most popular embedding techniques for visualizing high-dimensional data. Most standard packages of t-SNE, such as scikit-learn, use the Barnes-Hut t-SNE (BH t-SNE) algorithm for large datasets. However, existing CPU implementations of this algorithm are inefficient. In this work, we accelerate the BH t-SNE on CPUs via cache optimizations, SIMD, parallelizing sequential steps, and improving parallelization of multithreaded steps. Our implementation (Acc-t-SNE) is up to 261x and 4x faster than scikit-learn and the state-of-the-art BH t-SNE implementation from daal4py, respectively, on a 32-core Intel(R) Icelake cloud instance.
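For readers unfamiliar with the Barnes-Hut approximation that the abstract refers to, the following is a minimal single-threaded sketch in Python. It is not the authors' code and includes none of the cache, SIMD, or multithreading optimizations described in the paper. It assumes a 2-D embedding and the usual Barnes-Hut rule: a quadtree cell is replaced by its center of mass when the cell width divided by the distance to the query point is below a threshold theta. The names Cell, repulsion, and exact_repulsion are hypothetical and chosen for illustration only.

```python
import random


class Cell:
    """Quadtree cell: a square with center (cx, cy) and half-width `half`."""

    def __init__(self, cx, cy, half):
        self.cx, self.cy, self.half = cx, cy, half
        self.count = 0            # number of points inside this cell
        self.sx = self.sy = 0.0   # coordinate sums, used for the center of mass
        self.point = None         # first point stored in this cell
        self.children = None      # four sub-cells once the cell is split

    def _child(self, x, y):
        # Children are ordered (-,-), (-,+), (+,-), (+,+) relative to the center.
        return self.children[2 * (x >= self.cx) + (y >= self.cy)]

    def insert(self, x, y):
        self.count += 1
        self.sx += x
        self.sy += y
        if self.count == 1:       # first point: the cell stays a leaf
            self.point = (x, y)
            return
        if self.half < 1e-9:      # (near-)coincident points: keep one aggregated leaf
            return
        if self.children is None: # second point: split and push the stored point down
            h = self.half / 2
            self.children = [Cell(self.cx + dx * h, self.cy + dy * h, h)
                             for dx in (-1, 1) for dy in (-1, 1)]
            px, py = self.point
            self._child(px, py).insert(px, py)
        self._child(x, y).insert(x, y)


def repulsion(cell, x, y, theta):
    """Return (fx, fy, z): the unnormalized repulsive force on (x, y) and
    this point's contribution to the normalization term Z."""
    if cell.count == 0:
        return 0.0, 0.0, 0.0
    dx = x - cell.sx / cell.count
    dy = y - cell.sy / cell.count
    d2 = dx * dx + dy * dy
    is_leaf = cell.children is None
    # Summarize the cell if it is a leaf or if width / distance < theta.
    if is_leaf or (2 * cell.half) ** 2 < theta * theta * d2:
        if is_leaf and d2 == 0.0:  # the leaf holding the query point itself
            return 0.0, 0.0, 0.0
        w = 1.0 / (1.0 + d2)       # Student-t kernel used by t-SNE
        return cell.count * w * w * dx, cell.count * w * w * dy, cell.count * w
    fx = fy = z = 0.0
    for child in cell.children:
        cfx, cfy, cz = repulsion(child, x, y, theta)
        fx += cfx
        fy += cfy
        z += cz
    return fx, fy, z


def exact_repulsion(points, i):
    """Brute-force reference: sum over all other points."""
    x, y = points[i]
    fx = fy = z = 0.0
    for j, (px, py) in enumerate(points):
        if j == i:
            continue
        dx, dy = x - px, y - py
        w = 1.0 / (1.0 + dx * dx + dy * dy)
        fx += w * w * dx
        fy += w * w * dy
        z += w
    return fx, fy, z


rng = random.Random(0)
points = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(200)]
root = Cell(0.0, 0.0, 1.0)        # root cell covering [-1, 1] x [-1, 1]
for px, py in points:
    root.insert(px, py)

exact = exact_repulsion(points, 0)
approx = repulsion(root, *points[0], theta=0.5)
print("exact :", exact)
print("approx:", approx)
```

With theta set to 0 the traversal never summarizes an internal cell and reduces to the exact sum; larger values trade accuracy for fewer visited cells. In a full t-SNE gradient the force would still need to be divided by Z summed over all points, which this sketch leaves out.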
Taxonomy
Topics: Error Correcting Code Techniques · Advanced Data Storage Technologies · Data Management and Algorithms
