TL;DR
Somoclu is a parallel library for training self-organizing maps on large, high-dimensional datasets. It runs on multicore machines, clusters, and GPUs, and provides interfaces for Python, R, and MATLAB.
Contribution
It introduces a versatile, efficient, and scalable implementation of self-organizing maps that supports multicore, distributed, and GPU computing, with interfaces for Python, R, and MATLAB.
Findings
Fast execution on large datasets
Memory-efficient training of large maps
Supports sparse high-dimensional data
Abstract
Somoclu is a massively parallel tool, written in C++, for training self-organizing maps on large data sets. It builds on OpenMP for multicore execution and on MPI for distributing the workload across the nodes of a cluster. It can also accelerate training with CUDA when graphics processing units are available. A sparse kernel is included, which is useful for high-dimensional but sparse data, such as the vector spaces common in text mining workflows. Python, R, and MATLAB interfaces facilitate interactive use. Apart from fast execution, memory use is highly optimized, enabling the training of large emergent maps even on a single computer.
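For readers unfamiliar with what training a self-organizing map involves, the following is a minimal pure-Python sketch of a generic batch-style SOM epoch. It is not Somoclu's code or API: the function names (`best_matching_unit`, `train_epoch`, `train_som`), the Gaussian neighborhood, and the linear radius schedule are all assumptions chosen for illustration. The loop over data points in `train_epoch` is the kind of work that a parallel implementation would plausibly split across cores, nodes, or GPU threads.

```python
import math
import random


def best_matching_unit(codebook, x):
    """Return the index of the codebook vector closest to x (squared Euclidean distance)."""
    best, best_dist = 0, float("inf")
    for i, w in enumerate(codebook):
        d = sum((wi - xi) ** 2 for wi, xi in zip(w, x))
        if d < best_dist:
            best, best_dist = i, d
    return best


def train_epoch(codebook, data, n_cols, radius):
    """One batch epoch: each node becomes a neighborhood-weighted mean of the data.

    Nodes are stored row-major, so node j sits at grid position divmod(j, n_cols).
    """
    dim = len(codebook[0])
    numer = [[0.0] * dim for _ in codebook]
    denom = [0.0] * len(codebook)
    for x in data:  # independent per sample, hence a natural target for parallelism
        b_row, b_col = divmod(best_matching_unit(codebook, x), n_cols)
        for j in range(len(codebook)):
            row, col = divmod(j, n_cols)
            grid_d2 = (row - b_row) ** 2 + (col - b_col) ** 2
            h = math.exp(-grid_d2 / (2.0 * radius ** 2))  # assumed Gaussian neighborhood
            denom[j] += h
            for k in range(dim):
                numer[j][k] += h * x[k]
    # Keep the old vector if a node received no weight (e.g. underflow of h).
    return [
        [numer[j][k] / denom[j] for k in range(dim)] if denom[j] > 0 else list(codebook[j])
        for j in range(len(codebook))
    ]


def train_som(data, n_rows, n_cols, epochs=10, seed=0):
    """Train a small rectangular map; the shrinking-radius schedule is an assumption."""
    rng = random.Random(seed)
    dim = len(data[0])
    codebook = [[rng.random() for _ in range(dim)] for _ in range(n_rows * n_cols)]
    radius0 = max(n_rows, n_cols) / 2.0
    for epoch in range(epochs):
        radius = max(radius0 * (1.0 - epoch / epochs), 0.5)
        codebook = train_epoch(codebook, data, n_cols, radius)
    return codebook


if __name__ == "__main__":
    rng = random.Random(1)
    # Two hypothetical clusters of points in the unit square.
    points = [[rng.uniform(0.0, 0.2), rng.uniform(0.0, 0.2)] for _ in range(20)]
    points += [[rng.uniform(0.8, 1.0), rng.uniform(0.8, 1.0)] for _ in range(20)]
    for node in train_som(points, n_rows=2, n_cols=2):
        print([round(v, 3) for v in node])
```

Because every node is updated from sums accumulated over the whole data set, partial sums computed on separate workers can simply be added together, which is one reason a batch formulation suits distributed execution.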
