Somoclu: An Efficient Parallel Library for Self-Organizing Maps

Peter Wittek; Shi Chao Gao; Ik Soo Lim; Li Zhao

arXiv:1305.1422 · cs.DC · June 12, 2017

TL;DR

Somoclu is a high-performance parallel library for training self-organizing maps on large, high-dimensional datasets. It uses multicore, cluster, and GPU computing and offers interfaces for popular data analysis languages.

Contribution

It introduces a versatile, efficient, and scalable implementation of self-organizing maps that supports multicore, distributed, and GPU computing, with interfaces for Python, R, and MATLAB.

Findings

1. Fast execution on large datasets
2. Memory-efficient training of large maps
3. Supports sparse high-dimensional data
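The third finding, support for sparse high-dimensional data, comes down to the most expensive step of SOM training: finding the best matching unit (BMU), the codebook vector closest to each input. A common trick, shown here as a minimal sketch and not as Somoclu's actual sparse kernel, expands the squared distance as ||x - w||^2 = ||x||^2 - 2 x·w + ||w||^2. The codebook norms are computed once per epoch, so each comparison only needs a dot product over the nonzero entries of x. The dict-based sparse format and the names `codebook_sq_norms`, `sparse_sq_dist`, and `find_bmu` are hypothetical choices for illustration.

```python
import math


def codebook_sq_norms(codebook):
    """Squared norm of every codebook vector; compute once per epoch and reuse."""
    return [sum(c * c for c in w) for w in codebook]


def sparse_sq_dist(x_sparse, w, w_sq_norm, x_sq_norm):
    """Squared Euclidean distance ||x - w||^2 for sparse x and dense w.

    x_sparse is a dict {feature_index: nonzero_value} (hypothetical format).
    Only the nonzero entries of x are visited when forming the dot product.
    """
    dot = sum(value * w[idx] for idx, value in x_sparse.items())
    return x_sq_norm - 2.0 * dot + w_sq_norm


def find_bmu(x_sparse, codebook, sq_norms):
    """Index of the codebook vector closest to x_sparse (the best matching unit)."""
    x_sq_norm = sum(v * v for v in x_sparse.values())
    best_idx, best_dist = -1, math.inf
    for j, (w, w_sq_norm) in enumerate(zip(codebook, sq_norms)):
        d = sparse_sq_dist(x_sparse, w, w_sq_norm, x_sq_norm)
        if d < best_dist:
            best_idx, best_dist = j, d
    return best_idx
```

For a text-mining vector that touches a few hundred terms out of a large vocabulary, the dot product visits only those few hundred entries instead of the full dimension. The expanded form can lose a little floating-point accuracy through cancellation, which is usually acceptable because BMU search only compares distances against each other.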

Abstract

Somoclu is a massively parallel tool, written in C++, for training self-organizing maps on large data sets. It builds on OpenMP for multicore execution and on MPI for distributing the workload across the nodes of a cluster. It can also accelerate training with CUDA when graphics processing units are available. A sparse kernel is included, which is useful for high-dimensional but sparse data, such as the vector spaces common in text mining workflows. Python, R and MATLAB interfaces facilitate interactive use. Apart from fast execution, memory use is highly optimized, enabling the training of large emergent maps even on a single computer.
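Splitting work across OpenMP threads and MPI nodes fits the batch formulation of SOM training, in which every codebook vector is updated once per epoch as a neighborhood-weighted average of the data: w_j = sum_i h(j, c_i) x_i / sum_i h(j, c_i), where c_i is the BMU of data point x_i. Both sums decompose over the data, so each worker can accumulate partial numerators and denominators for its own chunk, and a single reduction combines them. The sketch below illustrates this decomposition in pure Python under stated assumptions: a rectangular grid, a Gaussian neighborhood of width `radius`, dense inputs, and a thread pool standing in for OpenMP threads or MPI ranks. The function names are hypothetical and are not part of Somoclu's API.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def grid_coords(n_rows, n_cols):
    """(row, col) position of every node on a rectangular map, row by row."""
    return [(r, c) for r in range(n_rows) for c in range(n_cols)]


def sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def partial_sums(chunk, codebook, coords, radius):
    """Neighborhood-weighted numerators and denominators for one data chunk."""
    dim = len(codebook[0])
    num = [[0.0] * dim for _ in codebook]
    den = [0.0] * len(codebook)
    for x in chunk:
        # Best matching unit: node whose codebook vector is closest to x.
        bmu = min(range(len(codebook)), key=lambda j: sq_dist(x, codebook[j]))
        for j, coord in enumerate(coords):
            # Gaussian neighborhood measured on the map grid around the BMU.
            h = math.exp(-sq_dist(coord, coords[bmu]) / (2.0 * radius ** 2))
            den[j] += h
            for k in range(dim):
                num[j][k] += h * x[k]
    return num, den


def batch_epoch(data, codebook, coords, radius, n_workers=2):
    """One batch-SOM epoch: per-chunk partial sums, a reduction, then the update."""
    chunks = [data[i::n_workers] for i in range(n_workers)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = list(pool.map(
            lambda chunk: partial_sums(chunk, codebook, coords, radius), chunks))
    dim = len(codebook[0])
    new_codebook = []
    for j in range(len(codebook)):
        den = sum(den_part[j] for _, den_part in results)
        if den == 0.0:
            # No data reached this node (e.g. empty input): keep it unchanged.
            new_codebook.append(list(codebook[j]))
            continue
        new_codebook.append(
            [sum(num_part[j][k] for num_part, _ in results) / den for k in range(dim)])
    return new_codebook
```

A typical run calls `batch_epoch` repeatedly while shrinking `radius` (for example 1.0, 0.5, 0.1), so the map first organizes globally and then fine-tunes locally. Under CPython's global interpreter lock the thread pool gives no real speedup; it only shows how the work decomposes. The performance described in the abstract comes from the compiled C++ implementation with OpenMP, MPI, or CUDA.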
