# ProSper -- A Python Library for Probabilistic Sparse Coding with Non-Standard Priors and Superpositions

**Authors:** Georgios Exarchakis, Jörg Bornschein, Abdul-Saboor Sheikh, Zhenwen Dai, Marc Henniges, Jakob Drefs, Jörg Lücke

arXiv: 1908.06843 · 2019-08-20

## TL;DR

ProSper is a Python library of probabilistic dictionary learning algorithms for data whose components combine non-linearly or that require flexible priors. Variational approximations and parallelization let the algorithms scale to large problems.

## Contribution

The library collects probabilistic dictionary learning algorithms that go beyond standard approaches such as ICA, NMF, and L1 sparse coding. They support components that combine non-linearly, flexible prior distributions, and inference of prior and noise parameters.
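To make the difference between linear and non-linear superposition concrete, here is a minimal NumPy sketch of a toy generative model with binary latents. It is an illustration only and does not use ProSper's API: the function name `sample_sparse_binary_data`, its parameters, and the choice of Gaussian noise around a pointwise maximum are assumptions made for this example. The `"linear"` mode follows the usual form of Binary Sparse Coding (active dictionary columns are summed), while the `"max"` mode follows the idea behind Maximal Causes Analysis (active columns are combined by a pointwise maximum) and assumes a non-negative dictionary.

```python
import numpy as np


def sample_sparse_binary_data(W, pi, sigma, n_samples, combine="linear", seed=0):
    """Draw data from a toy sparse binary latent-variable model (hypothetical helper).

    W       : (D, H) dictionary, one column per component
    pi      : Bernoulli probability that a latent is active
    sigma   : standard deviation of the Gaussian observation noise
    combine : "linear" sums the active components,
              "max" takes their pointwise maximum (assumes W >= 0)
    """
    rng = np.random.default_rng(seed)
    D, H = W.shape
    # Binary latents: each component is switched on independently with probability pi.
    S = (rng.random((n_samples, H)) < pi).astype(float)  # (N, H)
    if combine == "linear":
        mean = S @ W.T  # (N, D), sum of active columns
    elif combine == "max":
        # (N, D, H) array of active contributions, reduced with a max over components.
        mean = (S[:, None, :] * W[None, :, :]).max(axis=2)
    else:
        raise ValueError("combine must be 'linear' or 'max'")
    Y = mean + sigma * rng.standard_normal((n_samples, D))
    return Y, S


# Two overlapping components on three observed dimensions.
W = np.array([[2.0, 0.0],
              [2.0, 2.0],
              [0.0, 2.0]])
Y_lin, S_lin = sample_sparse_binary_data(W, pi=0.5, sigma=0.1, n_samples=5, combine="linear")
Y_max, S_max = sample_sparse_binary_data(W, pi=0.5, sigma=0.1, n_samples=5, combine="max")
```

Where both components are active, the middle dimension has mean 4 under the linear rule but mean 2 under the max rule, which is the kind of occlusion-like behavior a linear model cannot represent.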

## Key findings

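- Runs in parallel on multiple CPUs and machines; typical large-scale runs use hundreds of CPUs to learn hundreds of dictionary elements.
- Includes six algorithms: Binary Sparse Coding (BSC), Ternary Sparse Coding (TSC), Discrete Sparse Coding (DSC), Maximal Causes Analysis (MCA), Maximum Magnitude Causes Analysis (MMCA), and Gaussian Sparse Coding (GSC).
- Infers prior and noise parameters from the data and provides rich a-posteriori approximations for inference.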
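As an illustration of what inferring prior and noise parameters alongside the dictionary can look like, the following sketch runs textbook EM on a tiny binary sparse coding model. It is not ProSper code and not the library's method: the function `bsc_exact_em` is hypothetical, and it computes exact posteriors by enumerating all 2^H binary states, which is only feasible for a handful of latents. The library itself relies on variational approximations and parallelization to scale, which this sketch does not attempt to reproduce.

```python
import itertools

import numpy as np


def bsc_exact_em(Y, H, n_iter=30, seed=0):
    """Exact EM for a toy binary sparse coding model (hypothetical helper).

    Model: s_h ~ Bernoulli(pi), y ~ Normal(W s, sigma^2 I).
    Returns the learned W (D, H), pi, sigma and the log-likelihood per iteration.
    """
    rng = np.random.default_rng(seed)
    N, D = Y.shape
    states = np.array(list(itertools.product([0.0, 1.0], repeat=H)))  # (K, H), K = 2^H
    n_active = states.sum(axis=1)  # (K,)
    W = Y.mean() + Y.std() * rng.standard_normal((D, H))
    pi, sigma = 0.5, Y.std()
    log_liks = []

    for _ in range(n_iter):
        # E-step: posterior over all K binary states for every data point.
        means = states @ W.T  # (K, D)
        sq_err = ((Y[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)  # (N, K)
        log_prior = n_active * np.log(pi) + (H - n_active) * np.log1p(-pi)
        log_joint = (log_prior[None, :]
                     - sq_err / (2.0 * sigma ** 2)
                     - 0.5 * D * np.log(2.0 * np.pi * sigma ** 2))
        m = log_joint.max(axis=1, keepdims=True)
        log_norm = m[:, 0] + np.log(np.exp(log_joint - m).sum(axis=1))  # log p(y_n)
        post = np.exp(log_joint - log_norm[:, None])  # (N, K), rows sum to 1
        log_liks.append(float(log_norm.sum()))

        # M-step: closed-form updates from posterior expectations.
        E_s = post @ states  # (N, H), <s>_n
        E_ss = np.einsum("nk,kh,kj->hj", post, states, states)  # sum_n <s s^T>_n
        # Solve W E_ss = Y^T E_s; the tiny ridge guards against a singular E_ss.
        W = np.linalg.solve(E_ss + 1e-9 * np.eye(H), E_s.T @ Y).T
        # Prior parameter: average fraction of active latents.
        pi = float(np.clip((post @ n_active).sum() / (N * H), 1e-6, 1.0 - 1e-6))
        # Noise parameter: expected squared residual under the updated dictionary.
        means = states @ W.T
        sq_err = ((Y[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        sigma = float(np.sqrt((post * sq_err).sum() / (N * D)))

    return W, pi, sigma, log_liks
```

Because the posterior is exact here, the recorded log-likelihood should not decrease from one iteration to the next (up to numerical tolerance), which is a convenient sanity check. With more than roughly 20 latents the enumeration becomes impractical, which is where approximate posteriors become necessary.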

## Abstract

ProSper is a Python library containing probabilistic algorithms to learn dictionaries. Given a set of data points, the implemented algorithms seek to learn the elementary components that have generated the data. The library widens the scope of dictionary learning approaches beyond implementations of standard approaches such as ICA, NMF or standard L1 sparse coding. The implemented algorithms are especially well-suited in cases when data consist of components that combine non-linearly and/or for data requiring flexible prior distributions. Furthermore, the implemented algorithms go beyond standard approaches by inferring prior and noise parameters of the data, and they provide rich a-posteriori approximations for inference. The library is designed to be extendable and it currently includes: Binary Sparse Coding (BSC), Ternary Sparse Coding (TSC), Discrete Sparse Coding (DSC), Maximal Causes Analysis (MCA), Maximum Magnitude Causes Analysis (MMCA), and Gaussian Sparse Coding (GSC, a recent spike-and-slab sparse coding approach). The algorithms are scalable due to a combination of variational approximations and parallelization. Implementations of all algorithms allow for parallel execution on multiple CPUs and multiple machines for medium to large-scale applications. Typical large-scale runs of the algorithms can use hundreds of CPUs to learn hundreds of dictionary elements from data with tens of millions of floating-point numbers such that models with several hundred thousand parameters can be optimized. The library is designed to have minimal dependencies and to be easy to use. It targets users of dictionary learning algorithms and Machine Learning researchers.

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1908.06843/full.md

## References

20 references — full list in the complete paper: https://tomesphere.com/paper/1908.06843/full.md

---
Source: https://tomesphere.com/paper/1908.06843