Neighbor Embedding for High-Dimensional Sparse Poisson Data
Noga Mudrik, Adam S. Charles

TL;DR
The paper introduces p-SNE, a nonlinear embedding method tailored to high-dimensional, sparse Poisson count data, and shows that it captures meaningful structure in email, research, and neural datasets.
Contribution
It develops a novel neighbor embedding technique based on Poisson-specific dissimilarity measures, addressing limitations of existing methods for count data.
Findings
p-SNE accurately recovers structure in synthetic Poisson data.
It reveals meaningful patterns in email, research, and neural data.
It outperforms traditional methods on sparse count datasets.
Abstract
Across many scientific fields, measurements often represent the number of times an event occurs. For example, a document can be represented by word occurrence counts, neural activity by spike counts per time window, or online communication by daily email counts. These measurements yield high-dimensional count data that often approximate a Poisson distribution, frequently with low rates that produce substantial sparsity and complicate downstream analysis. A useful approach is to embed the data into a low-dimensional space that preserves meaningful structure, commonly termed dimensionality reduction. Yet existing dimensionality reduction methods, including both linear (e.g., PCA) and nonlinear approaches (e.g., t-SNE), often assume continuous Euclidean geometry, thereby misaligning with the discrete, sparse nature of low-rate count data. Here, we propose p-SNE (Poisson Stochastic Neighbor…
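The abstract's point that low Poisson rates produce substantial sparsity can be checked with a short simulation. The sketch below is illustrative only and is not taken from the paper: the function names, matrix shape, and rate of 0.1 are assumptions chosen for the example. It relies on the standard fact that a Poisson variable with rate λ equals zero with probability exp(-λ), so a rate of 0.1 leaves roughly 90% of entries empty.

```python
import numpy as np


def simulate_sparse_counts(n_samples=200, n_features=1000, rate=0.1, seed=0):
    """Draw an (n_samples, n_features) matrix of i.i.d. Poisson counts.

    All arguments are hypothetical defaults chosen for illustration.
    """
    rng = np.random.default_rng(seed)
    return rng.poisson(lam=rate, size=(n_samples, n_features))


def zero_fraction(counts):
    """Return the fraction of entries in the count matrix that are zero."""
    return float(np.mean(counts == 0))


rate = 0.1
counts = simulate_sparse_counts(rate=rate)

# Empirical sparsity versus the theoretical zero probability exp(-rate).
empirical = zero_fraction(counts)
expected = float(np.exp(-rate))
print(f"empirical zero fraction: {empirical:.3f}, exp(-rate): {expected:.3f}")
```

With most entries equal to zero, Euclidean distances between samples are driven by a handful of nonzero coordinates, which is one way to see why the abstract argues that continuous Euclidean geometry is a poor fit for low-rate count data.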
