Index $t$-SNE: Tracking Dynamics of High-Dimensional Datasets with Coherent Embeddings

Gaëlle Candel; David Naccache

arXiv:2109.10538 · cs.LG · September 23, 2021

Open Access

TL;DR

This paper introduces a method to reuse and adapt existing t-SNE embeddings to track the evolution of high-dimensional datasets over time, preserving cluster positions and enabling dynamic analysis.

Contribution

The paper proposes a novel approach to reuse t-SNE embeddings for dynamic datasets, maintaining cluster positions and reducing computational complexity compared to re-embedding from scratch.
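This page does not spell out the Index $t$-SNE update rule, so the following is only a generic, hypothetical sketch of the underlying idea of reusing an embedding, not the authors' algorithm. The existing 2-D coordinates are frozen and only a new point's position $y$ is optimized, by gradient descent on the KL divergence between its input-space affinities $p_j$ and its Student-$t$ affinities $q_j$ to the frozen points, using $\partial\,\mathrm{KL}/\partial y = 2\sum_j (p_j - q_j)(1+\lVert y - y_j\rVert^2)^{-1}(y - y_j)$. The function name `place_new_point`, the fixed precision `beta` (standard $t$-SNE would calibrate it from a perplexity target), the learning rate, and the iteration count are all assumptions.

```python
import math
import random


def place_new_point(x_new, X_old, Y_old, beta=1.0, lr=0.5, n_iter=500, seed=0):
    """Hypothetical sketch: add one point to a frozen 2-D t-SNE map.

    x_new : list[float]        new point in the input space
    X_old : list[list[float]]  already-embedded points in the input space
    Y_old : list[list[float]]  their fixed 2-D coordinates (never updated)
    beta  : fixed Gaussian precision 1 / (2 sigma^2); an assumption, since
            standard t-SNE tunes it per point from a perplexity target
    """
    # Input-space affinities p_j from the new point to every frozen point.
    d2 = [sum((a - b) ** 2 for a, b in zip(x_new, x)) for x in X_old]
    d_min = min(d2)  # shift so exp() cannot underflow to all zeros
    w = [math.exp(-beta * (d - d_min)) for d in d2]
    total = sum(w)
    p = [wi / total for wi in w]

    # Start at the p-weighted mean of the frozen coordinates, plus tiny jitter.
    rng = random.Random(seed)
    y = [sum(pj * yj[k] for pj, yj in zip(p, Y_old)) + rng.gauss(0.0, 1e-4)
         for k in range(2)]

    for _ in range(n_iter):
        # Student-t (Cauchy) kernel between the candidate position and each frozen point.
        kern = [1.0 / (1.0 + (y[0] - yj[0]) ** 2 + (y[1] - yj[1]) ** 2) for yj in Y_old]
        z = sum(kern)
        q = [kj / z for kj in kern]
        # dKL/dy = 2 * sum_j (p_j - q_j) * kern_j * (y - y_j); only y moves.
        grad = [2.0 * sum((pj - qj) * kj * (y[k] - yj[k])
                          for pj, qj, kj, yj in zip(p, q, kern, Y_old))
                for k in range(2)]
        y = [y[k] - lr * grad[k] for k in range(2)]
    return y
```

Because `Y_old` is never modified in this sketch, existing clusters cannot drift, and each new point costs work linear in the number of frozen points per iteration instead of a full re-embedding. A new point that is close, in the input space, to the points of one cluster should land next to that cluster's frozen coordinates.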

Findings

01

Effective tracking of cluster evolution in real-world datasets

02

Lower computational complexity for embedding slices of data

03

Facilitates monitoring of dataset dynamics over time

Abstract

$t$-SNE is an embedding method that the data science community has widely adopted. Two interesting characteristics of $t$-SNE are the structure preservation property and the answer to the crowding problem, where not all neighbors in high-dimensional space can be represented correctly in low-dimensional space. $t$-SNE preserves the local neighborhood, and similar items are nicely spaced by adjusting to the local density. These two characteristics produce a meaningful representation, where the area of a cluster is proportional to its number of points, and relationships between clusters are materialized by closeness on the embedding. This algorithm is non-parametric; therefore, two initializations of the algorithm would lead to two different embeddings. In a forensic approach, analysts would like to compare two or more datasets using their embeddings. An approach would be to learn a parametric model over an…
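The abstract's remark that similar items are spaced "by adjusting to the local density" comes from how standard $t$-SNE builds its input-space affinities: each point $i$ gets its own Gaussian precision $\beta_i = 1/(2\sigma_i^2)$, found by binary search so that the entropy of $p_{j|i}$ equals $\log(\text{perplexity})$. The sketch below shows that textbook calibration for one point; it is not code from this paper, and the names `calibrate_row` and `_gaussian_row` are hypothetical.

```python
import math


def _gaussian_row(dist_sq_row, beta):
    """Return normalized weights exp(-beta * d) and their Shannon entropy in nats."""
    d_min = min(dist_sq_row)
    # Subtracting d_min rescales every weight by the same factor, so the
    # normalized probabilities are unchanged but exp() cannot underflow to all zeros.
    weights = [math.exp(-beta * (d - d_min)) for d in dist_sq_row]
    total = sum(weights)
    probs = [w / total for w in weights]
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    return probs, entropy


def calibrate_row(dist_sq_row, perplexity, tol=1e-5, max_iter=200):
    """Binary-search beta_i = 1 / (2 sigma_i^2) so that exp(H(P_i)) ~= perplexity.

    dist_sq_row holds squared input-space distances from point i to every
    OTHER point. Returns (list of p_{j|i}, beta_i).
    """
    target = math.log(perplexity)  # perplexity = 2**H in bits = exp(H) in nats
    beta, lo, hi = 1.0, 0.0, math.inf
    for _ in range(max_iter):
        probs, entropy = _gaussian_row(dist_sq_row, beta)
        if abs(entropy - target) < tol:
            return probs, beta
        if entropy > target:
            # Too flat: narrow the kernel (raise beta).
            lo = beta
            beta = beta * 2.0 if math.isinf(hi) else (beta + hi) / 2.0
        else:
            # Too peaked: widen the kernel (lower beta).
            hi = beta
            beta = (beta + lo) / 2.0
    probs, _ = _gaussian_row(dist_sq_row, beta)
    return probs, beta
```

Because the entropy depends only on the product $\beta_i d_{ij}^2$, shrinking every distance in a row by a factor of 4 (a denser neighborhood) yields a $\beta_i$ roughly 4 times larger, that is, a narrower kernel. This per-point bandwidth is what lets dense and sparse regions both end up with a comparable effective number of neighbors.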

Taxonomy

Topics: Time Series Analysis and Forecasting · Data Stream Mining Techniques · Anomaly Detection Techniques and Applications