Informative Initialization and Kernel Selection Improves t-SNE for Biological Sequences

Prakash Chourasia; Sarwan Ali; Murray Patterson

arXiv:2211.09263 · cs.LG · November 18, 2022

TL;DR

This paper demonstrates that using informed initialization and alternative kernel choices significantly enhances t-SNE's performance and convergence speed when visualizing biological sequence data.

Contribution

The study introduces the use of informed initialization and kernel selection to improve t-SNE's effectiveness for biological sequences.

Findings

01 Improved t-SNE visualizations with better cluster separation.

02 Faster convergence of t-SNE with informed initialization.

03 Enhanced accuracy in biological sequence data representation.

Abstract

The t-distributed stochastic neighbor embedding (t-SNE) is a method for interpreting high-dimensional (HD) data by mapping each point to a low-dimensional (LD) space (usually two-dimensional) while seeking to retain the structure of the data. An important component of the t-SNE algorithm is the initialization procedure, which begins with the random initialization of the LD vectors. These LD points are then updated iteratively by gradient descent to minimize the loss function (the KL divergence), which causes similar points to attract one another and pushes dissimilar points apart. We believe that, by default, these algorithms should employ some form of informative initialization. Another essential component of t-SNE is the kernel matrix, a similarity matrix built from the pairwise distances among the sequences. For t-SNE-based visualization, the Gaussian kernel…
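
The loop the abstract describes (kernel similarities, an initial LD layout, gradient descent on the KL divergence) can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the authors' implementation. It uses a Gaussian kernel with one fixed bandwidth `sigma` instead of perplexity-calibrated bandwidths. It uses plain gradient descent with step halving instead of the momentum and early exaggeration of standard t-SNE. The toy `data` and the "informed" start, which simply reuses the first two input coordinates, are hypothetical stand-ins, since the initialization schemes studied in the paper are not listed in the text above.

```python
import math
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def normalize(weights):
    """Scale a pairwise weight matrix so that all entries sum to 1."""
    total = sum(sum(row) for row in weights)
    return [[w / total for w in row] for row in weights]

def gaussian_P(points, sigma=1.0):
    """HD similarities from a Gaussian kernel (fixed bandwidth: an assumption)."""
    n = len(points)
    return normalize([[0.0 if i == j else
                       math.exp(-sq_dist(points[i], points[j]) / (2 * sigma ** 2))
                       for j in range(n)] for i in range(n)])

def student_t_Q(Y):
    """LD similarities from the Student-t kernel with one degree of freedom."""
    n = len(Y)
    return normalize([[0.0 if i == j else 1.0 / (1.0 + sq_dist(Y[i], Y[j]))
                       for j in range(n)] for i in range(n)])

def kl_divergence(P, Q, eps=1e-12):
    return sum(p * math.log((p + eps) / (q + eps))
               for row_p, row_q in zip(P, Q)
               for p, q in zip(row_p, row_q) if p > 0.0)

def gradient(P, Y):
    """dKL/dy_i = 4 * sum_j (p_ij - q_ij) * (y_i - y_j) / (1 + |y_i - y_j|^2)."""
    Q = student_t_Q(Y)
    n, dim = len(Y), len(Y[0])
    grad = [[0.0] * dim for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                coeff = 4.0 * (P[i][j] - Q[i][j]) / (1.0 + sq_dist(Y[i], Y[j]))
                for d in range(dim):
                    grad[i][d] += coeff * (Y[i][d] - Y[j][d])
    return grad

def descend(P, Y, steps=100, lr=10.0):
    """Gradient descent with step halving; returns (embedding, final KL)."""
    loss = kl_divergence(P, student_t_Q(Y))
    for _ in range(steps):
        grad = gradient(P, Y)
        while lr > 1e-9:
            trial = [[y - lr * g for y, g in zip(yi, gi)]
                     for yi, gi in zip(Y, grad)]
            trial_loss = kl_divergence(P, student_t_Q(trial))
            if trial_loss <= loss:  # accept only steps that do not raise the loss
                Y, loss = trial, trial_loss
                break
            lr *= 0.5
    return Y, loss

# Hypothetical toy data: two well-separated groups of 3-D points.
data = [[0.0, 0.0, 0.0], [0.5, 0.2, 0.1], [0.2, 0.6, 0.3],
        [4.0, 4.0, 4.0], [4.4, 3.8, 4.1], [3.9, 4.5, 4.2]]
P = gaussian_P(data)

random.seed(0)
random_init = [[random.gauss(0.0, 1e-2) for _ in range(2)] for _ in data]
# Hypothetical informed start: reuse the first two input coordinates.
informed_init = [point[:2] for point in data]

kl_random_start = kl_divergence(P, student_t_Q(random_init))
kl_informed_start = kl_divergence(P, student_t_Q(informed_init))
_, kl_random_end = descend(P, random_init)
_, kl_informed_end = descend(P, informed_init)

print("random init:   KL", kl_random_start, "->", kl_random_end)
print("informed init: KL", kl_informed_start, "->", kl_informed_end)
```

On this toy input the informed layout already reflects the two groups, so its starting KL divergence is lower than that of the near-uniform random layout. Swapping `gaussian_P` for another kernel function changes only how `P` is built; the descent loop stays the same.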

Code & Models

Repositories

pchourasia1/tsne_informed_initialization (Official)

Taxonomy

Topics: Gene expression and cancer classification · Face and Expression Recognition · Bioinformatics and Genomic Networks