Going Beyond T-SNE: Exposing \texttt{whatlies} in Text Embeddings

Vincent D. Warmerdam; Thomas Kober; Rachael Tatman

arXiv:2009.02113·cs.CL·September 7, 2020

TL;DR

whatlies is an open source toolkit that enables visual inspection and analysis of word and sentence embeddings across multiple backends, combining vector arithmetic with visualization tools for better interpretability.

Contribution

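The paper introduces a unified, extensible API and visualization suite for exploring word and sentence embeddings. It supports backends including spaCy, tfhub, huggingface transformers, gensim, fastText and BytePair embeddings, along with many popular dimensionality reduction techniques.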

Findings

01. Enhanced interpretability of embeddings through visualization

02. Support for multiple embedding backends and techniques

03. Interactive visualizations easily shareable via Jupyter notebooks

Abstract

We introduce whatlies, an open source toolkit for visually inspecting word and sentence embeddings. The project offers a unified and extensible API with current support for a range of popular embedding backends including spaCy, tfhub, huggingface transformers, gensim, fastText and BytePair embeddings. The package combines a domain specific language for vector arithmetic with visualisation tools that make exploring word embeddings more intuitive and concise. It offers support for many popular dimensionality reduction techniques as well as many interactive visualisations that can either be statically exported or shared via Jupyter notebooks. The project documentation is available from https://rasahq.github.io/whatlies/.
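
To make the idea of a "domain specific language for vector arithmetic" concrete, the following is a minimal sketch of how operator overloading can turn embedding arithmetic into readable expressions. It is not the whatlies API: the `ToyEmbedding` class, its methods, and the three-dimensional vectors are all hypothetical and made up for illustration, and real embeddings would come from one of the backends listed above.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyEmbedding:
    """A named vector. Hypothetical stand-in, NOT the whatlies class."""

    name: str
    vector: tuple[float, ...]

    def __add__(self, other: ToyEmbedding) -> ToyEmbedding:
        # Element-wise sum; the name records how the result was built.
        summed = tuple(a + b for a, b in zip(self.vector, other.vector))
        return ToyEmbedding(f"({self.name} + {other.name})", summed)

    def __sub__(self, other: ToyEmbedding) -> ToyEmbedding:
        # Element-wise difference, named the same way.
        diff = tuple(a - b for a, b in zip(self.vector, other.vector))
        return ToyEmbedding(f"({self.name} - {other.name})", diff)

    def dot(self, other: ToyEmbedding) -> float:
        return sum(a * b for a, b in zip(self.vector, other.vector))

    def norm(self) -> float:
        return math.sqrt(self.dot(self))

    def cosine(self, other: ToyEmbedding) -> float:
        # Cosine similarity between two non-zero vectors.
        return self.dot(other) / (self.norm() * other.norm())


# Made-up 3-d vectors, chosen by hand so the classic analogy works out.
king = ToyEmbedding("king", (0.9, 0.8, 0.1))
queen = ToyEmbedding("queen", (0.9, 0.1, 0.8))
man = ToyEmbedding("man", (0.1, 0.9, 0.1))
woman = ToyEmbedding("woman", (0.1, 0.2, 0.8))

# The arithmetic reads like the analogy it expresses.
analogy = king - man + woman

# Find the candidate closest to the result by cosine similarity.
candidates = [king, queen, woman]
nearest = max(candidates, key=analogy.cosine)
print(analogy.name, "is closest to", nearest.name)
```

Keeping the derived name alongside the vector is a design choice that suits visual inspection: a point produced by arithmetic can be labeled in a plot with the expression that created it.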

Code & Models

Repositories

RasaHQ/whatlies

Taxonomy

Topics: Topic Modeling · Authorship Attribution and Profiling · Hate Speech and Cyberbullying Detection

Methods: fastText