TL;DR
TextEssence is an interactive web-based tool that enables comparative analysis of semantic shifts between corpora using word embeddings, featuring visualization modes and a new measure of embedding confidence, demonstrated through a COVID-19 literature case study.
Contribution
Introduces TextEssence, a novel interactive system for analyzing semantic differences between corpora with visualization and a new embedding confidence measure.
Findings
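Visual, neighbor-based, and similarity-based modes support comparison of embeddings across corpora
New embedding confidence measure, based on nearest neighborhood overlap, helps identify high-quality embeddings
Case study on COVID-19 scientific literature illustrates the system's utility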
Abstract
Embeddings of words and concepts capture syntactic and semantic regularities of language; however, they have seen limited use as tools to study characteristics of different corpora and how they relate to one another. We introduce TextEssence, an interactive system designed to enable comparative analysis of corpora using embeddings. TextEssence includes visual, neighbor-based, and similarity-based modes of embedding analysis in a lightweight, web-based interface. We further propose a new measure of embedding confidence based on nearest neighborhood overlap, to assist in identifying high-quality embeddings for corpus analysis. A case study on COVID-19 scientific literature illustrates the utility of the system. TextEssence is available from https://github.com/drgriffis/text-essence.
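The abstract describes the confidence measure only as being "based on nearest neighborhood overlap". A minimal sketch of one way such a measure could work is given below. It assumes several embedding replicates of the same vocabulary are available (for example, trained with different random seeds) and scores a word by how much its top-k cosine neighbors agree across replicates. The function names, the choice of cosine similarity, and the pairwise averaging are all illustrative assumptions, not the TextEssence implementation.

```python
from itertools import combinations

import numpy as np


def top_k_neighbors(emb, idx, k):
    """Return the indices of the k nearest neighbors of row `idx` by cosine similarity."""
    normed = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sims = normed @ normed[idx]
    sims[idx] = -np.inf  # exclude the query word itself
    return set(np.argsort(-sims)[:k].tolist())


def neighborhood_overlap_confidence(replicates, idx, k=5):
    """Hypothetical confidence score: mean pairwise overlap of top-k neighbor sets.

    `replicates` is a list of (vocab_size, dim) arrays sharing one row ordering.
    The score is 1.0 when all replicates agree on the neighbors, 0.0 when none do.
    """
    neighbor_sets = [top_k_neighbors(emb, idx, k) for emb in replicates]
    overlaps = [len(a & b) / k for a, b in combinations(neighbor_sets, 2)]
    return float(np.mean(overlaps))


# Usage: three identical replicates agree perfectly on every neighborhood.
rng = np.random.default_rng(0)
base = rng.normal(size=(50, 8))
stable = neighborhood_overlap_confidence([base, base.copy(), base.copy()], idx=0, k=5)

# An unrelated random replicate should generally share few neighbors with `base`.
unrelated = rng.normal(size=(50, 8))
unstable = neighborhood_overlap_confidence([base, unrelated], idx=0, k=5)
```

Under this sketch, a word whose neighbors change from one replicate to the next receives a low score, which would flag its embedding as unreliable for cross-corpus comparison.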
