Generic Embedding-Based Lexicons for Transparent and Reproducible Text Scoring

Catherine Moez

arXiv:2411.00964·cs.CL·November 5, 2024

TL;DR

This paper introduces a method for building transparent, high-performance text-scoring lexicons from pretrained word embeddings such as FastText and GloVe with minimal researcher input, offering a middle ground between opaque state-of-the-art models and manually crafted scoring tools.

Contribution

It proposes a novel approach to generate lexicons from generic embeddings, combining transparency with competitive performance.

Findings

01. Lexicons created from FastText and GloVe embeddings are effective.

02. Embedding-based lexicons offer transparency and high performance.

03. The method requires minimal researcher input.

Abstract

With text analysis tools becoming increasingly sophisticated over the last decade, researchers now face a decision of whether to use state-of-the-art models that provide high performance but that can be highly opaque in their operations and computationally intensive to run. The alternative, frequently, is to rely on older, manually crafted textual scoring tools that are transparently and easily applied, but can suffer from limited performance. I present an alternative that combines the strengths of both: lexicons created with minimal researcher inputs from generic (pretrained) word embeddings. Presenting a number of conceptual lexicons produced from FastText and GloVe (6B) vector representations of words, I argue that embedding-based lexicons respond to a need for transparent yet high-performance text measuring tools.
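
As a rough illustration of the general idea of deriving a lexicon from pretrained embeddings, the sketch below weights each vocabulary word by its cosine similarity to a concept axis defined by a few seed words, then scores a text by averaging the weights of its tokens. This is a minimal sketch under stated assumptions, not the paper's procedure: the abstract does not describe the construction steps, so the seed-axis approach, the function names `build_lexicon` and `score_text`, and the toy three-dimensional vectors (standing in for real FastText or GloVe vectors) are all hypothetical.

```python
import numpy as np

# Toy 3-d vectors standing in for pretrained embeddings (made-up values,
# not taken from FastText or GloVe).
TOY_VECTORS = {
    "good": np.array([0.9, 0.1, 0.0]),
    "great": np.array([0.8, 0.2, 0.1]),
    "excellent": np.array([0.85, 0.15, 0.05]),
    "bad": np.array([-0.9, 0.1, 0.0]),
    "terrible": np.array([-0.8, 0.2, 0.1]),
    "table": np.array([0.0, 0.1, 0.9]),
}


def unit(v):
    """Scale a vector to unit length."""
    return v / np.linalg.norm(v)


def build_lexicon(vectors, pos_seeds, neg_seeds):
    """Assumed approach: weight each word by cosine similarity to a seed axis.

    The axis is the mean unit vector of the positive seeds minus the mean
    unit vector of the negative seeds, normalised to unit length.
    """
    pos = np.mean([unit(vectors[w]) for w in pos_seeds], axis=0)
    neg = np.mean([unit(vectors[w]) for w in neg_seeds], axis=0)
    axis = unit(pos - neg)
    return {word: float(unit(vec) @ axis) for word, vec in vectors.items()}


def score_text(text, lexicon):
    """Average the lexicon weights of the tokens found in the lexicon."""
    tokens = text.lower().split()
    weights = [lexicon[t] for t in tokens if t in lexicon]
    return sum(weights) / len(weights) if weights else 0.0


lexicon = build_lexicon(TOY_VECTORS, pos_seeds=["good"], neg_seeds=["bad"])
for word, weight in sorted(lexicon.items(), key=lambda kv: kv[1]):
    print(f"{word:10s} {weight:+.3f}")
print(score_text("a great and excellent table", lexicon))
```

Because the resulting lexicon is an ordinary word-to-weight mapping, it can be inspected, saved, and reapplied to new texts without rerunning any model, which is the kind of transparency and reproducibility the abstract argues for.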

Taxonomy

Topics: Natural Language Processing Techniques

Methods: GloVe Embeddings · fastText