Enhancing Interpretability using Human Similarity Judgements to Prune Word Embeddings

Natalia Flechas Manrique; Wanqian Bao; Aurelie Herbelot; Uri Hasson

arXiv:2310.10262 · cs.CL · October 17, 2023 · 1 cites

Open Access

TL;DR

This paper introduces a supervised method to select a subset of word embedding features that align with human similarity judgments across various domains, enhancing interpretability and revealing domain-specific semantic distinctions.

Contribution

It presents a novel supervised feature selection approach for word embeddings that improves interpretability by aligning features with human judgments across multiple semantic domains.

Findings

01. Retains only 20-40% of the original embedding features per domain

02. Different feature sets are identified for each semantic domain

03. The retained features predict human judgments on various semantic dimensions

Abstract

Interpretability methods in NLP aim to provide insights into the semantics underlying specific system architectures. Focusing on word embeddings, we present a supervised-learning method that, for a given domain (e.g., sports, professions), identifies a subset of model features that strongly improve prediction of human similarity judgments. We show this method keeps only 20-40% of the original embeddings, for 8 independent semantic domains, and that it retains different feature sets across domains. We then present two approaches for interpreting the semantics of the retained features. The first obtains the scores of the domain words (co-hyponyms) on the first principal component of the retained embeddings, and extracts terms whose co-occurrence with the co-hyponyms tracks these scores' profile. This analysis reveals that humans differentiate e.g. sports based on how gender-inclusive and…
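
The general idea of pruning embedding features against human similarity judgments can be illustrated with a minimal sketch. The abstract does not specify the selection procedure, so everything below is an assumption made for illustration: greedy backward elimination, cosine similarity over the retained features, Pearson correlation as the fit measure, and synthetic data in place of real embeddings and human ratings. The function names (`cosine`, `pearson`, `fit_score`, `prune_features`) are hypothetical, and the sketch omits the held-out evaluation that a supervised method would normally require.

```python
import itertools
import math
import random


def cosine(u, v, dims):
    """Cosine similarity restricted to the feature indices in `dims`."""
    dot = sum(u[d] * v[d] for d in dims)
    nu = math.sqrt(sum(u[d] ** 2 for d in dims))
    nv = math.sqrt(sum(v[d] ** 2 for d in dims))
    return dot / (nu * nv) if nu and nv else 0.0


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either input is constant."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def fit_score(emb, human, dims):
    """Correlate model similarities (on `dims`) with human similarities."""
    pairs = list(itertools.combinations(range(len(emb)), 2))
    model = [cosine(emb[i], emb[j], dims) for i, j in pairs]
    target = [human[i][j] for i, j in pairs]
    return pearson(model, target)


def prune_features(emb, human):
    """Assumed procedure: repeatedly drop the one feature whose removal
    most improves the fit, and stop when no removal improves it."""
    dims = list(range(len(emb[0])))
    best = fit_score(emb, human, dims)
    while len(dims) > 1:
        candidates = [
            (fit_score(emb, human, [k for k in dims if k != d]), d)
            for d in dims
        ]
        score, drop = max(candidates)
        if score <= best:
            break
        dims.remove(drop)
        best = score
    return dims, best


# Synthetic stand-in data: 6 "words" with 8 features each. The toy "human"
# similarities depend only on features 0-2; the rest act as noise.
random.seed(0)
emb = [[random.gauss(0, 1) for _ in range(8)] for _ in range(6)]
human = [[cosine(emb[i], emb[j], [0, 1, 2]) for j in range(6)] for i in range(6)]

full_fit = fit_score(emb, human, list(range(8)))
kept, pruned_fit = prune_features(emb, human)
print("retained features:", kept)
print("fit with all features:", round(full_fit, 3))
print("fit after pruning:", round(pruned_fit, 3))
```

Because a feature is removed only when the fit strictly improves, the pruned fit can never be lower than the fit with all features. On real data the fit should be measured on word pairs held out from the selection step, since selecting and scoring on the same pairs overstates the improvement.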

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques