GEAR: A Simple GENERATE, EMBED, AVERAGE AND RANK Approach for Unsupervised Reverse Dictionary

Fatemah Almeman; Luis Espinosa-Anke

arXiv:2412.06654 · cs.CL · December 10, 2024

TL;DR

GEAR is a straightforward unsupervised reverse dictionary method that combines large language models with embedding models and outperforms supervised approaches; the paper also analyzes how different dictionary styles affect results.

Contribution

The paper introduces GEAR, a simple unsupervised reverse dictionary approach using generate, embed, average, and rank steps, demonstrating superior performance over supervised baselines.
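
The four steps named above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code from the F-Almeman/GEAR_RD repository: `generate` and `embed` are hypothetical stand-ins for an LLM and an embedding model, we assume the embeddings of all generated candidates are averaged into a single query vector, and cosine similarity is assumed as the ranking score. The toy vectors and the stubbed generator exist only so the example runs on its own.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def gear_rank(
    definition: str,
    generate: Callable[[str], List[str]],  # hypothetical LLM wrapper: definition -> candidate words
    embed: Callable[[str], Vector],        # hypothetical embedding model: text -> vector
    vocabulary: List[str],
) -> List[str]:
    # GENERATE: ask the LLM for words that match the definition.
    candidates = generate(definition)
    if not candidates:
        raise ValueError("the generator returned no candidates")
    # EMBED: encode every generated candidate.
    vectors = [embed(word) for word in candidates]
    # AVERAGE: element-wise mean of the candidate embeddings (assumed aggregation).
    dim = len(vectors[0])
    centroid = [sum(vec[i] for vec in vectors) / len(vectors) for i in range(dim)]
    # RANK: order the vocabulary by cosine similarity to the averaged vector.
    return sorted(vocabulary, key=lambda word: cosine(embed(word), centroid), reverse=True)


# Toy usage: made-up 3-d vectors and a stubbed generator (no real LLM or encoder).
TOY_EMBEDDINGS = {
    "cat": [1.0, 0.1, 0.0],
    "kitten": [0.9, 0.2, 0.0],
    "feline": [0.95, 0.15, 0.05],
    "car": [0.0, 0.1, 1.0],
    "truck": [0.05, 0.0, 0.9],
}


def fake_generate(definition: str) -> List[str]:
    # Hypothetical stand-in for an LLM prompted with the definition.
    return ["kitten", "feline"]


ranking = gear_rank(
    "a small domesticated carnivorous mammal",
    fake_generate,
    TOY_EMBEDDINGS.__getitem__,
    list(TOY_EMBEDDINGS),
)
print(ranking)
```

Averaging the candidates gives one query vector, so a single noisy LLM guess is smoothed out by the others before ranking; in a real system the vocabulary would be the full dictionary headword list rather than five toy entries.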

Findings

1. Outperforms supervised baselines on RD datasets
2. Less prone to overfitting than supervised methods
3. Embedding quality varies with dictionary style and target audience

Abstract

Reverse Dictionary (RD) is the task of obtaining the most relevant word or set of words given a textual description or dictionary definition. Effective RD methods have applications in accessibility, translation, and writing support systems. Moreover, in NLP research RD is used to benchmark text encoders at various granularities, as it often requires word, definition, and sentence embeddings. In this paper, we propose a simple approach to RD that leverages LLMs in combination with embedding models. Despite its simplicity, this approach outperforms supervised baselines on well-studied RD datasets, while also showing less overfitting. We also conduct a number of experiments on different dictionaries and analyze how different styles, registers, and target audiences impact the quality of RD systems. We conclude that, on average, untuned embeddings alone fall well below an LLM-only…
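
The abstract contrasts RD with untuned embeddings alone against LLM-based setups. A minimal embedding-only ranker might look like the sketch below, assuming it embeds the definition once and ranks vocabulary words by cosine similarity to it. This is an illustrative reading, not the paper's exact baseline; `embedding_only_rank`, `toy_embed`, and `TOPICS` are hypothetical names, and the topic-count "encoder" merely stands in for a real text encoder.

```python
import math
from typing import Callable, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def embedding_only_rank(
    definition: str,
    embed: Callable[[str], List[float]],  # hypothetical text encoder
    vocabulary: List[str],
) -> List[str]:
    # Embed the definition once, then score every vocabulary word against it.
    query = embed(definition)
    return sorted(vocabulary, key=lambda word: cosine(embed(word), query), reverse=True)


# Hypothetical toy encoder: counts tokens falling into two hand-picked topic sets.
TOPICS = [
    {"animal", "mammal", "cat", "dog", "kitten"},
    {"vehicle", "car", "truck", "wheels"},
]


def toy_embed(text: str) -> List[float]:
    tokens = text.lower().split()
    return [float(sum(token in topic for token in tokens)) for topic in TOPICS]


ranking = embedding_only_rank(
    "a domesticated mammal that purrs", toy_embed, ["car", "cat", "truck", "dog"]
)
print(ranking)
```

In this toy setup "cat" and "dog" receive identical scores, because the coarse encoder cannot see the distinguishing detail ("purrs"); no LLM step is involved to supply candidate words.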

Code & Models

Repositories

F-Almeman/GEAR_RD
PyTorch · Official

Taxonomy

Topics: Natural Language Processing Techniques · Data Mining Algorithms and Applications · Rough Sets and Fuzzy Logic

Methods: Sparse Evolutionary Training