Situating Sentence Embedders with Nearest Neighbor Overlap

Lucy H. Lin; Noah A. Smith

arXiv:1909.10724 · cs.CL · September 25, 2019 · 5 cites

Open Access

TL;DR

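This paper introduces nearest neighbor overlap (N2O), a simple, task-agnostic measure of similarity between sentence embedders: two embedders are more similar when, for the same inputs, their nearest neighbors overlap more. The authors use N2O to show how design choices and architectures affect embedder similarity.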

Contribution

We propose N2O, a novel, straightforward method for comparing sentence embedders without relying on benchmark tasks or linguistic probes.

Findings

01. N2O effectively measures embedder similarity across different architectures.

02. Design choices significantly impact the similarity of sentence embedders.

03. N2O provides insights into embedder behavior beyond traditional benchmarks.

Abstract

As distributed approaches to natural language semantics have developed and diversified, embedders for linguistic units larger than words have come to play an increasingly important role. To date, such embedders have been evaluated using benchmark tasks (e.g., GLUE) and linguistic probes. We propose a comparative approach, nearest neighbor overlap (N2O), that quantifies similarity between embedders in a task-agnostic manner. N2O requires only a collection of examples and is simple to understand: two embedders are more similar if, for the same set of inputs, there is greater overlap between the inputs' nearest neighbors. Though applicable to embedders of texts of any size, we focus on sentence embedders and use N2O to show the effects of different design choices and architectures.
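The overlap idea described in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. It assumes that neighbors are ranked by cosine similarity, that each input's own vector is excluded from its neighbor set, and that per-input overlaps are averaged and divided by k; none of these details are stated in the abstract. The names `cosine`, `nearest_neighbors`, and `n2o` are hypothetical.

```python
import math
from typing import List, Sequence, Set

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_neighbors(vectors: List[Vector], query: int, k: int) -> Set[int]:
    """Indices of the k items most similar to vectors[query], excluding the query itself."""
    others = [i for i in range(len(vectors)) if i != query]
    others.sort(key=lambda i: cosine(vectors[query], vectors[i]), reverse=True)
    return set(others[:k])


def n2o(emb_a: List[Vector], emb_b: List[Vector], k: int) -> float:
    """Mean fraction of shared k-nearest neighbors over all inputs (assumed definition).

    emb_a[i] and emb_b[i] must be the two embedders' vectors for the same input i.
    """
    if len(emb_a) != len(emb_b):
        raise ValueError("both embedders must embed the same inputs")
    n = len(emb_a)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < number of inputs")
    shared = 0
    for q in range(n):
        shared += len(nearest_neighbors(emb_a, q, k) & nearest_neighbors(emb_b, q, k))
    return shared / (k * n)


# Toy vectors standing in for two embedders' outputs on the same four inputs.
emb_a = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
emb_b = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]

same = n2o(emb_a, emb_a, k=1)      # an embedder compared with itself
disjoint = n2o(emb_a, emb_b, k=1)  # no input shares its single nearest neighbor
partial = n2o(emb_a, emb_b, k=2)   # some neighbors shared once k grows
```

Under these assumptions the score lies between 0 and 1. Only the two sets of vectors are needed, with no task labels, which matches the abstract's statement that the method requires only a collection of examples.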

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification