Circles are like Ellipses, or Ellipses are like Circles? Measuring the Degree of Asymmetry of Static and Contextual Embeddings and the Implications to Representation Learning

Wei Zhang; Murray Campbell; Yang Yu; Sadhana Kumaravel

arXiv:2012.01631·cs.CL·December 4, 2020

TL;DR

This paper investigates asymmetry as a geometric property of word embeddings, comparing static embeddings with contextual models such as BERT, and introduces a Bayesian asymmetry score as an additional way to evaluate embedding quality.

Contribution

The paper introduces a Bayesian asymmetry score for contextual embeddings and offers insights into how their geometric properties compare with those of static embeddings.

Findings

01 Contextual embeddings show more randomness in similarity judgments.

02 Contextual embeddings perform well on asymmetry judgment tasks.

03 The Bayesian approach offers a new intrinsic evaluation perspective.

Abstract

Human judgments of word similarity have been a popular method of evaluating the quality of word embeddings, but they fail to measure geometric properties such as asymmetry. For example, it is more natural to say "Ellipses are like Circles" than "Circles are like Ellipses". Such asymmetry has been observed in a psychoanalysis test called the word evocation experiment, in which one word is used to recall another. Although useful, such experimental data have been significantly understudied as a means of measuring embedding quality. In this paper, we use three well-known evocation datasets to gain insights into how embeddings encode asymmetry. We study both static embeddings and contextual embeddings, such as BERT. Evaluating asymmetry for BERT is generally hard due to the dynamic nature of its embeddings. Thus, we probe BERT's conditional probabilities (as a language model) using a large number of…
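The kind of asymmetry the abstract describes can be illustrated with a minimal sketch. It does not reproduce the paper's Bayesian asymmetry score, whose definition is not given in the truncated abstract. It assumes only that evocation data can be summarized as cue-to-response counts, and it compares the two conditional probabilities with a log ratio. The counts are invented toy values, and the function names `conditional_probs` and `log_asymmetry` are hypothetical.

```python
import math
from collections import defaultdict


def conditional_probs(counts):
    """Turn (cue, response) -> count into (cue, response) -> P(response | cue)."""
    totals = defaultdict(float)
    for (cue, _response), n in counts.items():
        totals[cue] += n
    return {(cue, resp): n / totals[cue] for (cue, resp), n in counts.items()}


def log_asymmetry(probs, a, b, eps=1e-9):
    """Log ratio of P(b | a) to P(a | b).

    Positive: cue `a` evokes `b` more readily than `b` evokes `a`.
    `eps` is an illustrative smoothing constant for unseen pairs.
    """
    p_b_given_a = probs.get((a, b), 0.0)
    p_a_given_b = probs.get((b, a), 0.0)
    return math.log((p_b_given_a + eps) / (p_a_given_b + eps))


# Toy counts, made up for illustration only (not from any evocation dataset).
toy_counts = {
    ("ellipse", "circle"): 8,
    ("ellipse", "oval"): 2,
    ("circle", "ellipse"): 1,
    ("circle", "round"): 9,
}

probs = conditional_probs(toy_counts)
score = log_asymmetry(probs, "ellipse", "circle")
print(f"log asymmetry (ellipse -> circle): {score:.3f}")
```

A symmetric measure such as cosine similarity between two static vectors cannot express this directionality, since swapping the arguments leaves the value unchanged, whereas the log ratio above flips sign when the pair is reversed.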

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Authorship Attribution and Profiling

Methods: Linear Layer · WordPiece · Residual Connection · Dense Connections · Attention Is All You Need · Softmax · Dropout · Weight Decay · Linear Warmup With Linear Decay