What you can cram into a single vector: Probing sentence embeddings for linguistic properties

Alexis Conneau; German Kruszewski; Guillaume Lample; Loïc Barrault; Marco Baroni

arXiv:1805.01070 · cs.CL · July 10, 2018 · 6 cites

TL;DR

This paper introduces 10 probing tasks to analyze the linguistic properties captured by sentence embeddings, revealing insights into how different encoders and training methods encode linguistic information.

Contribution

It presents a new set of probing tasks for detailed analysis of sentence embeddings and compares multiple encoders and training strategies to understand their linguistic representations.

Findings

01 Different encoders capture distinct linguistic features

02 Training methods influence the type of information encoded

03 Probing tasks reveal nuanced properties of sentence embeddings

Abstract

Although much effort has recently been devoted to training high-quality sentence embeddings, we still have a poor understanding of what they are capturing. "Downstream" tasks, often based on sentence classification, are commonly used to evaluate the quality of sentence representations. The complexity of the tasks makes it however difficult to infer what kind of information is present in the representations. We introduce here 10 probing tasks designed to capture simple linguistic features of sentences, and we use them to study embeddings generated by three different encoders trained in eight distinct ways, uncovering intriguing properties of both encoders and training methods.
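To make the idea of a probing task concrete, here is a minimal sketch of the general recipe: freeze the sentence embeddings, then train a simple classifier to predict one linguistic property from them. Everything in it is an illustrative assumption rather than the paper's setup: the toy encoder (a sum of random word vectors), the probed property (a binary short/long sentence label), the least-squares linear probe, and the names `make_toy_data` and `probe_accuracy`.

```python
import numpy as np


def make_toy_data(n_sentences=600, vocab_size=50, dim=32, seed=0):
    """Build toy sentence embeddings and a binary length label.

    Assumption: the "encoder" is just the sum of random word vectors,
    a stand-in for a real sentence encoder.
    """
    rng = np.random.default_rng(seed)
    # Word vectors with a non-zero mean, so sentence length shows up
    # along a linear direction of the summed embedding.
    word_vectors = rng.normal(loc=0.5, scale=1.0, size=(vocab_size, dim))
    lengths = rng.integers(3, 30, size=n_sentences)  # 3 to 29 words
    embeddings = np.stack([
        word_vectors[rng.integers(0, vocab_size, size=n)].sum(axis=0)
        for n in lengths
    ])
    labels = (lengths >= 16).astype(int)  # 0 = short, 1 = long
    return embeddings, labels


def probe_accuracy(embeddings, labels, n_train=400):
    """Fit a linear probe on frozen embeddings; return held-out accuracy."""
    # Append a constant column so the probe has a bias term.
    X = np.hstack([embeddings, np.ones((len(embeddings), 1))])
    targets = 2.0 * labels - 1.0  # map {0, 1} to {-1, +1}
    weights, *_ = np.linalg.lstsq(X[:n_train], targets[:n_train], rcond=None)
    predictions = (X[n_train:] @ weights > 0).astype(int)
    return float((predictions == labels[n_train:]).mean())


if __name__ == "__main__":
    X, y = make_toy_data()
    print("probe accuracy:  ", probe_accuracy(X, y))
    # Control: shuffled labels should leave the probe near chance.
    y_shuffled = np.random.default_rng(1).permutation(y)
    print("shuffled control:", probe_accuracy(X, y_shuffled))
```

A probe that scores well above the shuffled-label control suggests the property is recoverable from the embedding. The probe is kept deliberately simple so that its accuracy reflects what the representation holds rather than what the classifier can learn on its own.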

Taxonomy

Topics: Text Readability and Simplification · Natural Language Processing Techniques · Topic Modeling