Unsupervised Distillation of Syntactic Information from Contextualized Word Representations

Shauli Ravfogel; Yanai Elazar; Jacob Goldberger; Yoav Goldberg

arXiv:2010.05265 · cs.CL · March 15, 2021

TL;DR

This paper presents an unsupervised method for extracting structural (syntactic) information from contextualized word representations such as those of BERT; the resulting representations improve parsing performance in few-shot settings.

Contribution

It introduces a metric-learning approach to transform embeddings, emphasizing structural over semantic information without supervision.
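
To make the metric-learning idea concrete, here is a minimal sketch of a triplet objective applied to linearly transformed vectors. The abstract does not say which metric-learning objective is used, so the triplet loss, the margin, the purely linear map `W`, and the toy 4-dimensional vectors (first two dimensions standing in for "structure", last two for "lexical content") are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def triplet_loss(W, anchor, positive, negative, margin=1.0):
    """Mean hinge triplet loss on vectors mapped through the linear map W.

    anchor/positive are assumed to share structure but differ lexically;
    negative is assumed to differ in structure. All arrays have shape (n, d).
    """
    a, p, n = anchor @ W, positive @ W, negative @ W
    d_pos = np.sum((a - p) ** 2, axis=1)  # squared distance to structural match
    d_neg = np.sum((a - n) ** 2, axis=1)  # squared distance to structural mismatch
    return float(np.maximum(0.0, d_pos - d_neg + margin).mean())

# Hypothetical toy vectors: [structure (2 dims) | lexical content (2 dims)].
anchor   = np.array([[1.0, 0.0, 3.0, 0.0]])  # structure s1, words l1
positive = np.array([[1.0, 0.0, 0.0, 3.0]])  # structure s1, words l2
negative = np.array([[0.0, 1.0, 3.0, 0.0]])  # structure s2, words l1

W_identity = np.eye(4)                          # leaves the vectors unchanged
W_structure = np.diag([1.0, 1.0, 0.0, 0.0])     # keeps only the "structure" dims

loss_identity = triplet_loss(W_identity, anchor, positive, negative)
loss_structure = triplet_loss(W_structure, anchor, positive, negative)
print(loss_identity, loss_structure)
```

In this toy setup the identity map is penalized because the lexically different positive lies farther from the anchor than the structurally different negative, while the map that discards the lexical dimensions satisfies the margin. A learned transformation would be fit by minimizing such a loss over many automatically generated triplets.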

Findings

01 Transformations cluster vectors by structure rather than semantics

02 Distilled representations outperform original embeddings in few-shot parsing

03 Unsupervised disentanglement of syntax from semantics achieved

Abstract

Contextualized word representations, such as ELMo and BERT, were shown to perform well on various semantic and syntactic tasks. In this work, we tackle the task of unsupervised disentanglement between semantics and structure in neural language representations: we aim to learn a transformation of the contextualized vectors that discards the lexical semantics but keeps the structural information. To this end, we automatically generate groups of sentences which are structurally similar but semantically different, and use a metric-learning approach to learn a transformation that emphasizes the structural component that is encoded in the vectors. We demonstrate that our transformation clusters vectors in space by structural properties, rather than by lexical semantics. Finally, we demonstrate the utility of our distilled representations by showing that they outperform the original…

Code & Models

Repositories

shauli-ravfogel/NeuralDecomposition (PyTorch, official)

Taxonomy

Methods: Linear Layer · Tanh Activation · Sigmoid Activation · WordPiece · Long Short-Term Memory · Bidirectional LSTM · Adam · Softmax · Multi-Head Attention · Layer Normalization