Exploring Multilingual Syntactic Sentence Representations

Chen Liu; Anderson de Andrade; Muhammad Osama

arXiv:1910.11768·cs.CL·October 28, 2019


TL;DR

This paper presents a method for learning syntactic sentence embeddings from a multilingual parallel corpus augmented with Universal POS tags. The embeddings are evaluated with sentence-level nearest neighbours and functional dissimilarity, and the method shows strong evidence of transfer learning for low-resource languages.

Contribution

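It introduces an approach to learning syntactic sentence embeddings from multilingual parallel data and Universal POS tags. According to the authors, it needs less training data and fewer model parameters than state-of-the-art language models while achieving better evaluation metrics.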

Findings

01 Effective syntactic sentence embeddings learned with less training data and fewer model parameters

02 Strong evidence of transfer learning for low-resource languages

03 Better evaluation metrics than state-of-the-art language models

Abstract

We study methods for learning sentence embeddings with syntactic structure. We focus on methods of learning syntactic sentence-embeddings by using a multilingual parallel-corpus augmented by Universal Parts-of-Speech tags. We evaluate the quality of the learned embeddings by examining sentence-level nearest neighbours and functional dissimilarity in the embedding space. We also evaluate the ability of the method to learn syntactic sentence-embeddings for low-resource languages and demonstrate strong evidence for transfer learning. Our results show that syntactic sentence-embeddings can be learned while using less training data, fewer model parameters, and resulting in better evaluation metrics than state-of-the-art language models.
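The nearest-neighbour evaluation mentioned in the abstract can be illustrated with a small sketch. This is not the authors' code: the toy vectors, the function names, and the choice of cosine similarity as the distance measure are all assumptions made for illustration. The idea is that sentences with the same syntactic pattern (declarative vs. question here) should end up as each other's nearest neighbours.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(u: List[float], v: List[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def nearest_neighbours(
    query: str, embeddings: Dict[str, List[float]], k: int = 2
) -> List[Tuple[str, float]]:
    """Return the k sentences most similar to `query`, excluding the query itself."""
    query_vec = embeddings[query]
    scored = [
        (sentence, cosine_similarity(query_vec, vec))
        for sentence, vec in embeddings.items()
        if sentence != query
    ]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical 3-dimensional embeddings; a real model would produce these vectors.
toy_embeddings = {
    "The cat sleeps .": [0.9, 0.1, 0.0],
    "A dog barks .": [0.8, 0.2, 0.1],
    "Does the cat sleep ?": [0.1, 0.9, 0.2],
    "Is the dog barking ?": [0.0, 0.8, 0.3],
}

if __name__ == "__main__":
    for sentence, score in nearest_neighbours("The cat sleeps .", toy_embeddings):
        print(f"{score:.3f}  {sentence}")
```

With these toy vectors, the closest neighbour of the declarative sentence is the other declarative sentence, even though the question about the cat shares more words with it. That is the kind of behaviour a syntactic embedding would be expected to show under this evaluation.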


Code & Models

Repositories

ccliu2/syn-emb (official)
