TL;DR
This paper presents a method for learning syntactic sentence embeddings from a multilingual parallel corpus augmented with Universal POS tags. It reports better evaluation metrics than state-of-the-art language models while using less training data and fewer parameters, and shows evidence of transfer learning for low-resource languages.
Contribution
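It introduces an approach to learning syntactic sentence embeddings from multilingual parallel data and POS tags. According to the abstract, the approach needs less training data and fewer model parameters than state-of-the-art language models while achieving better evaluation metrics.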
Findings
Syntactic embeddings can be learned with less training data and fewer model parameters
Strong evidence of transfer learning for low-resource languages
Better evaluation metrics than state-of-the-art language models
Abstract
We study methods for learning sentence embeddings with syntactic structure. We focus on learning syntactic sentence embeddings from a multilingual parallel corpus augmented with Universal Parts-of-Speech tags. We evaluate the quality of the learned embeddings by examining sentence-level nearest neighbours and functional dissimilarity in the embedding space. We also evaluate the ability of the method to learn syntactic sentence embeddings for low-resource languages and demonstrate strong evidence for transfer learning. Our results show that syntactic sentence embeddings can be learned with less training data and fewer model parameters, while achieving better evaluation metrics than state-of-the-art language models.
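The nearest-neighbour evaluation mentioned in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not state which distance measure is used (cosine similarity is assumed), the function names `cosine_similarity` and `nearest_neighbours` are hypothetical, and the three-dimensional vectors are invented toy values keyed by POS-tag sequences, not embeddings produced by the paper's model.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (assumed metric)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_u * norm_v)


def nearest_neighbours(query_id, embeddings, k=1):
    """Return the ids of the k sentences closest to query_id, excluding itself."""
    query = embeddings[query_id]
    scored = [
        (cosine_similarity(query, vec), sent_id)
        for sent_id, vec in embeddings.items()
        if sent_id != query_id
    ]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [sent_id for _, sent_id in scored[:k]]


# Toy data (hypothetical): keys are POS-tag sequences standing in for sentences,
# values are made-up embedding vectors.
toy_embeddings = {
    "DET NOUN VERB": [1.0, 0.0, 0.1],
    "DET ADJ NOUN VERB": [0.9, 0.1, 0.2],
    "VERB DET NOUN": [0.0, 1.0, 0.1],
    "AUX PRON VERB": [0.1, 0.2, 1.0],
}

if __name__ == "__main__":
    print(nearest_neighbours("DET NOUN VERB", toy_embeddings, k=2))
```

If an embedding space captures syntax, the neighbours of a sentence should share its syntactic pattern rather than its topic. In the toy data above, the vectors were chosen by hand so that the two sequences with a similar determiner-noun-verb shape sit close together.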
