Linear Transformations for Cross-lingual Semantic Textual Similarity

Tomáš Brychcín

arXiv:1807.04172·cs.CL·July 12, 2018


TL;DR

This paper introduces a novel linear transformation that projects monolingual semantic spaces into a shared space using bilingual dictionaries. Combined with word weighting for unsupervised sentence similarity, it outperforms other methods on cross-lingual semantic textual similarity without relying on machine translation.

Contribution

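The paper proposes a new linear transformation, built on the best ideas from prior work, that maps monolingual semantic spaces into a shared space using bilingual dictionaries. It also shows that word weighting improves unsupervised sentence similarity, avoiding the strong supervision of machine-translation-based systems.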

Findings

01

The proposed transformation outperforms other methods

02

Word weighting significantly improves unsupervised sentence similarity based only on semantic spaces

03

The transformation combined with word weighting gives very promising results on several datasets

Abstract

Cross-lingual semantic textual similarity systems estimate the degree of meaning similarity between two sentences, each in a different language. State-of-the-art algorithms usually employ machine translation and combine a vast number of features, making the approach strongly supervised, resource rich, and difficult to use for poorly-resourced languages. In this paper, we study linear transformations, which project monolingual semantic spaces into a shared space using bilingual dictionaries. We propose a novel transformation, which builds on the best ideas from prior works. We experiment with unsupervised techniques for sentence similarity based only on semantic spaces and we show they can be significantly improved by word weighting. Our transformation outperforms other methods and together with word weighting leads to very promising results on several datasets in different…
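The general idea in the abstract (learn a linear map between two monolingual spaces from dictionary pairs, then compare weighted sentence vectors in the shared space) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration: the map is an orthogonal (Procrustes) transformation, which is one common choice and not necessarily the transformation the paper proposes; the word weights are arbitrary IDF-like numbers; and the vocabulary, vectors, and function names are hypothetical toy data.

```python
import numpy as np

def learn_orthogonal_map(X_src, Y_tgt):
    """Orthogonal W minimizing ||X_src @ W - Y_tgt||_F (Procrustes solution)."""
    U, _, Vt = np.linalg.svd(X_src.T @ Y_tgt)
    return U @ Vt

def sentence_vector(tokens, vectors, weights):
    """Weighted average of word vectors; tokens without a vector are skipped."""
    known = [t for t in tokens if t in vectors]
    if not known:
        raise ValueError("no known tokens in sentence")
    w = np.array([weights.get(t, 1.0) for t in known])
    V = np.stack([vectors[t] for t in known])
    return (w[:, None] * V).sum(axis=0) / w.sum()

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy setup (hypothetical): the source space is an exact rotation of the
# target space, so an orthogonal map can align the two perfectly.
rng = np.random.default_rng(0)
dim = 3
pairs = [("perro", "dog"), ("gato", "cat"), ("casa", "house"),
         ("corre", "runs"), ("duerme", "sleeps")]
tgt_vecs = {t: rng.normal(size=dim) for _, t in pairs}
Q, _ = np.linalg.qr(rng.normal(size=(dim, dim)))  # random orthogonal matrix
src_vecs = {s: tgt_vecs[t] @ Q for s, t in pairs}

# Learn the map from the first four dictionary pairs; the last is held out.
train = pairs[:4]
X = np.stack([src_vecs[s] for s, _ in train])
Y = np.stack([tgt_vecs[t] for _, t in train])
W = learn_orthogonal_map(X, Y)

# Project every source word into the target (shared) space.
mapped_src = {s: v @ W for s, v in src_vecs.items()}

# Assumed IDF-like weights: rarer words get a larger weight.
src_weights = {"perro": 2.0, "duerme": 1.0}
tgt_weights = {"dog": 2.0, "sleeps": 1.0}

sim = cosine(
    sentence_vector(["perro", "duerme"], mapped_src, src_weights),
    sentence_vector(["dog", "sleeps"], tgt_vecs, tgt_weights),
)
print(f"cross-lingual similarity: {sim:.4f}")
```

Because the toy source space is a noise-free rotation of the target space, the learned map recovers the rotation and the two sentences score a similarity of essentially 1. With real embeddings the alignment is only approximate, and both the dictionary quality and the weighting scheme affect the score.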
