Semantic Relatedness for All (Languages): A Comparative Analysis of Multilingual Semantic Relatedness Using Machine Translation

Andre Freitas; Siamak Barzegar; Juliano Efson Sales; Siegfried Handschuh; Brian Davis

arXiv:1805.06522·cs.CL·May 18, 2018

TL;DR

This study compares multilingual semantic relatedness models across 11 languages. Machine translation over English-based models significantly outperforms native language-specific models, and translation combined with the English Word2Vec model yields the best results.

Contribution

It provides a comprehensive comparison of native versus machine-translated semantic models across multiple languages, highlighting the effectiveness of translation-based approaches.

Findings

01

Machine translation improves the Spearman correlation for semantic relatedness by an average of 16.7%.

02

Combining machine translation with the English Word2Vec model yields the best results for all languages (average Spearman correlation of 0.68).

03

The benefit of using the most informative corpus outweighs the errors introduced by machine translation.

Abstract

This paper provides a comparative analysis of the performance of four state-of-the-art distributional semantic models (DSMs) over 11 languages, contrasting the native language-specific models with the use of machine translation over English-based DSMs. The experimental results show that there is a significant improvement (average of 16.7% for the Spearman correlation) by using state-of-the-art machine translation approaches. The results also show that the benefit of using the most informative corpus outweighs the possible errors introduced by the machine translation. For all languages, the combination of machine translation over the Word2Vec English distributional model provided the best results consistently (average Spearman correlation of 0.68).
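The evaluation described in the abstract can be sketched roughly as follows: translate each word pair into English, score it by cosine similarity in an English distributional model, and compare the scores against human judgments with the Spearman correlation. This is a minimal illustration, not the authors' implementation. The dictionary `TOY_TRANSLATIONS`, the three-dimensional `TOY_VECTORS`, the gold scores, and the function `translated_relatedness` are all hypothetical stand-ins for a real machine translation system, a trained Word2Vec model, and a relatedness gold standard.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(ranks(x), ranks(y))

def translated_relatedness(pair, translate, vectors):
    """Translate both words to English, then score them by cosine similarity.

    Returns 0.0 if either translated word is missing from the model.
    """
    w1, w2 = (translate(w) for w in pair)
    if w1 not in vectors or w2 not in vectors:
        return 0.0
    return cosine(vectors[w1], vectors[w2])

# Hypothetical stand-in for a machine translation system (German to English).
TOY_TRANSLATIONS = {"Katze": "cat", "Hund": "dog", "Auto": "car", "Strasse": "road"}
# Hypothetical stand-in for an English distributional model.
TOY_VECTORS = {
    "cat": [1.0, 0.9, 0.0],
    "dog": [0.9, 1.0, 0.1],
    "car": [0.0, 0.1, 1.0],
    "road": [0.1, 0.0, 0.8],
}
# Made-up gold-standard relatedness judgments for four word pairs.
GOLD = [
    (("Katze", "Hund"), 8.5),
    (("Auto", "Strasse"), 7.0),
    (("Katze", "Auto"), 1.5),
    (("Hund", "Strasse"), 2.5),
]

model_scores = [
    translated_relatedness(pair, TOY_TRANSLATIONS.get, TOY_VECTORS)
    for pair, _ in GOLD
]
gold_scores = [score for _, score in GOLD]
rho = spearman(model_scores, gold_scores)
print(f"Spearman correlation: {rho:.2f}")
```

Because Spearman correlation depends only on rank order, the model scores and the gold judgments do not need to share a scale. In a real setup the native-language baseline would score the untranslated pairs against a model trained on that language's corpus, and the two correlations would be compared per language.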
