Improving Multilingual Semantic Textual Similarity with Shared Sentence Encoder for Low-resource Languages
Xin Tang, Shanbo Cheng, Loc Do, Zhiyu Min, Feng Ji, Heng Yu, Ji Zhang, Haiqin Chen

TL;DR
This paper introduces a shared multilingual encoder framework that improves semantic textual similarity measurement in low-resource languages by leveraging rich-resource language data without relying on machine translation, demonstrating superior performance over existing methods.
Contribution
The paper proposes a novel shared multilingual encoder approach that enhances STS in low-resource languages using rich-resource language data, avoiding translation biases and inefficiencies.
Findings
Significant improvement over state-of-the-art methods on the SemEval STS task.
Outperforms machine translation-based approaches in low-resource scenarios.
Maintains consistent performance in industrial applications where MT fails.
Abstract
Measuring the semantic similarity between two sentences (Semantic Textual Similarity, STS) is fundamental to many NLP applications. Despite remarkable results in supervised settings with adequate labeling, little attention has been paid to this task in low-resource languages with insufficient labeling. Existing approaches mostly rely on machine translation to translate sentences into a rich-resource language. These approaches either introduce language biases or are impractical in industrial applications, where spoken-language input is more common and strict efficiency is required. In this work, we propose a multilingual framework to tackle the STS task in a low-resource language, e.g. Spanish, Arabic, Indonesian and Thai, by utilizing the rich annotation data in a rich-resource language, e.g. English. Our approach is extended from a basic monolingual STS framework to a…
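To make the idea of scoring sentence pairs through a single encoder shared across languages concrete, here is a minimal sketch. It is not the paper's model: the hashed character-trigram encoder, the `DIM` value, and the names `encode`, `cosine`, and `sts_score` are all hypothetical stand-ins for a trained neural encoder. It only shows the general shape, in which both sentences pass through the same function and are compared in one vector space, with no translation step.

```python
import hashlib
import math

DIM = 64  # hypothetical embedding size, chosen only for illustration


def encode(sentence: str, dim: int = DIM) -> list[float]:
    """Toy stand-in for a shared encoder (assumption, not the paper's model).

    Hashes character trigrams into a fixed-size count vector, so text in any
    language lands in the same vector space.
    """
    vec = [0.0] * dim
    text = f" {sentence.lower().strip()} "
    for i in range(len(text) - 2):
        trigram = text[i:i + 3]
        digest = hashlib.sha256(trigram.encode("utf-8")).digest()
        vec[int.from_bytes(digest[:4], "big") % dim] += 1.0
    return vec


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def sts_score(sentence_a: str, sentence_b: str) -> float:
    """Score a sentence pair by encoding both with the same encoder."""
    return cosine(encode(sentence_a), encode(sentence_b))


if __name__ == "__main__":
    print(sts_score("un gato duerme en el sofá", "un gato duerme en la cama"))
```

In a real system the `encode` function would be a learned model trained on the annotated rich-resource data; the toy version here captures surface overlap only and carries no semantic knowledge.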
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
