L3Cube-IndicSBERT: A simple approach for learning cross-lingual sentence representations using multilingual BERT
Samruddhi Deode, Janhavi Gadre, Aditi Kajale, Ananya Joshi, Raviraj Joshi

TL;DR
This paper introduces L3Cube-IndicSBERT, a simple method for turning multilingual BERT into an effective cross-lingual sentence encoder, especially for Indian languages, which outperforms existing models on similarity tasks.
Contribution
Proposes a straightforward fine-tuning approach using synthetic datasets to create high-quality multilingual sentence embeddings for Indic languages and beyond.
Findings
IndicSBERT outperforms LaBSE, LASER, and MPNet on Indic language similarity tasks.
The approach works effectively for non-Indic languages like German and French.
Monolingual SBERT models perform competitively with IndicSBERT.
Abstract
The multilingual Sentence-BERT (SBERT) models map different languages to a common representation space and are useful for cross-language similarity and mining tasks. We propose a simple yet effective approach to convert vanilla multilingual BERT models into multilingual sentence BERT models using a synthetic corpus. We simply aggregate translated NLI or STS datasets of the low-resource target languages together and perform SBERT-like fine-tuning of the vanilla multilingual BERT model. We show that multilingual BERT models are inherent cross-lingual learners and that this simple baseline fine-tuning approach, without explicit cross-lingual training, yields exceptional cross-lingual properties. We show the efficacy of our approach on 10 major Indic languages and also show the applicability of our approach to the non-Indic languages German and French. Using this approach, we further present…
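The aggregation step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function name `aggregate_corpora`, the tuple layout, the label strings, and the toy sentences are all assumptions made here. The point it shows is that per-language translated NLI sets are simply pooled and shuffled, and every sentence pair stays within one language.

```python
import random
from typing import Dict, List, Tuple

# One NLI example: (premise, hypothesis, label). Label strings are illustrative.
NLIExample = Tuple[str, str, str]
# One pooled row: (language code, premise, hypothesis, label).
PooledExample = Tuple[str, str, str, str]


def aggregate_corpora(
    per_language: Dict[str, List[NLIExample]], seed: int = 0
) -> List[PooledExample]:
    """Merge per-language translated NLI sets into one shuffled training pool.

    Each pair stays monolingual: no cross-lingual pairs are constructed,
    matching the "without explicit cross-lingual training" setting.
    """
    pool = [
        (lang, premise, hypothesis, label)
        # Sort language codes so the pre-shuffle order is deterministic.
        for lang in sorted(per_language)
        for premise, hypothesis, label in per_language[lang]
    ]
    # A seeded generator keeps the shuffle reproducible across runs.
    random.Random(seed).shuffle(pool)
    return pool


# Hypothetical placeholder data standing in for translated NLI sets.
toy = {
    "hi": [
        ("hi premise 1", "hi hypothesis 1", "entailment"),
        ("hi premise 2", "hi hypothesis 2", "contradiction"),
    ],
    "mr": [("mr premise 1", "mr hypothesis 1", "neutral")],
}
pool = aggregate_corpora(toy)
```

The pooled rows would then feed an SBERT-style fine-tuning loop over the multilingual BERT model; that loop is omitted here because the summary does not give its details.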
Code & Models
- 🤗 l3cube-pune/marathi-sentence-similarity-sbert · model · 162 dl · ♡ 3
- 🤗 l3cube-pune/hindi-sentence-similarity-sbert · model · 2.5k dl · ♡ 7
- 🤗 l3cube-pune/hindi-sentence-bert-nli · model · 33 dl · ♡ 2
- 🤗 l3cube-pune/marathi-sentence-bert-nli · model · 95 dl · ♡ 1
- 🤗 l3cube-pune/bengali-sentence-similarity-sbert · model · 1.6k dl · ♡ 6
- 🤗 l3cube-pune/gujarati-sentence-similarity-sbert · model · 7 dl · ♡ 1
- 🤗 l3cube-pune/tamil-sentence-similarity-sbert · model · 79 dl · ♡ 3
- 🤗 l3cube-pune/telugu-sentence-similarity-sbert · model · 56 dl · ♡ 1
- 🤗 l3cube-pune/odia-sentence-similarity-sbert · model · 4 dl · ♡ 2
- 🤗 l3cube-pune/kannada-sentence-similarity-sbert · model · 16 dl · ♡ 2
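Sentence-similarity models like those listed above are typically used by encoding sentences into vectors and comparing them with cosine similarity. The sketch below illustrates that comparison step only. The names `cosine_similarity`, `rank_by_similarity`, and `toy_encode` are hypothetical, and the toy encoder is a stand-in so the example runs without downloading any model.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Encoder = Callable[[List[str]], Sequence[Vector]]


def cosine_similarity(u: Vector, v: Vector) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_by_similarity(
    query: str, candidates: List[str], encode: Encoder
) -> List[Tuple[str, float]]:
    """Return candidates sorted by descending cosine similarity to the query."""
    # Encode the query and candidates in one batch; the first row is the query.
    vectors = encode([query, *candidates])
    query_vec, candidate_vecs = vectors[0], vectors[1:]
    scored = [
        (text, cosine_similarity(query_vec, vec))
        for text, vec in zip(candidates, candidate_vecs)
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def toy_encode(sentences: List[str]) -> List[List[float]]:
    """Hypothetical stand-in encoder: counts of the letters a, b, c."""
    return [[float(s.count(ch)) for ch in "abc"] for s in sentences]


ranked = rank_by_similarity("aab", ["abb", "ccc", "aab"], toy_encode)
```

With the sentence-transformers package installed, the `encode` argument could presumably be `SentenceTransformer("l3cube-pune/hindi-sentence-similarity-sbert").encode`, assuming that checkpoint loads in that library's format; this has not been verified here.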
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Adam · Attention Dropout · WordPiece · Dense Connections · Dropout · Weight Decay
