TL;DR
This paper systematically evaluates pretrained multilingual text encoders for unsupervised cross-lingual retrieval. At the document level they do not outperform CLWE-based models; at the sentence level they reach state-of-the-art performance, but only with specialized (fine-tuned) encoder variants rather than off-the-shelf models.
Contribution
It provides a comprehensive empirical analysis of multilingual encoders' effectiveness for unsupervised cross-lingual retrieval across many language pairs.
Findings
Pretrained encoders do not outperform CLWEs in unsupervised document retrieval.
State-of-the-art performance is achievable in sentence retrieval with specialized encoder variants.
Off-the-shelf encoders are less effective than fine-tuned variants for sentence-level tasks.
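To make the unsupervised retrieval setting concrete, here is a minimal sketch of ranking documents in one language against a query in another, using nothing but cosine similarity between vectors in a shared embedding space. The vectors are toy values, and the names `cosine`, `rank_documents`, and the document IDs are hypothetical; in practice the vectors would come from a multilingual encoder or from aggregated CLWEs. This illustrates the general setup only, not the paper's exact pipeline.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # treat an all-zero vector as dissimilar to everything
    return dot / (norm_u * norm_v)


def rank_documents(query_vec, doc_vecs):
    """Return (doc_id, score) pairs sorted by descending similarity.

    No relevance judgments are used: the ranking depends only on the
    geometry of the (assumed shared) cross-lingual embedding space.
    """
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy example: an English query vector and three German document vectors
# (hypothetical values standing in for real encoder outputs).
query = [0.9, 0.1, 0.0]
docs = {
    "doc_de_1": [0.8, 0.2, 0.1],
    "doc_de_2": [0.0, 0.1, 0.9],
    "doc_de_3": [0.5, 0.5, 0.0],
}

ranking = rank_documents(query, docs)
for doc_id, score in ranking:
    print(f"{doc_id}\t{score:.3f}")
```

Because no labeled query-document pairs are involved, retrieval quality in this setting depends entirely on how well the embedding space aligns the two languages, which is what the findings above compare across encoder types.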
Abstract
Pretrained multilingual text encoders based on neural Transformer architectures, such as multilingual BERT (mBERT) and XLM, have achieved strong performance on a myriad of language understanding tasks. Consequently, they have been adopted as a go-to paradigm for multilingual and cross-lingual representation learning and transfer, rendering cross-lingual word embeddings (CLWEs) effectively obsolete. However, questions remain to which extent this finding generalizes 1) to unsupervised settings and 2) for ad-hoc cross-lingual IR (CLIR) tasks. Therefore, in this work we present a systematic empirical study focused on the suitability of the state-of-the-art multilingual encoders for cross-lingual document and sentence retrieval tasks across a large number of language pairs. In contrast to supervised language understanding, our results indicate that for unsupervised document-level CLIR -- a…
Taxonomy
Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · WordPiece · Residual Connection · Dense Connections · Layer Normalization · Attention Is All You Need · Byte Pair Encoding · Label Smoothing
