Preserving Multilingual Quality While Tuning Query Encoder on English Only
Oleg Vasilyev, Randy Sawaya, John Bohannon

TL;DR
This paper investigates whether tuning a multilingual query encoder on English-only data can preserve or enhance its multilingual capabilities, demonstrating that tuning with a low learning rate maintains or even improves the encoder's original qualities.
Contribution
It introduces the concept of adiabatic tuning: tuning with an intentionally low learning rate on narrow data, which the authors hypothesize can preserve or even enhance a multilingual encoder's original qualities.
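The low-learning-rate idea can be illustrated with a toy numerical sketch. It rests on assumptions that are not taken from the paper: the query encoder is reduced to a hypothetical linear projection W applied to fixed base query vectors, the document embeddings D stay frozen, the objective is a softmax (InfoNCE-style) cross-entropy over dot-product scores, and the optimizer is plain gradient descent. The sketch only shows that a smaller learning rate still lowers the training loss while keeping the tuned weights closer to their starting point; it does not reproduce the paper's experiments.

```python
import numpy as np


def contrastive_loss_and_grad(W, Q, D, pos, tau=0.05):
    """Softmax cross-entropy over all documents and its gradient w.r.t. W.

    W:   (d, d) trainable query projection (toy stand-in for a query encoder)
    Q:   (b, d) base query vectors
    D:   (n, d) frozen document embeddings (never updated)
    pos: (b,)   index of the relevant document for each query
    """
    b = Q.shape[0]
    rows = np.arange(b)
    S = (Q @ W.T) @ D.T / tau                 # (b, n) similarity logits
    S = S - S.max(axis=1, keepdims=True)      # stabilize the softmax
    log_norm = np.log(np.exp(S).sum(axis=1))
    loss = float(np.mean(log_norm - S[rows, pos]))
    P = np.exp(S - log_norm[:, None])         # softmax probabilities
    P[rows, pos] -= 1.0                       # d(loss)/d(S), up to the 1/b factor
    dZ = (P / (b * tau)) @ D                  # gradient w.r.t. tuned query vectors
    dW = dZ.T @ Q                             # gradient w.r.t. the projection W
    return loss, dW


def tune(W0, Q, D, pos, lr, steps=200):
    """Plain gradient descent; returns tuned weights and relative drift from W0."""
    W = W0.copy()
    for _ in range(steps):
        _, dW = contrastive_loss_and_grad(W, Q, D, pos)
        W -= lr * dW
    drift = np.linalg.norm(W - W0) / np.linalg.norm(W0)
    return W, drift


rng = np.random.default_rng(0)
d, n, b = 16, 50, 20
D = rng.normal(size=(n, d))
D /= np.linalg.norm(D, axis=1, keepdims=True)
pos = rng.integers(0, n, size=b)
Q = D[pos] + 0.5 * rng.normal(size=(b, d))    # noisy queries near their positives
W0 = np.eye(d)                                # "original" encoder: identity projection

loss_before, _ = contrastive_loss_and_grad(W0, Q, D, pos)
results = {}
for lr in (1e-4, 1e-2):
    W, drift = tune(W0, Q, D, pos, lr)
    loss_after, _ = contrastive_loss_and_grad(W, Q, D, pos)
    results[lr] = (drift, loss_after)
    print(f"lr={lr:g}: loss {loss_before:.3f} -> {loss_after:.3f}, drift {drift:.4f}")
```

In this toy setting the relative drift ||W - W0|| / ||W0|| serves as a crude proxy for how far the encoder moves away from its original behavior; the paper itself measures retrieval quality, not weight drift.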
Findings
Tuning on English-only data preserves or improves multilingual qualities.
Low-learning-rate tuning (adiabatic tuning) helps maintain the encoder's original properties.
Multilingual embedding quality can be enhanced through targeted low-learning-rate tuning.
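Judging whether tuning "preserves or improves" a quality amounts to a before/after comparison of a retrieval metric on data the encoder was not tuned on, such as non-English queries. The sketch below assumes Recall@k with cosine similarity as the metric and a simple tolerance rule for "preserved"; both choices, the function names, and the toy vectors are illustrative and not taken from the paper.

```python
import numpy as np


def recall_at_k(query_embs, doc_embs, gold, k=1):
    """Fraction of queries whose gold document is among the top-k by cosine similarity."""
    Q = np.asarray(query_embs, dtype=float)
    D = np.asarray(doc_embs, dtype=float)
    Q = Q / np.linalg.norm(Q, axis=1, keepdims=True)
    D = D / np.linalg.norm(D, axis=1, keepdims=True)
    scores = Q @ D.T                                  # (num_queries, num_docs)
    top_k = np.argsort(-scores, axis=1)[:, :k]        # best-scoring document ids
    hits = (top_k == np.asarray(gold)[:, None]).any(axis=1)
    return float(hits.mean())


def compare(before, after, tolerance=0.01):
    """Label a metric change; 'preserved' allows a drop of at most `tolerance`."""
    if after > before:
        return "improved"
    if after >= before - tolerance:
        return "preserved"
    return "degraded"


# Toy held-out set: three documents, one gold document per query.
docs = np.eye(3)
gold = np.array([0, 1, 2])
original_queries = np.array([[1.0, 0.2, 0.0], [0.3, 1.0, 0.0], [0.0, 1.0, 0.8]])
tuned_queries = np.array([[1.0, 0.2, 0.0], [0.3, 1.0, 0.0], [0.0, 0.5, 1.0]])

before = recall_at_k(original_queries, docs, gold)    # third query misses
after = recall_at_k(tuned_queries, docs, gold)        # all three queries hit
print(before, after, compare(before, after))
```

The same comparison would be run separately per language or per out-of-domain dataset, since an average can hide a regression on one of them.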
Abstract
A query encoder of a dual-encoder passage retrieval system can be tuned for specific types of queries or domains, while the precomputed and stored document representations are kept intact. Switching from one query encoder to another when needed is easy, unlike overhauling the embeddings of a whole knowledge base. In this work we ask: can the generic, original qualities of the encoder be preserved, or at least not too degraded, when it is tuned on a narrow domain? We conducted experiments on a high-quality multilingual embedding model: tuning it on a single English-only dataset, we observe that the tuning not only preserves the multilingual qualities but even improves them. The embedding qualities on distinctly different data are also improved or at least preserved. Drawing on our observations, we suggest a more general hypothesis: Tuning with intentionally low…
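The asymmetry described in the abstract, where document vectors are computed once and only the query side changes, can be sketched as an index that stores normalized document embeddings and accepts whichever query encoder is current. The class name, the random vectors standing in for text, and the linear "tuned" encoder are hypothetical; a real system would call an embedding model and usually an approximate nearest-neighbor library instead of exhaustive search.

```python
import numpy as np


class FrozenDocIndex:
    """Document embeddings are normalized once and then made read-only."""

    def __init__(self, doc_embeddings):
        E = np.asarray(doc_embeddings, dtype=float)
        self.docs = E / np.linalg.norm(E, axis=1, keepdims=True)
        self.docs.setflags(write=False)       # the stored knowledge base is never re-embedded

    def search(self, query_encoder, raw_query, k=3):
        """Encode the query with the supplied encoder and return the top-k doc ids."""
        q = np.asarray(query_encoder(raw_query), dtype=float)
        q = q / np.linalg.norm(q)
        scores = self.docs @ q                 # cosine similarity to every document
        return np.argsort(-scores)[:k].tolist()


rng = np.random.default_rng(0)
docs = rng.normal(size=(100, 8))
index = FrozenDocIndex(docs)
snapshot = index.docs.copy()

W_tuned = np.eye(8) + 0.01 * rng.normal(size=(8, 8))   # hypothetical lightly tuned projection
encoders = {
    "original": lambda x: x,                  # toy stand-in for the original query encoder
    "tuned": lambda x: W_tuned @ x,           # toy stand-in for the tuned query encoder
}

raw_query = docs[7] + 0.05 * rng.normal(size=8)         # a query close to document 7
for name, encoder in encoders.items():                  # switching encoders is just a lookup
    print(name, index.search(encoder, raw_query))
```

Because the index never depends on which encoder produced the query, rolling back to the original encoder is as cheap as switching to the tuned one; the only requirement is that both encoders map into the same embedding space as the stored documents.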
Taxonomy
Topics: Data Mining Algorithms and Applications
