TL;DR
This paper shows that choosing the right query language can significantly improve a multilingual language model's question-answering performance, and that knowledge is distributed across languages in non-intuitive ways.
Contribution
It introduces the concept of Language Specific Knowledge (LSK), formulates the problem of language selection for improved QA, and proposes baselines, including the LSKExtractor method.
Findings
Language selection can improve model performance significantly.
For some queries, models answer better when asked in a language other than English, sometimes even a low-resource language.
The effect of language choice on performance varies across datasets and models.
Abstract
Often, multilingual language models are trained with the objective of mapping semantically similar content (in different languages) into the same latent space. In this paper, we show a nuance in this training objective, and find that by changing the language of the input query, we can improve the question answering ability of language models. We make two main contributions. First, we introduce the term Language Specific Knowledge (LSK) to denote queries that are best answered in an "expert language" for a given LLM, thereby enhancing its question-answering ability. We introduce the problem of language selection: for some queries, language models can perform better when queried in languages other than English, sometimes even in low-resource languages, and the goal is to select the optimal language for the query. Second, we introduce a variety of simple to strong baselines to…
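As a rough illustration of the language selection problem described in the abstract, the sketch below picks an "expert language" per topic from held-out accuracy and routes new queries accordingly. It is not the paper's LSKExtractor method. The topic labels, the toy records, the English fallback, and the function names `best_language_per_topic` and `select_language` are all assumptions made for this example.

```python
from collections import defaultdict


def best_language_per_topic(records):
    """Map each topic to the language with the highest held-out accuracy.

    `records` is an iterable of (topic, language, is_correct) tuples.
    Grouping queries by topic is an assumption of this sketch.
    """
    totals = defaultdict(lambda: [0, 0])  # (topic, lang) -> [correct, seen]
    for topic, lang, is_correct in records:
        stats = totals[(topic, lang)]
        stats[0] += int(is_correct)
        stats[1] += 1

    best = {}  # topic -> (lang, accuracy)
    for (topic, lang), (correct, seen) in totals.items():
        accuracy = correct / seen
        # Strict ">" means ties keep the language seen first.
        if topic not in best or accuracy > best[topic][1]:
            best[topic] = (lang, accuracy)
    return {topic: lang for topic, (lang, _) in best.items()}


def select_language(topic, table, default="en"):
    """Return the expert language for a topic, falling back to a default."""
    return table.get(topic, default)


# Toy, made-up validation outcomes (not results from the paper).
toy_records = [
    ("cuisine", "en", False),
    ("cuisine", "en", True),
    ("cuisine", "hi", True),
    ("cuisine", "hi", True),
    ("physics", "en", True),
    ("physics", "hi", False),
]
table = best_language_per_topic(toy_records)
```

With this table, a query tagged with a known topic would be translated into `select_language(topic, table)` before being sent to the model, and unseen topics fall back to English.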
