PolyPrompt: Automating Knowledge Extraction from Multilingual Language Models with Dynamic Prompt Generation
Nathan Roll

TL;DR
PolyPrompt is a new framework that improves multilingual performance of large language models by dynamically generating language-specific prompts, leading to significant accuracy gains across diverse languages.
Contribution
It introduces a parameter-efficient, gradient-based method for learning trigger tokens per language, enhancing multilingual capabilities of LLMs.
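To make the idea of gradient-based, parameter-efficient trigger learning concrete, here is a minimal toy sketch. It is not the paper's procedure: the "model" is a frozen two-weight logistic scorer, the trigger is a continuous vector rather than a set of tokens, and the names `predict`, `loss`, and `train_trigger` and all data values are hypothetical. It only illustrates the general pattern in which the model weights stay fixed and a small per-language trigger is the only thing updated by gradient descent.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, trigger, x):
    # Frozen toy "model": logistic score over the sum of trigger and input vectors.
    z = sum(wi * (ti + xi) for wi, ti, xi in zip(w, trigger, x))
    return sigmoid(z)

def loss(w, trigger, data):
    # Mean binary cross-entropy over (input vector, label) pairs.
    total = 0.0
    for x, y in data:
        p = predict(w, trigger, x)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(data)

def train_trigger(w, data, steps=200, lr=0.5):
    # Only the trigger vector is updated; the model weights w are never modified.
    trigger = [0.0] * len(w)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for x, y in data:
            p = predict(w, trigger, x)
            for i in range(len(w)):
                # d(loss)/d(trigger_i) for one example is (p - y) * w_i.
                grad[i] += (p - y) * w[i] / len(data)
        trigger = [t - lr * g for t, g in zip(trigger, grad)]
    return trigger

# Hypothetical frozen weights and per-language toy data (assumptions for illustration).
W = [1.0, -1.0]
DATA_BY_LANG = {
    "lang_a": [([0.2, 0.1], 1), ([0.0, 0.4], 1), ([-1.0, 0.5], 0)],
    "lang_b": [([0.5, 0.1], 0), ([1.5, 0.2], 1), ([0.3, 0.4], 0)],
}

# One trigger per language, each learned independently against the same frozen model.
TRIGGERS = {lang: train_trigger(W, data) for lang, data in DATA_BY_LANG.items()}
```

In this toy setting the number of learned values per language equals the length of the trigger vector, which is the sense in which such an approach is parameter-efficient relative to fine-tuning the whole model.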
Findings
Achieves accuracy gains of 3.7%-19.9% on the Global MMLU benchmark over naive and translation-pipeline baselines.
Demonstrates effectiveness across 15 typologically diverse languages.
Validates approach on two ~1 billion parameter models.
Abstract
Large language models (LLMs) post increasingly impressive English benchmark scores; however, their performance remains inconsistent across multilingual settings. To address this gap, we introduce PolyPrompt, a novel, parameter-efficient framework for enhancing the multilingual capabilities of LLMs. Our method learns a set of trigger tokens for each language through a gradient-based search. At inference time, it identifies the input query's language and prepends the corresponding trigger tokens to the prompt. We perform experiments on two ~1 billion parameter models, with evaluations on the Global MMLU benchmark across fifteen typologically and resource-diverse languages, demonstrating accuracy gains of 3.7%-19.9% compared to naive and translation-pipeline baselines.
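The inference-time flow described in the abstract (identify the query's language, look up that language's trigger tokens, prepend them to the prompt) can be sketched as follows. This is a hedged illustration only: the trigger strings in `TRIGGERS` are placeholders rather than learned tokens, the script-based `detect_language` heuristic is a crude stand-in for whatever language identifier the paper uses, and the mapping of one script to one language code is an assumption made for brevity.

```python
import unicodedata
from collections import Counter

# Placeholder trigger tokens (hypothetical); in the paper these are learned per language.
TRIGGERS = {
    "en": ["<en_0>", "<en_1>"],
    "ru": ["<ru_0>", "<ru_1>"],
    "ar": ["<ar_0>", "<ar_1>"],
}

# Assumption for illustration: each script maps to exactly one language.
SCRIPT_TO_LANG = {"LATIN": "en", "CYRILLIC": "ru", "ARABIC": "ar"}

def detect_language(text, default="en"):
    # Count alphabetic characters by the first word of their Unicode name,
    # which names the script (e.g. "CYRILLIC SMALL LETTER PE").
    counts = Counter()
    for ch in text:
        if ch.isalpha():
            script = unicodedata.name(ch, "").split(" ")[0]
            if script in SCRIPT_TO_LANG:
                counts[SCRIPT_TO_LANG[script]] += 1
    return counts.most_common(1)[0][0] if counts else default

def build_prompt(query, triggers=TRIGGERS):
    # Select the trigger tokens for the detected language and prepend them.
    lang = detect_language(query)
    prefix = " ".join(triggers[lang])
    return f"{prefix} {query}"

print(build_prompt("What is photosynthesis?"))
print(build_prompt("Что такое фотосинтез?"))
```

A script-based heuristic cannot separate languages that share a script (for example, English and Spanish), so a real system covering fifteen languages would need a proper language identifier in place of `detect_language`.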
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification
Methods: Sparse Evolutionary Training
