sPhinX: Sample Efficient Multilingual Instruction Fine-Tuning Through N-shot Guided Prompting

Sanchit Ahuja; Kumar Tanmay; Hardik Hansrajbhai Chauhan; Barun Patra; Kriti Aggarwal; Luciano Del Corro; Arindam Mitra; Tejas Indulal Dhamecha; Ahmed Awadallah; Monojit Choudhary; Vishrav Chaudhary; Sunayana Sitaram

arXiv:2407.09879 · cs.CL · June 23, 2025

TL;DR

This paper introduces sPhinX, a method for improving multilingual instruction fine-tuning of LLMs by creating diverse synthetic datasets and employing N-shot guided prompting, significantly boosting performance across multiple languages.

Contribution

The paper presents a novel multilingual synthetic dataset construction method and a guided fine-tuning strategy that enhance multilingual LLM capabilities with minimal forgetting.

Findings

01. Improves Mistral-7B performance by an average of 39.8% across multilingual benchmarks.

02. Improves Phi-3-Small performance by an average of 11.2% across the same benchmarks.

03. Maintains strong English performance with minimal catastrophic forgetting.

Abstract

Despite the remarkable success of large language models (LLMs) in English, a significant performance gap remains in non-English languages. To address this, we introduce a novel approach for strategically constructing a multilingual synthetic instruction tuning dataset, sPhinX. Unlike prior methods that directly translate fixed instruction-response pairs, sPhinX enhances diversity by selectively augmenting English instruction-response pairs with multilingual translations. Additionally, we propose LANGIT, a novel N-shot guided fine-tuning strategy, which further enhances model performance by incorporating contextually relevant examples in each training sample. Our ablation study shows that our approach enhances the multilingual capabilities of Mistral-7B and Phi-3-Small, improving performance by an average of 39.8% and 11.2%, respectively, across multilingual benchmarks in reasoning,…
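To make the idea of N-shot guided fine-tuning concrete, here is a minimal Python sketch of how a single training sample might be assembled with N examples prepended. It is an illustration only, not the authors' LANGIT implementation: the function name `build_n_shot_sample`, the `Instruction:`/`Response:` template, the dictionary keys, and the choice to treat "contextually relevant" as "same language as the target" are all assumptions made for this example.

```python
import random
from typing import Dict, List

Example = Dict[str, str]  # hypothetical schema: keys "lang", "instruction", "response"


def build_n_shot_sample(target: Example, pool: List[Example], n: int, seed: int = 0) -> Dict[str, str]:
    """Prepend up to n same-language examples to the target instruction.

    Assumption: "relevance" is approximated by language match only.
    """
    rng = random.Random(seed)
    # Candidate shots: same language as the target, excluding the target itself.
    candidates = [ex for ex in pool if ex["lang"] == target["lang"] and ex is not target]
    shots = rng.sample(candidates, k=min(n, len(candidates)))

    # Each shot is rendered as a solved instruction-response pair.
    parts = [f"Instruction: {ex['instruction']}\nResponse: {ex['response']}" for ex in shots]
    # The target instruction comes last, with the response left open.
    parts.append(f"Instruction: {target['instruction']}\nResponse:")

    # The completion is the text a fine-tuning loss would be computed on.
    return {"prompt": "\n\n".join(parts), "completion": " " + target["response"]}
```

With `n=0` the function falls back to an ordinary zero-shot instruction-response pair, so the same routine can produce both guided and unguided samples.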

Taxonomy

Topics: Speech and Audio Processing · Video Analysis and Summarization · Speech Recognition and Synthesis

Methods: Balanced Selection