On the Economics of Multilingual Few-shot Learning: Modeling the Cost-Performance Trade-offs of Machine Translated and Manual Data
Kabir Ahuja, Monojit Choudhury, Sandipan Dandapat

TL;DR
This paper introduces a framework, based on production functions from microeconomics, for evaluating the cost-performance trade-offs between machine-translated and manually created data when fine-tuning multilingual models. It finds that whenever machine translation has a nonzero cost, optimal performance at least cost is achieved with at least some manually created data.
Contribution
To the authors' knowledge, this is the first work to extend production functions to the study of data collection strategies for multilingual NLP models, giving a systematic way to analyze cost-performance trade-offs.
Findings
When the cost of machine translation is greater than zero, optimal performance at least cost is always achieved with at least some, or only, manually created data.
The framework effectively evaluates cost and performance trade-offs in multilingual model training.
A case study on the TyDIQA-GoldP dataset illustrates the framework in practice.
Abstract
Borrowing ideas from production functions in micro-economics, in this paper we introduce a framework to systematically evaluate the performance and cost trade-offs between machine-translated and manually-created labelled data for task-specific fine-tuning of massively multilingual language models. We illustrate the effectiveness of our framework through a case-study on the TyDIQA-GoldP dataset. One of the interesting conclusions of the study is that if the cost of machine translation is greater than zero, the optimal performance at least cost is always achieved with at least some or only manually-created data. To our knowledge, this is the first attempt towards extending the concept of production functions to study data collection strategies for training multilingual models, and can serve as a valuable tool for other similar cost vs data trade-offs in NLP.
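The production-function idea in the abstract can be sketched in a few lines: treat model performance as the "output" produced by two "inputs" (manually created and machine-translated examples), then search for the cheapest input mix that reaches a target performance. This is only an illustrative sketch. The functional form, the coefficients `base`, `a`, and `b`, and the per-unit costs are hypothetical and are not taken from the paper, so the numbers say nothing about the paper's actual results.

```python
import math


def performance(manual, translated, base=0.30, a=0.08, b=0.04):
    """Hypothetical production function with diminishing returns.

    The form and coefficients are assumptions for illustration only,
    not the function fitted in the paper.
    """
    return base + a * math.log1p(manual) + b * math.log1p(translated)


def least_cost_mix(target, cost_manual, cost_translated, max_units=500):
    """Brute-force search for the cheapest (manual, translated) mix
    whose predicted performance reaches `target`.

    Returns (cost, manual, translated), or None if the target is not
    reachable within `max_units` of each input.
    """
    best = None
    for manual in range(max_units + 1):
        for translated in range(max_units + 1):
            if performance(manual, translated) >= target:
                cost = manual * cost_manual + translated * cost_translated
                if best is None or cost < best[0]:
                    best = (cost, manual, translated)
                # Adding more translated data cannot lower the cost
                # (costs are non-negative), so stop scanning this row.
                break
    return best


if __name__ == "__main__":
    # Hypothetical prices: one manual example costs 1.0, one translated 0.05.
    print(least_cost_mix(target=0.60, cost_manual=1.0, cost_translated=0.05))
```

Varying `cost_manual` and `cost_translated` moves the least-cost point along the curve of input mixes that reach the same target performance, which is the kind of trade-off the framework is designed to expose.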
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification
