Estonian WinoGrande Dataset: Comparative Analysis of LLM Performance on Human and Machine Translation
Marii Ojastu, Hele-Andra Kuulmets, Aleksei Dorkin, Marika Borovikova, Dage S\"arg, Kairit Sirts

TL;DR
This study introduces an Estonian version of the WinoGrande dataset, compares model performance on human and machine translations, and assesses the impact of prompt engineering on translation quality.
Contribution
It provides a culturally adapted Estonian dataset and analyzes model performance differences between human and machine translations, highlighting the importance of expert involvement.
Findings
Model performance is slightly lower on human-translated Estonian data compared to English.
Performance on machine-translated data is significantly worse.
Prompt engineering offers limited improvements in translation quality and model accuracy.
Abstract
In this paper, we present a localized and culturally adapted Estonian translation of the test set from the widely used commonsense reasoning benchmark, WinoGrande. We detail the translation and adaptation process carried out by translation specialists and evaluate the performance of both proprietary and open source models on the human translated benchmark. Additionally, we explore the feasibility of achieving high-quality machine translation by incorporating insights from the manual translation process into the design of a detailed prompt. This prompt is specifically tailored to address both the linguistic characteristics of Estonian and the unique translation challenges posed by the WinoGrande dataset. Our findings show that model performance on the human translated Estonian dataset is slightly lower than on the original English test set, while performance on machine-translated data is…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
