A Case Study of Cross-Lingual Zero-Shot Generalization for Classical Languages in LLMs

V.S.D.S.Mahesh Akavarapu; Hrishikesh Terdalkar; Pramit Bhattacharyya; Shubhangi Agarwal; Vishakha Deulgaonkar; Pralay Manna; Chaitali Dangarikar; Arnab Bhattacharya

arXiv:2505.13173·cs.CL·June 3, 2025

TL;DR

This paper investigates how large language models perform on three classical languages (Sanskrit, Ancient Greek, and Latin) in zero-shot cross-lingual settings. It covers named entity recognition, machine translation into English, and Sanskrit question answering, and finds that model size and retrieved context both strongly affect performance.

Contribution

The paper analyzes cross-lingual zero-shot generalization in three classical languages, presents a factoid question-answering dataset for Sanskrit, and shows how model scale and retrieval-augmented generation affect performance.

Findings

01

Larger models outperform smaller ones in cross-lingual tasks.

02

Retrieval-augmented generation improves QA performance in Sanskrit.

03

Smaller models struggle with niche or abstract entity types and show pronounced performance drops on the Sanskrit QA tasks.

Abstract

Large Language Models (LLMs) have demonstrated remarkable generalization capabilities across diverse tasks and languages. In this study, we focus on natural language understanding in three classical languages (Sanskrit, Ancient Greek and Latin) to investigate the factors affecting cross-lingual zero-shot generalization. First, we explore named entity recognition and machine translation into English. While LLMs perform on par with or better than fine-tuned baselines on out-of-domain data, smaller models often struggle, especially with niche or abstract entity types. In addition, we concentrate on Sanskrit by presenting a factoid question-answering (QA) dataset and show that incorporating context via a retrieval-augmented generation approach significantly boosts performance. In contrast, we observe pronounced performance drops for smaller LLMs across these QA tasks. These results suggest…
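
The retrieval-augmented QA setup mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names (`tokenize`, `overlap_score`, `retrieve`, `build_prompt`), the token-overlap scoring, the prompt layout, and the English toy passages are all assumptions chosen for readability. The sketch only shows the general idea of selecting context passages for a question and placing them in the prompt before the model answers.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and split on word characters (a simplification;
    # real Sanskrit text would need script-aware tokenization).
    return re.findall(r"\w+", text.lower())


def overlap_score(question, passage):
    # Count tokens shared between question and passage,
    # normalized by the square root of the passage length.
    q = Counter(tokenize(question))
    p_tokens = tokenize(passage)
    shared = sum((q & Counter(p_tokens)).values())
    return shared / math.sqrt(len(p_tokens) or 1)


def retrieve(question, passages, k=2):
    # Return the k passages with the highest overlap score.
    ranked = sorted(passages, key=lambda p: overlap_score(question, p), reverse=True)
    return ranked[:k]


def build_prompt(question, passages, k=2):
    # Hypothetical prompt layout: retrieved context first, then the question.
    context = "\n".join(f"- {p}" for p in retrieve(question, passages, k))
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


# Toy corpus (illustrative only, not taken from the paper's dataset).
passages = [
    "Rama is the son of Dasharatha.",
    "Ayurveda describes herbs.",
    "Sita is the daughter of Janaka.",
]
prompt = build_prompt("Who is the father of Rama?", passages, k=2)
```

The resulting `prompt` string would then be passed to an LLM. In the zero-shot setting without retrieval, the model would receive only the question line.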

Code & Models

Repositories

mahesh-ak/SktQA
Official

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification
