Suvach -- Generated Hindi QA benchmark
Vaishak Narayanan, Prabin Raj KP, Saifudheen Nouphal

TL;DR
Suvach is a Hindi question answering benchmark generated with large language models rather than machine translation, intended as a more accurate and reliable evaluation tool for Hindi NLP research.
Contribution
The paper presents a novel Hindi QA benchmark generated by LLMs, addressing biases of existing translation-based datasets and offering a methodology applicable to other tasks.
Findings
New high-quality Hindi QA dataset created with LLMs
Addresses bias issues in existing benchmarks
Facilitates more accurate evaluation of Hindi NLP models
Abstract
Current evaluation benchmarks for question answering (QA) in Indic languages often rely on machine translation of existing English datasets. This approach suffers from bias and inaccuracies inherent in machine translation, leading to datasets that may not reflect the true capabilities of extractive question answering (EQA) models for Indic languages. This paper proposes a new benchmark specifically designed for evaluating Hindi EQA models and describes a methodology that can be applied to build similar benchmarks for other tasks. The method leverages large language models (LLMs) to generate a high-quality dataset in an extractive setting, ensuring its relevance for the target language. We believe this new resource will foster advancements in Hindi NLP research by providing a more accurate and reliable evaluation tool.
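The abstract does not spell out how generated items are checked, so the following is only a minimal sketch of what "an extractive setting" implies in practice: the answer must appear verbatim as a span of the context. The record fields (context, question, answer, answer_start) and the function names are assumptions made for illustration, loosely following common extractive QA layouts; they are not taken from the paper.

```python
from typing import Dict, List, Optional


def locate_answer(context: str, answer: str) -> Optional[int]:
    """Return the character offset of the answer span in the context, or None."""
    answer = answer.strip()
    if not answer:
        return None
    start = context.find(answer)
    return start if start >= 0 else None


def keep_extractive(items: List[Dict[str, str]]) -> List[Dict[str, object]]:
    """Keep only items whose answer occurs verbatim in the context.

    Hypothetical filter for LLM-generated QA items: anything whose answer
    is paraphrased or hallucinated (not a literal span) is dropped.
    """
    kept: List[Dict[str, object]] = []
    for item in items:
        start = locate_answer(item["context"], item["answer"])
        if start is not None:
            kept.append(
                {
                    "context": item["context"],
                    "question": item["question"],
                    "answer": item["answer"].strip(),
                    "answer_start": start,
                }
            )
    return kept


if __name__ == "__main__":
    # Illustrative Hindi items, written for this sketch (not from the dataset).
    prefix = "ताजमहल "
    context = prefix + "आगरा" + " में स्थित है।"
    items = [
        {"context": context, "question": "ताजमहल कहाँ स्थित है?", "answer": "आगरा"},
        {"context": context, "question": "ताजमहल कहाँ स्थित है?", "answer": "दिल्ली"},
    ]
    for record in keep_extractive(items):
        print(record["answer"], record["answer_start"])
```

A plain substring match is the simplest possible check. A real pipeline would likely also normalize Unicode (for example with unicodedata.normalize) before matching, since Devanagari text can be encoded in more than one equivalent form.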
Taxonomy
Topics: Blind Source Separation Techniques · Neural Networks and Applications · Speech Recognition and Synthesis
