LEGAL-UQA: A Low-Resource Urdu-English Dataset for Legal Question Answering
Faizan Faisal, Umair Yousaf

TL;DR
LEGAL-UQA is a novel Urdu-English legal question-answering dataset derived from Pakistan's constitution, enabling NLP research in low-resource languages and specialized legal domains.
Contribution
The paper introduces the first Urdu legal QA dataset, details how it was created, and evaluates state-of-the-art language and embedding models on it, highlighting the challenges of low-resource legal NLP.
Findings
Claude-3.5-Sonnet achieves 99.19% human-evaluated accuracy on LEGAL-UQA.
OpenAI's text-embedding-3-large outperforms Mistral's mistral-embed in retrieval tasks.
Fine-tuning multilingual models like mt5-large-UQA-1.0 reveals domain adaptation challenges.
Abstract
We present LEGAL-UQA, the first Urdu legal question-answering dataset derived from Pakistan's constitution. This parallel English-Urdu dataset includes 619 question-answer pairs, each with corresponding legal article contexts, addressing the need for domain-specific NLP resources in low-resource languages. We describe the dataset creation process, including OCR extraction, manual refinement, and GPT-4-assisted translation and generation of QA pairs. Our experiments evaluate the latest generalist language and embedding models on LEGAL-UQA, with Claude-3.5-Sonnet achieving 99.19% human-evaluated accuracy. We fine-tune mt5-large-UQA-1.0, highlighting the challenges of adapting multilingual models to specialized domains. Additionally, we assess retrieval performance, finding OpenAI's text-embedding-3-large outperforms Mistral's mistral-embed. LEGAL-UQA bridges the gap between global NLP…
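The retrieval comparison mentioned in the abstract can be illustrated with a minimal sketch. This summary does not state which retrieval metric the authors use, so the top-k accuracy over cosine similarity shown here is an assumption, as are the function names. The vectors are hand-made toy values standing in for embeddings of questions and legal article contexts; they are not outputs of text-embedding-3-large or mistral-embed.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k_accuracy(question_vecs, context_vecs, gold_indices, k=1):
    """Fraction of questions whose gold context ranks among the k most similar contexts."""
    if not gold_indices:
        raise ValueError("need at least one question")
    hits = 0
    for q, gold in zip(question_vecs, gold_indices):
        # Rank all context indices by similarity to the question, best first.
        ranked = sorted(
            range(len(context_vecs)),
            key=lambda i: cosine(q, context_vecs[i]),
            reverse=True,
        )
        if gold in ranked[:k]:
            hits += 1
    return hits / len(gold_indices)


# Toy stand-ins (hypothetical): three article contexts and three questions.
contexts = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
questions = [[0.9, 0.1, 0.0], [0.1, 0.2, 0.9], [0.6, 0.7, 0.0]]
gold = [0, 2, 0]  # index of the correct context for each question

top1 = top_k_accuracy(questions, contexts, gold, k=1)
top2 = top_k_accuracy(questions, contexts, gold, k=2)
print(f"top-1: {top1:.2f}, top-2: {top2:.2f}")
```

To compare two embedding models in this setup, one would embed the same questions and contexts with each model and compare the resulting scores at the same k.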
Taxonomy
Topics: Artificial Intelligence in Law · Topic Modeling · Natural Language Processing Techniques
