Assessing the Reliability of Large Language Models in the Bengali Legal Context: A Comparative Evaluation Using LLM-as-Judge and Legal Experts

Sabik Aftahee; A.F.M. Farhad; Arpita Mallik; Ratnajit Dhar; Jawadul Karim; Nahiyan Bin Noor; Ishmam Ahmed Solaiman

arXiv:2511.05627·cs.CY·November 11, 2025

Open Access

TL;DR

This study evaluates the accuracy and safety of various large language models in providing legal advice in Bangladesh, highlighting their potential and risks through expert and automated assessments.

Contribution

It introduces a comprehensive evaluation framework combining AI and legal expert assessments to analyze LLMs' legal response quality in a developing country context.

Findings

01 AI models often produce high-quality legal responses

02 Models sometimes generate dangerous misinformation

03 Expert validation is essential for safe deployment

Abstract

Accessing legal help in Bangladesh is hard. People face high fees, complex legal language, a shortage of lawyers, and millions of unresolved court cases. Generative AI models like OpenAI GPT-4.1 Mini, Gemini 2.0 Flash, Meta Llama 3 70B, and DeepSeek R1 could democratize legal assistance by providing quick and affordable legal advice. In this study, we collected 250 authentic legal questions from the Facebook group "Know Your Rights," where verified legal experts regularly provide authoritative answers. These questions were then submitted to four advanced AI models, and responses were generated using a consistent, standardized prompt. A dual evaluation framework was employed, in which a state-of-the-art LLM served as a judge, assessing each AI-generated response across four critical dimensions: factual accuracy, legal appropriateness,…

Taxonomy

Topics: Artificial Intelligence in Law · Ethics and Social Impacts of AI · Computational and Text Analysis Methods