BNLI: A Linguistically-Refined Bengali Dataset for Natural Language Inference

Farah Binta Haque; Md Yasin; Shishir Saha; Md Shoaib Akhter Rafi; and Farig Sadeque

arXiv:2511.08813·cs.CL·November 13, 2025

TL;DR

BNLI is a carefully curated Bengali NLI dataset that improves linguistic diversity and annotation quality, enabling more effective model training and evaluation for Bengali language understanding.

Contribution

We introduce BNLI, a linguistically refined Bengali NLI dataset with a rigorous annotation process, addressing previous resource limitations and inconsistencies.

Findings

01

State-of-the-art models perform better on BNLI, indicating improved semantic understanding.

02

BNLI enhances the reliability and interpretability of Bengali NLI tasks.

03

Benchmark results establish BNLI as a strong foundation for future Bengali NLP research.

Abstract

Despite the growing progress in Natural Language Inference (NLI) research, resources for the Bengali language remain extremely limited. Existing Bengali NLI datasets exhibit several inconsistencies, including annotation errors, ambiguous sentence pairs, and inadequate linguistic diversity, which hinder effective model training and evaluation. To address these limitations, we introduce BNLI, a refined and linguistically curated Bengali NLI dataset designed to support robust language understanding and inference modeling. The dataset was constructed through a rigorous annotation pipeline emphasizing semantic clarity and balance across entailment, contradiction, and neutrality classes. We benchmarked BNLI using a suite of state-of-the-art transformer-based architectures, including multilingual and Bengali-specific models, to assess their ability to capture complex semantic relations in…
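The abstract mentions balance across the entailment, contradiction, and neutral classes. As a minimal sketch of how such balance could be checked on any three-way NLI dataset, the snippet below computes per-class proportions and a simple max/min imbalance ratio. The field names (`premise`, `hypothesis`, `label`), the label strings, and the toy English pairs are assumptions made for illustration; they are not taken from BNLI or its release format.

```python
from collections import Counter

# Assumed label strings; the abstract names the three classes but not their encoding.
LABELS = ("entailment", "contradiction", "neutral")


def label_counts(examples):
    """Count examples per label, rejecting labels outside the three NLI classes."""
    counts = Counter(ex["label"] for ex in examples)
    unknown = set(counts) - set(LABELS)
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    return counts


def label_distribution(examples):
    """Return the fraction of examples carrying each label."""
    counts = label_counts(examples)
    total = sum(counts.values())
    return {label: (counts[label] / total if total else 0.0) for label in LABELS}


def imbalance_ratio(examples):
    """Largest class size divided by smallest; 1.0 means perfectly balanced."""
    counts = label_counts(examples)
    sizes = [counts[label] for label in LABELS]
    if min(sizes) == 0:
        return float("inf")  # at least one class is missing entirely
    return max(sizes) / min(sizes)


# Made-up toy pairs (not drawn from any dataset), only to exercise the functions.
TOY_PAIRS = [
    {"premise": "A man is cooking rice.", "hypothesis": "Someone is preparing food.", "label": "entailment"},
    {"premise": "The shop is closed today.", "hypothesis": "The shop is not open today.", "label": "entailment"},
    {"premise": "The child is asleep.", "hypothesis": "The child is running outside.", "label": "contradiction"},
    {"premise": "She bought a book.", "hypothesis": "She bought a novel.", "label": "neutral"},
]

if __name__ == "__main__":
    print(label_distribution(TOY_PAIRS))
    print(imbalance_ratio(TOY_PAIRS))
```

A ratio close to 1.0 would indicate roughly equal class sizes; what threshold counts as "balanced" is a judgment call that this sketch does not settle.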

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification