BanglaVerb: A sentence-level dataset for transitivity classification in Bangla NLP
Zannatul Mawa Koli, Md. Jahidul Alam, Zakia Sultana, Aliza Ahmed Khan

TL;DR
BanglaVerb is a new dataset for classifying transitive and intransitive verbs in Bangla, supporting various NLP tasks.
Contribution
It introduces a high-quality, validated sentence-level dataset for transitivity classification in an under-resourced language.
Findings
The dataset contains 3001 sentences, each annotated for a single verb instance: 1634 transitive and 1367 intransitive.
Baseline experiments show strong classification performance, indicating dataset robustness.
Lexical and structural statistics confirm the dataset's linguistic representativeness.
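The class counts above imply a mildly imbalanced label distribution, which sets the floor that any baseline classifier should beat. The following is a minimal sketch of that arithmetic; only the two counts come from the summary, while the function and variable names are illustrative assumptions and not part of the dataset's tooling.

```python
# Counts reported in the summary; everything else here is illustrative.
TRANSITIVE = 1634
INTRANSITIVE = 1367
TOTAL = TRANSITIVE + INTRANSITIVE


def class_shares(counts):
    """Return each label's share of the total number of sentences."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


counts = {"transitive": TRANSITIVE, "intransitive": INTRANSITIVE}
shares = class_shares(counts)

# Accuracy of a trivial classifier that always predicts the larger class.
majority_baseline = max(shares.values())

for label, share in shares.items():
    print(f"{label}: {share:.2%}")
print(f"majority-class baseline accuracy: {majority_baseline:.2%}")
```

Under these counts, a classifier that always answers "transitive" would be right a little over half the time, so reported accuracies are best read against that reference point rather than against 50%.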
Abstract
This article presents BanglaVerb, a systematically curated and linguistically validated sentence-level dataset designed to support transitivity classification in Bangla. The dataset contains 3001 Bangla sentences, each centered on a single verb instance annotated as either transitive (1634) or intransitive (1367). It was developed to address the lack of verb-focused linguistic resources for Bangla, a morphologically rich but under-resourced language in the NLP domain. Sentences were collected from diverse public sources, standardized, and carefully cleaned to ensure textual integrity. Annotation combined rule-based pre-labeling with expert linguistic verification, resulting in a 92% majority-voting agreement among annotators, which reflects high labeling consistency and reliability. Beyond its annotation framework, the dataset provides detailed lexical and structural statistics,…
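The abstract reports a 92% majority-voting agreement but, as excerpted here, does not define how that figure is computed. The sketch below shows one plausible reading, stated as an assumption: each sentence receives the label chosen by most annotators, and agreement is the fraction of individual votes that match that majority label. The function names, the tie handling, and the toy votes are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def majority_label(votes):
    """Return the most common label among the votes, or None on a tie."""
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


def agreement_with_majority(annotations):
    """Fraction of individual votes that match their sentence's majority label.

    Sentences whose votes tie are skipped (an assumption of this sketch).
    """
    matched = 0
    total = 0
    for votes in annotations:
        winner = majority_label(votes)
        if winner is None:
            continue
        matched += sum(vote == winner for vote in votes)
        total += len(votes)
    return matched / total if total else 0.0


# Toy votes from three hypothetical annotators; not data from BanglaVerb.
toy_annotations = [
    ["transitive", "transitive", "transitive"],
    ["intransitive", "intransitive", "transitive"],
    ["transitive", "intransitive", "intransitive"],
]

final_labels = [majority_label(votes) for votes in toy_annotations]
agreement = agreement_with_majority(toy_annotations)
print(final_labels)
print(f"agreement with majority: {agreement:.2%}")
```

Other definitions are equally possible, for example the share of sentences on which all annotators agree, or the share on which the majority vote confirms the rule-based pre-label; the full paper should be consulted for the definition actually used.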
Taxonomy
Topics: Text Readability and Simplification · Natural Language Processing Techniques · Topic Modeling
