Bangla-Bayanno: A 52K-Pair Bengali Visual Question Answering Dataset with LLM-Assisted Translation Refinement

Mohammed Rakibul Hasan; Rafi Majid; Ahanaf Tahmid

arXiv:2508.19887·cs.CL·August 28, 2025

TL;DR

Bangla-Bayanno is an open-source Bengali VQA dataset of 52,650 question-answer pairs. It was built with an LLM-assisted translation refinement pipeline and is intended to support multimodal AI research in a low-resource language.

Contribution

This paper introduces Bangla-Bayanno, an open-ended Bengali VQA dataset refined with a multilingual LLM-assisted translation pipeline, which the authors present as the most comprehensive open-source, high-quality VQA benchmark in Bangla.

Findings

01

The dataset contains 52,650 question-answer pairs across more than 4,750 images.

02

Questions are classified by answer type: nominal (short descriptive), quantitative (numeric), and polar (yes/no).

03

The dataset is presented as the most comprehensive open-source VQA benchmark for Bangla.

Abstract

In this paper, we introduce Bangla-Bayanno, an open-ended Visual Question Answering (VQA) dataset in Bangla, a widely used but low-resource language in multimodal AI research. The majority of existing datasets are either manually annotated with an emphasis on a specific domain, query type, or answer type, or are constrained by niche answer formats. To mitigate human-induced errors and guarantee lucidity, we implemented a multilingual LLM-assisted translation refinement pipeline. This dataset overcomes the issues of low-quality translations from multilingual sources. The dataset comprises 52,650 question-answer pairs across more than 4,750 images. Questions are classified into three distinct answer types: nominal (short descriptive), quantitative (numeric), and polar (yes/no). Bangla-Bayanno provides the most comprehensive open-source, high-quality VQA benchmark in Bangla, aiming to advance…
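
The three answer types named in the abstract can be illustrated with a small heuristic. The sketch below is not the authors' labelling procedure, which this page does not describe. The function name `answer_type` and the polar vocabulary are assumptions made for illustration, and the dataset's actual answer strings may differ.

```python
import unicodedata

# Hypothetical polar vocabulary (English and Bangla "yes"/"no").
# This is an assumption; the dataset's real answer strings may differ.
POLAR_ANSWERS = {
    unicodedata.normalize("NFC", word)
    for word in ("yes", "no", "হ্যাঁ", "না")
}


def answer_type(answer: str) -> str:
    """Guess whether an answer is polar, quantitative, or nominal."""
    # Normalize so visually identical Bangla strings compare equal.
    text = unicodedata.normalize("NFC", answer).strip().lower()
    if text in POLAR_ANSWERS:
        return "polar"
    # str.isdecimal() accepts Bangla digits (০-৯) as well as ASCII digits.
    # One decimal point is tolerated so that "3.5" also counts as numeric.
    if text.replace(".", "", 1).isdecimal():
        return "quantitative"
    # Everything else is treated as a short descriptive answer.
    return "nominal"
```

For example, `answer_type("৩")` falls in the quantitative bucket and `answer_type("লাল")` ("red") in the nominal one. A heuristic like this only looks at the answer string, so numbers spelled out as words would be counted as nominal.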
