Bangla Key2Text: Text Generation from Keywords for a Low Resource Language

Tonmoy Talukder; G M Shahariar

arXiv:2604.19508·cs.CL·April 22, 2026

TL;DR

This paper presents Bangla Key2Text, a dataset of 2.6 million Bangla keyword-text pairs for keyword-driven text generation, together with fine-tuned baseline models and their evaluation, to advance NLP research for a low-resource language.

Contribution

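The paper introduces a large-scale dataset of Bangla keyword-text pairs, built by applying a BERT-based keyword extraction pipeline to Bangla news texts, and provides fine-tuned mT5 and BanglaT5 baselines, evaluated with automatic metrics and human judgments, for keyword-to-text generation in Bangla.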

Findings

01. Task-specific fine-tuning of mT5 and BanglaT5 substantially improves keyword-conditioned text generation in Bangla.

02. The fine-tuned, task-specific models outperform zero-shot large language models.

03. The dataset, trained models, and code are publicly released for future research.

Abstract

This paper introduces Bangla Key2Text, a large-scale dataset of 2.6 million Bangla keyword-text pairs designed for keyword-driven text generation in a low-resource language. The dataset is constructed using a BERT-based keyword extraction pipeline applied to millions of Bangla news texts, transforming raw articles into structured keyword-text pairs suitable for supervised learning. To establish baseline performance on this new benchmark, we fine-tune two sequence-to-sequence models, mT5 and BanglaT5, and evaluate them using multiple automatic metrics and human judgments. Experimental results show that task-specific fine-tuning substantially improves keyword-conditioned text generation in Bangla compared to zero-shot large language models. The dataset, trained models, and code are publicly released to support future research in Bangla natural language…
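To make the idea of a keyword-text pair concrete, here is a minimal sketch of how a raw text could be turned into a source/target record for sequence-to-sequence fine-tuning. It is an illustration under stated assumptions, not the authors' pipeline: the abstract describes a BERT-based extractor, whereas `frequency_keywords` below is a simple word-frequency stand-in so the example runs without any model. The function names, the `"keywords: "` prefix, and the `source`/`target` field names are all hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, List

# Punctuation stripped from token edges; includes the Bangla danda "।".
_PUNCT = "।,.!?;:\"'()"


def frequency_keywords(text: str, top_k: int = 3) -> List[str]:
    """Stand-in extractor: return the top_k most frequent tokens.

    ASSUMPTION: this replaces the paper's BERT-based keyword extraction
    with plain frequency counting, purely for illustration.
    """
    tokens = [tok.strip(_PUNCT) for tok in text.split()]
    tokens = [tok for tok in tokens if len(tok) > 1]  # drop empty/1-char tokens
    return [word for word, _ in Counter(tokens).most_common(top_k)]


def build_pair(
    text: str,
    extractor: Callable[[str], List[str]] = frequency_keywords,
    prefix: str = "keywords: ",  # hypothetical prompt prefix
) -> Dict[str, str]:
    """Build one keyword-text record: keywords are the input, text the target."""
    keywords = extractor(text)
    return {"source": prefix + ", ".join(keywords), "target": text}


if __name__ == "__main__":
    article = "ঢাকা শহরে বৃষ্টি। বৃষ্টি ঢাকা শহরে যানজট আনে। বৃষ্টি থামেনি।"
    pair = build_pair(article)
    print(pair["source"])
    print(pair["target"])
```

Because `build_pair` takes the extractor as a parameter, a model-based extractor could be swapped in without changing the record format; a fine-tuned sequence-to-sequence model would then be trained to map `source` to `target`.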
