TL;DR
This paper presents Bangla Key2Text, a large dataset for keyword-driven text generation in Bangla, along with baseline models and evaluations, to advance low-resource language NLP research.
Contribution
It introduces a new large-scale dataset of Bangla keyword-text pairs and provides baseline models and benchmarks for keyword-to-text generation in Bangla.
Findings
Fine-tuning mT5 and BanglaT5 on the dataset substantially improves keyword-conditioned generation quality.
Task-specific models outperform zero-shot large language models.
The dataset and models are publicly released for future research.
Abstract
This paper introduces Bangla Key2Text, a large-scale dataset of million Bangla keyword-text pairs designed for keyword-driven text generation in a low-resource language. The dataset is constructed using a BERT-based keyword extraction pipeline applied to millions of Bangla news texts, transforming raw articles into structured keyword-text pairs suitable for supervised learning. To establish baseline performance on this new benchmark, we fine-tune two sequence-to-sequence models, mT5 and BanglaT5, and evaluate them using multiple automatic metrics and human judgments. Experimental results show that task-specific fine-tuning substantially improves keyword-conditioned text generation in Bangla compared to zero-shot large language models. The dataset, trained models, and code are publicly released to support future research in Bangla natural language…
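To make the idea of a keyword-text pair concrete, here is a minimal sketch of how an embedding-based extractor could turn one sentence into a supervised training example. It is not the paper's pipeline: the abstract only says the extraction is BERT-based, so the similarity ranking used here (each candidate word is scored against the whole text, in the style of KeyBERT) is an assumption. The `embed` function is a toy hashed character-trigram embedding standing in for a real BERT encoder so that the example runs without any model download. The function names, the `source`/`target` field names, and the comma separator are all hypothetical.

```python
import math
import zlib


def embed(text: str, dim: int = 256) -> list[float]:
    """Toy stand-in for a BERT encoder (assumption, not the paper's model):
    hashed character-trigram counts, L2-normalised."""
    vec = [0.0] * dim
    padded = f" {text} "
    for i in range(len(padded) - 2):
        # crc32 is used instead of hash() so results are stable across runs.
        idx = zlib.crc32(padded[i:i + 3].encode("utf-8")) % dim
        vec[idx] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two vectors that are already L2-normalised."""
    return sum(x * y for x, y in zip(a, b))


def extract_keywords(text: str, top_k: int = 3) -> list[str]:
    """Rank each distinct word by its similarity to the whole text."""
    doc_vec = embed(text)
    # Strip the Bangla danda and common punctuation from token edges.
    tokens = {tok.strip("।,.!?") for tok in text.split()}
    candidates = sorted(tok for tok in tokens if tok)
    scored = [(cosine(embed(word), doc_vec), word) for word in candidates]
    # Highest similarity first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [word for _, word in scored[:top_k]]


def make_pair(text: str, top_k: int = 3) -> dict[str, str]:
    """Build one keyword-text example: keywords as input, full text as output."""
    keywords = extract_keywords(text, top_k)
    return {"source": ", ".join(keywords), "target": text}


if __name__ == "__main__":
    sentence = "ঢাকায় আজ ভারী বৃষ্টি হয়েছে।"  # "Heavy rain fell in Dhaka today."
    print(make_pair(sentence))
```

In a realistic setup `embed` would be replaced by a pretrained Bangla or multilingual encoder, and the resulting `source`/`target` strings would be fed to a sequence-to-sequence model such as mT5 or BanglaT5 for fine-tuning.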
