MultiBanAbs: A Comprehensive Multi-Domain Bangla Abstractive Text Summarization Dataset

Md. Tanzim Ferdous; Naeem Ahsan Chowdhury; Prithwiraj Bhattacharjee

arXiv:2511.19317 · cs.CL · December 17, 2025

Open Access

TL;DR

This paper introduces MultiBanAbs, a large multi-domain Bangla summarization dataset with over 54,000 articles, enabling more adaptable and practical summarization models for diverse real-world texts.

Contribution

It presents a comprehensive, multi-source Bangla summarization dataset and establishes baseline results using various deep learning models, advancing NLP resources for low-resource languages.

Findings

01. The dataset covers multiple domains and styles, enhancing model adaptability.

02. Baseline models demonstrate the dataset's potential for future research.

03. Results show promising performance of deep learning models on the dataset.

Abstract

This study presents a new Bangla abstractive summarization dataset for training models that generate concise summaries of Bangla articles from diverse sources. Most existing studies in this field have concentrated on news articles, where journalists usually follow a fixed writing style. While such approaches are effective in limited contexts, they often fail to adapt to the varied nature of real-world Bangla texts. In today's digital era, a massive amount of Bangla content is continuously produced across blogs, newspapers, and social media. This creates a pressing need for summarization systems that can reduce information overload and help readers understand content more quickly. To address this challenge, we developed a dataset of over 54,000 Bangla articles and summaries collected from multiple sources, including blogs such as Cinegolpo and newspapers such as Samakal and The Business Standard. Unlike…

Taxonomy

Topics: Topic Modeling · Text and Document Classification Technologies · Sentiment Analysis and Opinion Mining