Explainable detection: a transformer-based language modeling approach for Bengali news title classification with comparative explainability analysis using ML and DL

Md. Julkar Naeen; Sourav Kumar Das; Sakib Alam Jisan; Sharun Akter Khushbu; Noyon Chandra Saha; Ohidujjaman

PMC · DOI: 10.3389/frai.2025.1537432 · November 6, 2025

Open Access

TL;DR

This paper explores using transformer models for classifying Bengali news titles, comparing them with ML and LSTM models, and emphasizes explainability in AI for low-resource languages.

Contribution

The study introduces transformer-based models for Bengali text classification and integrates explainable AI techniques to improve transparency.

Findings

01

XLM-RoBERTa Base achieved the highest accuracy of 0.91 in classifying Bengali news titles.

02

Explainable AI techniques like LIME were used to identify key features influencing classification outcomes.

03

Transformer models outperformed traditional ML and LSTM models in Bengali text classification.
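To make the second finding concrete, the sketch below shows how a per-word attribution can be read off a classifier. It is not the paper's LIME setup: it uses leave-one-word-out occlusion, a simpler relative of LIME that removes each word in turn and measures how much the predicted score drops. The scoring function, its keyword weights, and the English placeholder tokens are all hypothetical stand-ins for a trained Bengali model.

```python
import math
from typing import Callable, Dict, List, Tuple

# Hypothetical keyword weights standing in for a trained classifier.
# The words and numbers are invented for illustration only.
SPORTS_WEIGHTS: Dict[str, float] = {"cricket": 2.0, "match": 1.0, "budget": -1.5}


def toy_sports_score(tokens: List[str]) -> float:
    """Return a score in (0, 1) that a title is about sports (toy model)."""
    raw = sum(SPORTS_WEIGHTS.get(tok, 0.0) for tok in tokens)
    return 1.0 / (1.0 + math.exp(-raw))


def occlusion_importance(
    tokens: List[str], score_fn: Callable[[List[str]], float]
) -> List[Tuple[str, float]]:
    """Rank tokens by how much the score changes when each one is removed."""
    base = score_fn(tokens)
    importances = []
    for i, tok in enumerate(tokens):
        reduced = tokens[:i] + tokens[i + 1:]  # title with token i left out
        importances.append((tok, base - score_fn(reduced)))
    # Largest absolute change first; the sign shows the direction of influence.
    return sorted(importances, key=lambda pair: abs(pair[1]), reverse=True)


title = "cricket match delayed by rain".split()
for word, delta in occlusion_importance(title, toy_sports_score):
    print(f"{word}: {delta:+.3f}")
```

A positive value means the word pushed the score toward the class, and a value near zero means the model ignored it. LIME differs in that it samples many random perturbations and fits a local linear surrogate model, rather than removing one word at a time.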

Abstract

Classifying scattered Bengali text is the primary focus of this study, with an emphasis on explainability in Natural Language Processing (NLP) for low-resource languages. We employed supervised Machine Learning (ML) models as a baseline and compared their performance with Long Short-Term Memory (LSTM) networks from the deep learning domain. Subsequently, we implemented transformer models designed for sequential learning. To prepare the dataset, we collected recent Bengali news articles online and performed extensive feature engineering. Given the inherent noise in Bengali datasets, significant preprocessing was required. Among the models tested, XLM-RoBERTa Base achieved the highest accuracy, 0.91. Furthermore, we integrated explainable AI techniques to interpret the model’s predictions, enhancing transparency and fostering trust in the classification outcomes. Additionally, we employed…
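As a rough illustration of what a supervised ML baseline and its accuracy score look like, the sketch below trains a bag-of-words multinomial Naive Bayes classifier in plain Python and measures accuracy on held-out titles. The abstract does not say which baseline models were used, so Naive Bayes is an assumption here, and the tiny English dataset and category labels are invented placeholders rather than the paper's data.

```python
import math
from collections import Counter, defaultdict
from typing import List, Tuple

Sample = Tuple[str, str]  # (title, label)

# Invented placeholder data; the paper's dataset is Bengali news titles.
TRAIN: List[Sample] = [
    ("team wins cricket final", "sports"),
    ("striker scores late goal", "sports"),
    ("parliament passes new budget", "politics"),
    ("minister announces election date", "politics"),
]
TEST: List[Sample] = [
    ("cricket team scores", "sports"),
    ("election budget debate", "politics"),
]


def train_nb(samples: List[Sample]):
    """Count labels and per-label word frequencies from whitespace tokens."""
    label_counts: Counter = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for title, label in samples:
        label_counts[label] += 1
        for tok in title.split():
            word_counts[label][tok] += 1
            vocab.add(tok)
    return label_counts, word_counts, vocab


def predict_nb(model, title: str) -> str:
    """Return the label with the highest log-posterior (add-one smoothing)."""
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_logp = "", -math.inf
    for label, count in label_counts.items():
        logp = math.log(count / total)  # log prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in title.split():
            logp += math.log((word_counts[label][tok] + 1) / denom)
        if logp > best_logp:
            best_label, best_logp = label, logp
    return best_label


def accuracy(model, samples: List[Sample]) -> float:
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(predict_nb(model, title) == label for title, label in samples)
    return correct / len(samples)


model = train_nb(TRAIN)
print("held-out accuracy:", accuracy(model, TEST))
```

Accuracy computed this way is the same kind of figure as the reported 0.91, that is, the share of held-out titles assigned the correct category. The toy number printed here says nothing about real performance.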

Figures: 8

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Topic Modeling · Computational and Text Analysis Methods