# BanglaEcomReviewCorpus: A dataset for e-commerce product review sentiment analysis

**Authors:** Umme Ayman, Md. Tanvir Ahmed Akash, Taslima Akhter, Saiham Zaman Mridul, Yadab Sutradhar

PMC · DOI: 10.1016/j.dib.2026.112663 · 2026-03-08

## TL;DR

This paper introduces a new dataset of e-commerce product reviews in Bangla for analyzing customer sentiment and behavior.

## Contribution

The paper presents a labeled Bangla e-commerce review dataset with balanced sentiment categories for NLP research.

## Key findings

- The dataset contains 8,685 labeled reviews in three roughly balanced sentiment classes: 3,012 positive, 2,881 negative, and 2,792 neutral.
- Statistical and linguistic analyses, including word clouds and n-grams, reveal the dataset's diversity and structure.
- The dataset supports interdisciplinary research and AI training in consumer behavior analysis.
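The balance claim can be checked with a little arithmetic on the class counts reported in the abstract. The sketch below is illustrative only; the variable names are assumptions and nothing here comes from the authors' code.

```python
# Class counts as reported in the paper's abstract.
class_counts = {"positive": 3012, "negative": 2881, "neutral": 2792}

total = sum(class_counts.values())

# Share of each class as a fraction of all labeled items.
shares = {label: count / total for label, count in class_counts.items()}

# Ratio of the largest class to the smallest; 1.0 would be perfectly balanced.
imbalance_ratio = max(class_counts.values()) / min(class_counts.values())

for label, share in shares.items():
    print(f"{label}: {class_counts[label]} items, {share:.2%} of {total}")
print(f"largest/smallest class ratio: {imbalance_ratio:.3f}")
```

The three counts sum to the stated 8,685 items, each class holds roughly a third of the data, and the largest class is under 8% bigger than the smallest.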

## Abstract

Online shopping has become an integral part of modern life, connecting consumers to a wide array of products and services. Customer feedback plays a crucial role in shaping business strategies, enhancing service quality, and driving product innovation, making it essential for understanding consumer behaviour and preferences. To support the analysis of such feedback, this dataset was collected from popular websites such as Daraz, Bikroy.com, Picabbo, Shajgoj, and others; it comprises 8685 labeled items reflecting diverse customer feedback. Sentiment categories include 3012 positive, 2881 negative, and 2792 neutral sentences, offering balanced representation for fair sentiment analysis. The dataset is well suited to natural language processing (NLP) tasks, enabling advanced sentiment analysis and exploration of consumer behaviour. It is accompanied by a range of statistical analyses, including summary statistics, histograms, and linguistic patterns such as unigrams, bigrams, and trigrams. Additionally, visualizations such as word clouds provide insights into the dataset's structure and linguistic diversity. To ensure data integrity, rigorous collection methods, anonymization, and preprocessing techniques were employed. This publicly available dataset serves as a valuable resource for advancing sentiment analysis, improving business strategies, and supporting interdisciplinary research. It enables insights into customer behaviour, aids product and service development, and can be used in both academic teaching and AI training. By capturing feedback from diverse e-commerce platforms, the dataset fosters collaboration across fields such as sociology, linguistics, and psychology.
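The unigram, bigram, and trigram statistics mentioned in the abstract can be illustrated with a minimal sketch. It assumes plain whitespace tokenization, which the paper does not specify. The three Bangla strings are invented toy examples, not items from the dataset, and the function names are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-token windows as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_counts(texts, n):
    """Count n-grams over a list of texts (assumption: whitespace tokens)."""
    counts = Counter()
    for text in texts:
        counts.update(ngrams(text.split(), n))
    return counts


# Invented toy reviews for illustration only; NOT taken from the dataset.
toy_reviews = [
    "পণ্যটি খুব ভালো",
    "পণ্যটি খুব খারাপ",
    "ডেলিভারি খুব দ্রুত",
]

unigrams = ngram_counts(toy_reviews, 1)
bigrams = ngram_counts(toy_reviews, 2)
trigrams = ngram_counts(toy_reviews, 3)

print(unigrams.most_common(3))
print(bigrams.most_common(3))
print(trigrams.most_common(3))
```

Real Bangla text would likely need punctuation stripping and normalization before counting. Those steps are left out here because the abstract does not describe the preprocessing in detail.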

## Figures

7 figures with captions in the complete paper: https://tomesphere.com/paper/PMC13011172/full.md

---
Source: https://tomesphere.com/paper/PMC13011172