BanglishRev: A Large-Scale Bangla-English and Code-mixed Dataset of Product Reviews in E-Commerce

Mohammad Nazmush Shamael; Sabila Nawshin; Swakkhar Shatabda; Salekul Islam

arXiv:2412.13161·cs.CL·December 19, 2024

TL;DR

BanglishRev is the largest dataset of Bengali, English, and code-mixed e-commerce reviews to date. A BanglishBERT sentiment model evaluated in the paper reaches 94% accuracy.

Contribution

This paper introduces BanglishRev, a large-scale multilingual e-commerce review dataset, and demonstrates its utility with a high-performing sentiment analysis model.

Findings

01

The dataset contains 1.74 million reviews from 128k products.

02

The BanglishBERT model achieved 94% accuracy in sentiment classification.

03

The dataset enables future research in code-mixed language processing.

Abstract

This work presents the BanglishRev Dataset, the largest e-commerce product review dataset to date for reviews written in Bengali, English, a mixture of both, and Banglish (Bengali words written in the English alphabet). The dataset comprises 1.74 million written reviews out of 3.2 million ratings, collected from a total of 128k products sold on online e-commerce platforms targeting the Bengali population. It includes an extensive array of metadata for each review, including the rating given by the reviewer, the date the review was posted, the date of purchase, the number of likes and dislikes, the response from the seller, and images associated with the review. With sentiment analysis being the most prominent use of review datasets, experimentation with a binary sentiment analysis model with the review rating serving as an indicator of positive or negative sentiment was…
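The abstract describes using the review rating as a proxy for binary sentiment. A minimal sketch of that labeling step is shown here, under stated assumptions: the `Review` class, the 1 to 5 star scale, and the `positive_above` cutoff are hypothetical choices for illustration, since the listing above does not state the threshold or the dataset's field names. Entries with no written text are skipped, reflecting that only some of the ratings come with a written review.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Review:
    """Hypothetical record; the real dataset's field names may differ."""
    text: str
    rating: int  # assumed to be a 1-5 star rating


def rating_to_label(rating: int, positive_above: int = 3) -> int:
    """Map a star rating to a binary label: 1 = positive, 0 = negative.

    The cutoff `positive_above` is an assumption, not a value taken
    from the listing above.
    """
    if not 1 <= rating <= 5:
        raise ValueError(f"rating out of assumed 1-5 range: {rating}")
    return 1 if rating > positive_above else 0


def label_reviews(
    reviews: List[Review], positive_above: int = 3
) -> List[Tuple[str, int]]:
    """Return (text, label) pairs, skipping ratings with no written text."""
    return [
        (r.text, rating_to_label(r.rating, positive_above))
        for r in reviews
        if r.text.strip()
    ]


if __name__ == "__main__":
    sample = [
        Review("khub bhalo product", 5),  # Banglish: "very good product"
        Review("", 4),                    # rating only, no written review
        Review("not good", 2),
    ]
    for text, label in label_reviews(sample):
        print(label, text)
```

Labels derived this way are weak labels: a rating and the accompanying text can disagree, so a cutoff like this is best checked against a manually annotated sample before training on it.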

Taxonomy

Topics: Sentiment Analysis and Opinion Mining · Natural Language Processing Techniques · Text and Document Classification Technologies