Machine and Deep Learning Methods with Manual and Automatic Labelling for News Classification in Bangla Language
Istiak Ahmad, Fahad AlQurashi, Rashid Mehmood

TL;DR
This paper evaluates machine learning and deep learning techniques for Bangla news classification under both manual and automatic labelling, and introduces Potrika, a dataset of 664,880 news articles.
Contribution
It presents a comprehensive comparison of ML and DL methods with manual and automatic labelling on the new Potrika dataset for Bangla news classification.
Findings
GRU with FastText embeddings achieved 91.83% accuracy with manual labelling.
With automatic labelling, KNN with Doc2Vec achieved 57.72% accuracy on single-label data and 75% on multi-label data.
The authors developed Potrika, described as the largest Bangla news dataset, with 664,880 articles.
Abstract
Research in Natural Language Processing (NLP) has become increasingly important due to applications such as text classification, text mining, sentiment analysis, POS tagging, named entity recognition, textual entailment, and many others. This paper introduces several machine and deep learning methods with manual and automatic labelling for news classification in the Bangla language. We implemented several machine learning (ML) and deep learning (DL) algorithms. The ML algorithms are Logistic Regression (LR), Stochastic Gradient Descent (SGD), Support Vector Machine (SVM), Random Forest (RF), and K-Nearest Neighbour (KNN), used with Bag of Words (BoW), Term Frequency-Inverse Document Frequency (TF-IDF), and Doc2Vec embedding models. The DL algorithms are Long Short-Term Memory (LSTM), Bidirectional LSTM (BiLSTM), Gated Recurrent Unit (GRU), and Convolutional Neural Network (CNN), used with…
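To make one of the ML pairings named in the abstract concrete, the following is a minimal sketch of TF-IDF features combined with a K-Nearest Neighbour classifier, written with the Python standard library only. It is not the authors' implementation: the function names, the whitespace tokeniser, the cosine-similarity vote, and the English toy sentences standing in for Bangla articles are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: plain whitespace tokenisation. A real Bangla pipeline
    # would also handle punctuation, stop words, and normalisation.
    return text.lower().split()


def fit_idf(token_lists):
    # Smoothed inverse document frequency: log((1 + n) / (1 + df)) + 1.
    n = len(token_lists)
    df = Counter()
    for tokens in token_lists:
        df.update(set(tokens))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def tfidf_vector(tokens, idf):
    # Sparse, L2-normalised TF-IDF vector; terms unseen in training are dropped.
    tf = Counter(t for t in tokens if t in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else {}


def cosine(a, b):
    # Both vectors are already unit length, so the dot product is the cosine.
    return sum(v * b.get(t, 0.0) for t, v in a.items())


def knn_predict(query_vec, train_vecs, train_labels, k=3):
    # Majority vote among the k most similar training documents.
    ranked = sorted(
        ((cosine(query_vec, v), y) for v, y in zip(train_vecs, train_labels)),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy corpus (English placeholders, not taken from Potrika).
TRAIN = [
    ("the team won the cricket match", "sports"),
    ("the striker scored a late goal in the match", "sports"),
    ("the bowler took five wickets", "sports"),
    ("the central bank raised interest rates", "economy"),
    ("stock market prices fell sharply", "economy"),
    ("the bank reported higher export earnings", "economy"),
]

train_tokens = [tokenize(text) for text, _ in TRAIN]
train_labels = [label for _, label in TRAIN]
idf = fit_idf(train_tokens)
train_vecs = [tfidf_vector(tokens, idf) for tokens in train_tokens]


def predict(text, k=3):
    return knn_predict(tfidf_vector(tokenize(text), idf), train_vecs, train_labels, k)


print(predict("the bank cut interest rates"))
print(predict("the team scored in the cricket match"))
```

The same structure carries over to the other feature models listed in the abstract: swapping `tfidf_vector` for a raw term-count vector would give a Bag of Words variant, while a Doc2Vec variant would need a trained embedding model instead of the counting step.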
Taxonomy
Topics: Text and Document Classification Technologies · Sentiment Analysis and Opinion Mining
Methods: Tanh Activation · Sigmoid Activation · Gated Recurrent Unit · fastText · Long Short-Term Memory · Logistic Regression
