# Ensemble and temporal feature-based framework for rainfall classification in Bangladesh

**Authors:** Mahir Shahriar Tamim, Md. Samiul Alim, Tanvir Ahmed Khan, Maisha Rahman, Md Musfique Anwar

PMC · DOI: 10.1371/journal.pone.0342646 · 2026-03-10

## TL;DR

This paper presents a machine learning framework that classifies daily rainfall intensity in Bangladesh into four levels using station weather records, with the aim of supporting agricultural planning and disaster management.

## Contribution

A novel ensemble and temporal feature-based machine learning framework for nationwide rainfall classification in Bangladesh.
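The "temporal feature" part of the framework refers to lagged weather variables; the abstract specifically mentions lagged humidity and temperature. Below is a minimal sketch of how per-station lag features could be built. It assumes each record is a dict with hypothetical keys `station`, `date` (a `datetime.date`), and numeric weather variables. The helper name `add_lag_features`, the lag lengths, and the sample stations and values are illustrative only and are not the authors' code or data.

```python
from collections import defaultdict
from datetime import date, timedelta


def add_lag_features(records, features, lags):
    """Return copies of `records` with `<feature>_lag<k>` columns added.

    Hypothetical helper: each record is a dict with 'station', 'date'
    (datetime.date) and numeric feature keys. Lags are matched by calendar
    date within the same station, so a missing day yields None instead of
    silently pulling a value from an earlier row or another station.
    """
    # Index every station's rows by date for O(1) look-ups of past days.
    index = defaultdict(dict)
    for r in records:
        index[r["station"]][r["date"]] = r

    out = []
    for r in records:
        row = dict(r)
        history = index[r["station"]]
        for f in features:
            for k in lags:
                # Only past values are used, so no future information leaks in.
                prev = history.get(r["date"] - timedelta(days=k))
                row[f"{f}_lag{k}"] = prev[f] if prev is not None else None
        out.append(row)
    return out


# Illustrative usage with made-up humidity values (%).
records = [
    {"station": "Dhaka", "date": date(2020, 7, 1), "humidity": 85.0},
    {"station": "Dhaka", "date": date(2020, 7, 2), "humidity": 90.0},
    {"station": "Sylhet", "date": date(2020, 7, 2), "humidity": 95.0},
]
rows = add_lag_features(records, ["humidity"], [1])
```

Keying look-ups by date rather than by row position is a deliberate choice: station records spanning decades are likely to have gaps, and position-based shifting would then pair a day with the wrong predecessor.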

## Key findings

- Random Forest achieved the highest accuracy (77.37%) for rainfall classification.
- Bi-LSTM performed best among deep learning models with 76.97% accuracy.
- Humidity and sunshine duration were identified as the most influential predictors of rainfall intensity.

## Abstract

Accurate rainfall classification is essential for Bangladesh, where monsoon variability strongly influences agriculture, water resource management, and disaster preparedness. This study proposes a robust machine learning framework for rainfall intensity classification at a daily temporal scale with nationwide spatial coverage, using 543,839 daily weather records collected over several decades from 35 meteorological stations in a publicly available national meteorological dataset. The dataset includes rainfall, temperature, humidity, and sunshine duration; the records were preprocessed and categorized into four intensity levels: No Rain, Light Rain, Moderate Rain, and Very Heavy Rain. The evaluated models include Random Forest, Decision Trees, Gradient Boosting, K-Nearest Neighbors, Naïve Bayes, Extreme Gradient Boosting (XGBoost), Light Gradient Boosting Machine (LightGBM), and Categorical Boosting (CatBoost), along with deep learning architectures such as Artificial Neural Network (ANN), Deep Neural Network (DNN), One-Dimensional Convolutional Neural Network (1D-CNN), Long Short-Term Memory (LSTM), and Bidirectional LSTM (Bi-LSTM). Random Forest achieved the highest accuracy (77.37%), while Bi-LSTM performed best among the deep learning models (76.97%). To address class imbalance, we adopted class weighting in the final models; SMOTE was explored as an ablation and then excluded due to poorer generalization. Interpretability analyses using Local Interpretable Model-Agnostic Explanations (LIME) and SHapley Additive exPlanations (SHAP) consistently identified humidity and sunshine as the most influential predictors, with SHAP further revealing strong interactions between lagged humidity and temperature. The framework's reliable classification of rainfall intensities supports data-driven irrigation scheduling, early flood warnings, and climate-resilient agricultural and disaster management planning in Bangladesh.
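Two steps in the abstract, binning daily rainfall into four intensity classes and weighting classes to counter imbalance, can be sketched in plain Python. The millimetre cut-offs in `UPPER_BOUNDS_MM` are hypothetical placeholders, not the thresholds used in the paper, and the paper does not state which weighting scheme was used. The sketch assumes the common "balanced" heuristic, which is the same formula scikit-learn applies for `class_weight="balanced"`: each class weight equals the number of samples divided by (number of classes × class count). The helper names are illustrative.

```python
from bisect import bisect_left
from collections import Counter

LABELS = ["No Rain", "Light Rain", "Moderate Rain", "Very Heavy Rain"]
# HYPOTHETICAL upper bounds (mm/day) for the first three classes;
# these are placeholders, not the cut-offs used by the authors.
UPPER_BOUNDS_MM = [0.0, 10.0, 44.0]


def rainfall_class(mm):
    """Map a daily rainfall total (mm) to one of the four intensity labels.

    bisect_left gives: mm <= 0 -> 0, (0, 10] -> 1, (10, 44] -> 2, > 44 -> 3.
    """
    return LABELS[bisect_left(UPPER_BOUNDS_MM, mm)]


def balanced_class_weights(labels):
    """Weight each class as n_samples / (n_classes * class_count).

    Assumed scheme: this mirrors scikit-learn's class_weight="balanced",
    so rare classes (e.g. Very Heavy Rain) get proportionally larger weights.
    """
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * m) for c, m in counts.items()}


# Illustrative usage with made-up daily totals: dry days dominate.
daily_mm = [0.0, 0.0, 0.0, 0.0, 3.2, 7.5, 15.0, 120.0]
y = [rainfall_class(mm) for mm in daily_mm]
weights = balanced_class_weights(y)
```

In scikit-learn, passing `class_weight="balanced"` to `RandomForestClassifier` applies the same weighting internally, so the explicit dictionary is mainly useful for inspecting how strongly the rare heavy-rain classes are up-weighted.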

## Figures

50 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12974886/full.md

---
Source: https://tomesphere.com/paper/PMC12974886