# Twitter data emotion analysis using Hadoop and metaheuristic optimized Graphical Neural Network

**Authors:** Xiaohui Wang, Yang Li, Fangyuan Chen

PMC · DOI: 10.3389/frai.2025.1672252 · Frontiers in Artificial Intelligence · 2025-10-23

## TL;DR

This paper presents a system for analyzing emotions in Twitter data that combines Hadoop-based processing with a Graphical Neural Network (GNN) tuned by Modified Elephant Herd Optimization (MEHO).

## Contribution

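The study introduces Modified Elephant Herd Optimization (MEHO), a variant of standard EHO that optimizes the GNN's weight parameters, hyperparameters, and feature subsets, along with an automated dataset construction system that reduces manual labeling in social media emotion analysis.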
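For orientation, the baseline that MEHO modifies can be sketched as follows. This is a minimal illustration of standard Elephant Herd Optimization (clan updating, matriarch update, and a separating step), not the paper's MEHO, whose specific modifications are not described in this summary. The function name `eho_minimize`, its parameters, and the default values are assumptions chosen for the example.

```python
import random


def eho_minimize(objective, dim, bounds, n_clans=3, clan_size=5,
                 alpha=0.5, beta=0.1, generations=100, seed=0):
    """Minimize `objective` with a simplified standard EHO (illustrative only)."""
    rng = random.Random(seed)
    lo, hi = bounds

    def clip(value):
        # Keep every coordinate inside the search bounds.
        return min(max(value, lo), hi)

    def random_elephant():
        return [rng.uniform(lo, hi) for _ in range(dim)]

    clans = [[random_elephant() for _ in range(clan_size)]
             for _ in range(n_clans)]
    best = min((e for clan in clans for e in clan), key=objective)[:]

    for _ in range(generations):
        for clan in clans:
            clan.sort(key=objective)          # clan[0] is the matriarch
            matriarch = clan[0][:]
            center = [sum(e[d] for e in clan) / clan_size for d in range(dim)]

            # Clan updating: each member moves toward the matriarch.
            for i in range(1, clan_size):
                clan[i] = [clip(x + alpha * (m - x) * rng.random())
                           for x, m in zip(clan[i], matriarch)]

            # Matriarch update: scaled clan center.
            clan[0] = [clip(beta * c) for c in center]

            # Separating step (simplified): the worst member is replaced
            # by a fresh random position inside the bounds.
            worst = max(range(clan_size), key=lambda i: objective(clan[i]))
            clan[worst] = random_elephant()

        candidate = min((e for clan in clans for e in clan), key=objective)
        if objective(candidate) < objective(best):
            best = candidate[:]

    return best, objective(best)
```

In a setting like the paper's, `objective` would presumably be a validation loss evaluated for a candidate vector of network weights or hyperparameters; here any function mapping a list of floats to a float works.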

## Key findings

- MEHO reduces premature convergence by 40% and improves classification accuracy by 6.1% compared to standard EHO.
- The automated labeling system decreases manual effort by 80%.
- Entropy-based preprocessing increases phrase difficulty classification accuracy by 7%.
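The entropy-based preprocessing mentioned in the findings can be illustrated with a small sketch. It assumes that "phrase ranking" means scoring each token by the Shannon entropy of its distribution over emotion labels, so that tokens spread evenly across labels rank as more ambiguous. The paper's exact formulation is not given in this summary, and the function names and whitespace tokenization are hypothetical.

```python
import math
from collections import Counter, defaultdict


def phrase_entropy(label_counts):
    """Shannon entropy (in bits) of a phrase's label distribution."""
    total = sum(label_counts.values())
    entropy = 0.0
    for count in label_counts.values():
        if count > 0:
            p = count / total
            entropy += p * math.log2(1 / p)
    return entropy


def rank_phrases_by_entropy(labeled_tweets):
    """Return (token, entropy) pairs, most ambiguous first.

    `labeled_tweets` is an iterable of (text, label) pairs. Each token is
    counted once per tweet so that repeated words do not dominate.
    """
    counts = defaultdict(Counter)
    for text, label in labeled_tweets:
        for token in set(text.lower().split()):
            counts[token][label] += 1
    scored = [(token, phrase_entropy(c)) for token, c in counts.items()]
    # Sort by descending entropy, then alphabetically for stable output.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))
```

A token that occurs only under one label gets entropy 0, while a token split evenly between two labels gets 1 bit, which is the sense in which higher-ranked phrases are harder to classify under this assumption.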

## Abstract

This study applies the Hive framework within the Hadoop ecosystem for sentiment classification, focusing on emotion analysis of X (formerly Twitter) data. After outlining Hadoop’s core advantages in large-scale unstructured data processing, the study focuses on using a Graphical Neural Network (GNN) for sentiment categorization of Twitter comments. To address the suboptimal performance of traditional GNNs caused by trial-and-error hyperparameter tuning, the study introduces the Modified Elephant Herd Optimization (MEHO) algorithm, an improved version of standard EHO, to optimize the network’s weight parameters, hyperparameters, and feature subsets while balancing exploration and exploitation. An automated dataset construction system has also been developed to reduce manual labeling effort and ensure consistency. Preprocessing techniques, including information entropy–based phrase ranking, further enhance data quality. To capture both semantic and statistical features of tweets, feature extraction methods such as Term Frequency–Inverse Document Frequency (TF–IDF) and Bag of Words (BoW) are integrated. Experimental results demonstrate that MEHO reduces premature convergence by 40% and improves classification accuracy by 6.1% compared with the standard EHO algorithm. The automated labeling system decreases manual effort by 80%, while entropy-based preprocessing increases phrase difficulty classification accuracy by 7%. This study provides an effective solution for social media emotion analysis; future research will explore multi-modal data fusion and optimize MEHO’s convergence speed for ultra-large feature sets.
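To make the two feature extraction methods named in the abstract concrete, here is a minimal standard-library sketch of Bag of Words counts and a textbook TF–IDF weighting (term frequency times `log(N / df)`). It is not the paper's pipeline: the tokenizer is a plain whitespace split, the function names are hypothetical, and production libraries typically apply smoothing and normalization that this sketch omits.

```python
import math
from collections import Counter


def bag_of_words(docs):
    """Return (vocabulary, count vectors) using whitespace tokenization."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({token for tokens in tokenized for token in tokens})
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append([counts[term] for term in vocab])
    return vocab, vectors


def tf_idf(docs):
    """Return (vocabulary, TF-IDF vectors) with idf = log(N / df)."""
    vocab, counts = bag_of_words(docs)
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    doc_freq = [sum(1 for row in counts if row[j] > 0)
                for j in range(len(vocab))]
    idf = [math.log(n_docs / df) for df in doc_freq]
    weighted = []
    for row in counts:
        total = sum(row) or 1  # avoid division by zero for empty documents
        weighted.append([(c / total) * w for c, w in zip(row, idf)])
    return vocab, weighted
```

With this weighting, a term that appears in every tweet receives weight 0, while terms concentrated in a few tweets are emphasized; the BoW vectors keep the raw counts alongside.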

## Full-text entities

- **Diseases:** anxiety (MESH:D001007), psychological disorders (MESH:D000067073), depression (MESH:D003866)
- **Species:** Homo sapiens (human, species) [taxon 9606], Elephantidae (elephants, family) [taxon 9780], Gammacoronavirus (genus) [taxon 694013], Sus scrofa (pig, species) [taxon 9823]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12590436/full.md

## Figures

8 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12590436/full.md

## References

26 references — full list in the complete paper: https://tomesphere.com/paper/PMC12590436/full.md

---
Source: https://tomesphere.com/paper/PMC12590436