Sentiment Classification of Gaza War Headlines: A Comparative Analysis of Large Language Models and Arabic Fine-Tuned BERT Models
Amr Eleraqi, Hager H. Mustafa, and Abdul Hadi N. Ahmed

TL;DR
This paper compares how large language models and Arabic fine-tuned BERT models interpret sentiment in Gaza War headlines, revealing significant differences in bias, interpretive tendencies, and contextual sensitivity.
Contribution
It introduces an epistemological approach to sentiment analysis, emphasizing interpretive differences among models rather than accuracy against a gold standard.
Findings
The fine-tuned BERT models tend to classify headlines as neutral, whereas the LLMs amplify negative sentiment.
GPT-4.1 adjusts sentiment based on narrative frames.
Models differ systematically in sentiment distribution and bias.
Abstract
This study examines how different artificial intelligence architectures interpret sentiment in conflict-related media discourse, using the 2023 Gaza War as a case study. Drawing on a corpus of 10,990 Arabic news headlines (Eleraqi 2026), the research conducts a comparative analysis between three large language models and six fine-tuned Arabic BERT models. Rather than evaluating accuracy against a single human-annotated gold standard, the study adopts an epistemological approach that treats sentiment classification as an interpretive act produced by model architectures. To quantify systematic differences across models, the analysis employs information-theoretic and distributional metrics, including Shannon Entropy, Jensen-Shannon Distance, and a Variance Score measuring deviation from aggregate model behavior. The results reveal pronounced and non-random divergence in sentiment…
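The distributional metrics named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The three-label set, the toy label lists, and the function names are assumptions made for illustration. The abstract does not define the Variance Score, so the version here (mean squared deviation of a model's label distribution from the average distribution across models) is only one plausible reading.

```python
import math
from collections import Counter

# Assumed label set; the paper's actual label scheme may differ.
LABELS = ("negative", "neutral", "positive")


def distribution(labels):
    """Turn a list of predicted labels into a probability vector over LABELS."""
    counts = Counter(labels)
    total = len(labels)
    return [counts[label] / total for label in LABELS]


def shannon_entropy(p):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def js_distance(p, q):
    """Jensen-Shannon distance: square root of the JS divergence (base 2)."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(a, b):
        # b is the mixture m, so b > 0 wherever a > 0.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    jsd = 0.5 * kl(p, m) + 0.5 * kl(q, m)
    return math.sqrt(max(jsd, 0.0))  # guard against tiny negative rounding


def variance_score(p, aggregate):
    """Hypothetical definition: mean squared deviation from the aggregate."""
    return sum((a - b) ** 2 for a, b in zip(p, aggregate)) / len(p)


# Toy predictions for two hypothetical models on the same ten headlines.
bert_like = ["neutral"] * 7 + ["negative"] * 2 + ["positive"]
llm_like = ["negative"] * 7 + ["neutral"] * 2 + ["positive"]

p = distribution(bert_like)
q = distribution(llm_like)
aggregate = [(a + b) / 2 for a, b in zip(p, q)]

print("entropy (BERT-like):", round(shannon_entropy(p), 3))
print("entropy (LLM-like): ", round(shannon_entropy(q), 3))
print("JS distance:        ", round(js_distance(p, q), 3))
print("variance score (BERT-like):", round(variance_score(p, aggregate), 4))
```

With base-2 logarithms the Jensen-Shannon distance lies between 0 (identical distributions) and 1 (distributions with no shared labels), which makes pairwise comparisons across models directly comparable.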
