Fine-Grained Emotion Detection on GoEmotions: Experimental Comparison of Classical Machine Learning, BiLSTM, and Transformer Models

Ani Harutyunyan; Sachin Kumar

arXiv:2601.18162 · cs.CL · January 27, 2026

Open Access

TL;DR

This paper compares a TF-IDF logistic regression baseline, a BiLSTM with attention, and a fine-tuned BERT model for fine-grained, multi-label emotion detection on GoEmotions, and examines how each copes with label imbalance and label overlap.

Contribution

It benchmarks three modeling families on GoEmotions under the official train/validation/test split and shows that BERT gives the best balance across Macro-F1, Hamming Loss, and Subset Accuracy.

Findings

01

Logistic regression achieved the highest Micro-F1, 0.51.

02

BERT outperformed the other models in Macro-F1 (0.49), Hamming Loss (0.036), and Subset Accuracy (0.36).

03

Frequent emotions often rely on surface lexical cues, whereas contextual models better detect rare and ambiguous emotions.
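The four metrics named in the findings can be made concrete with a small sketch. The code below is not the authors' evaluation script; it is a minimal pure-Python illustration, assuming binary indicator matrices (one row per example, one column per emotion label) and assuming that a label with no true or predicted positives scores F1 = 0. The toy data and all function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[int]]  # rows = examples, columns = labels (0 or 1)


def per_label_counts(y_true: Matrix, y_pred: Matrix) -> List[Tuple[int, int, int]]:
    """Return (true positives, false positives, false negatives) for each label."""
    counts = []
    for j in range(len(y_true[0])):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 0 and p[j] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 0)
        counts.append((tp, fp, fn))
    return counts


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 from raw counts; defined as 0.0 when the denominator is zero (assumption)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_f1(y_true: Matrix, y_pred: Matrix) -> float:
    """Pool counts over all labels, so frequent labels dominate the score."""
    counts = per_label_counts(y_true, y_pred)
    return f1(sum(c[0] for c in counts), sum(c[1] for c in counts), sum(c[2] for c in counts))


def macro_f1(y_true: Matrix, y_pred: Matrix) -> float:
    """Average per-label F1, so every label counts equally, however rare."""
    counts = per_label_counts(y_true, y_pred)
    return sum(f1(*c) for c in counts) / len(counts)


def hamming_loss(y_true: Matrix, y_pred: Matrix) -> float:
    """Fraction of individual label decisions that are wrong (lower is better)."""
    wrong = sum(t_j != p_j for t, p in zip(y_true, y_pred) for t_j, p_j in zip(t, p))
    return wrong / (len(y_true) * len(y_true[0]))


def subset_accuracy(y_true: Matrix, y_pred: Matrix) -> float:
    """Fraction of examples whose full label set is predicted exactly."""
    exact = sum(1 for t, p in zip(y_true, y_pred) if list(t) == list(p))
    return exact / len(y_true)


# Toy data: label 2 is rare and the hypothetical model never predicts it.
y_true = [[1, 0, 0], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
y_pred = [[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 0, 0]]

print(micro_f1(y_true, y_pred), macro_f1(y_true, y_pred))
print(hamming_loss(y_true, y_pred), subset_accuracy(y_true, y_pred))
```

On this toy data Micro-F1 is 0.75 while Macro-F1 is about 0.56, because the missed rare label costs a full third of the macro average but only one pooled false negative in the micro score. That gap is one way a model can lead on Micro-F1 and still trail on Macro-F1.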

Abstract

Fine-grained emotion recognition is a challenging multi-label NLP task due to label overlap and class imbalance. In this work, we benchmark three modeling families on the GoEmotions dataset: a TF-IDF-based logistic regression system trained with binary relevance, a BiLSTM with attention, and a BERT model fine-tuned for multi-label classification. Experiments follow the official train/validation/test split, and imbalance is mitigated using inverse-frequency class weights. Across several metrics, namely Micro-F1, Macro-F1, Hamming Loss, and Subset Accuracy, we observe that logistic regression attains the highest Micro-F1 of 0.51, while BERT achieves the best overall balance and surpasses the official paper's reported results, reaching Macro-F1 0.49, Hamming Loss 0.036, and Subset Accuracy 0.36. This suggests that frequent emotions often rely on surface lexical cues, whereas contextual…

Taxonomy

Topics: Emotion and Mood Recognition · Sentiment Analysis and Opinion Mining · Mental Health via Writing