Task Complexity Matters: An Empirical Study of Reasoning in LLMs for Sentiment Analysis
Donghao Huang, Zhaoxia Wang

TL;DR
This study empirically evaluates how reasoning capabilities in large language models affect sentiment analysis performance. It shows that the benefits of reasoning depend strongly on the task and that, on simpler tasks, they often do not justify the added computational cost.
Contribution
It provides a comprehensive analysis of reasoning in LLMs across various tasks and architectures, challenging assumptions about universal reasoning benefits and highlighting task-specific effects.
Findings
Reasoning effectiveness varies significantly with task complexity.
Distilled reasoning often underperforms base models on simple tasks.
Few-shot prompting generally improves performance over zero-shot.
Abstract
Large language models (LLMs) with reasoning capabilities have fueled a compelling narrative that reasoning universally improves performance across language tasks. We test this claim through a comprehensive evaluation of 504 configurations across seven model families (including adaptive, conditional, and reinforcement learning-based reasoning architectures) on sentiment analysis datasets of varying granularity (binary, five-class, and 27-class emotion). Our findings reveal that reasoning effectiveness is strongly task-dependent, challenging prevailing assumptions: (1) Reasoning shows task-complexity dependence, with binary classification degrading by up to 19.9 F1 percentage points (pp) while 27-class emotion recognition gains up to +16.0 pp; (2) Distilled reasoning variants underperform base models by 3-18 pp on simpler tasks, though few-shot prompting enables partial recovery; (3) Few-shot…
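The abstract reports effects as differences in F1 measured in percentage points (pp). The sketch below shows one way such a difference could be computed for a single dataset. It assumes macro-averaged F1, which the text above does not specify, and the function names `macro_f1` and `delta_pp` and the toy labels are hypothetical, not taken from the paper.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Macro-averaged F1: unweighted mean of per-class F1 scores.

    Assumption: the summary above does not say which F1 average is used.
    """
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gets a false positive
            fn[truth] += 1  # true class gets a false negative
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


def delta_pp(f1_reasoning, f1_base):
    """Difference in F1 expressed in percentage points (reasoning minus base)."""
    return 100.0 * (f1_reasoning - f1_base)


# Toy, made-up binary example (not data from the paper).
gold = ["pos", "neg", "pos", "neg"]
base_pred = ["pos", "neg", "pos", "neg"]
reasoning_pred = ["pos", "pos", "pos", "neg"]

change = delta_pp(macro_f1(gold, reasoning_pred), macro_f1(gold, base_pred))
print(f"F1 change with reasoning: {change:+.1f} pp")
```

A negative value means the reasoning configuration scored lower than the base configuration on that dataset; a positive value means it scored higher.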
Taxonomy
Topics: Sentiment Analysis and Opinion Mining · Emotion and Mood Recognition · Topic Modeling
