Comparative Evaluation of Embedding Representations for Financial News Sentiment Analysis
Joyjit Roy, Samaresh Kumar Singh

TL;DR
This study evaluates embedding-based methods for financial news sentiment analysis on small datasets, revealing the limitations of pretrained embeddings and emphasizing the importance of data sufficiency.
Contribution
It provides a comparative analysis of embedding techniques in resource-constrained environments and highlights the impact of data scarcity on model performance.
Findings
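Pretrained embeddings yield diminishing returns below a critical data sufficiency threshold.
Small validation sets contribute to overfitting during model selection: validation metrics look strong, yet test performance falls below trivial baselines.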
Embedding quality alone cannot overcome data scarcity issues.
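The second finding can be illustrated with a small simulation. The sketch below is not the paper's experiment: it assumes a hypothetical pool of 50 candidate models that all share the same true accuracy, hypothetical validation and test sizes of 35 and 70 examples, and a helper named `simulate_selection` introduced only for illustration. It shows how picking the best candidate on a small validation set can inflate the validation score even when no candidate is actually better than the others.

```python
import random


def simulate_selection(n_candidates=50, n_val=35, n_test=70,
                       true_acc=0.5, n_trials=300, seed=0):
    """Average validation and test accuracy of the 'best' candidate.

    All numbers here are illustrative assumptions, not values from the paper.
    Every candidate has the same true accuracy, so any apparent winner on the
    validation set is a product of sampling noise alone.
    """
    rng = random.Random(seed)
    val_total = 0.0
    test_total = 0.0
    for _ in range(n_trials):
        # Validation accuracy of each candidate: n_val independent draws.
        val_scores = [
            sum(rng.random() < true_acc for _ in range(n_val)) / n_val
            for _ in range(n_candidates)
        ]
        # Model selection keeps the candidate with the highest validation score.
        val_total += max(val_scores)
        # The selected candidate is then scored on fresh test examples.
        test_total += sum(rng.random() < true_acc for _ in range(n_test)) / n_test
    return val_total / n_trials, test_total / n_trials


mean_val, mean_test = simulate_selection()
print(f"selected model, mean validation accuracy: {mean_val:.3f}")
print(f"selected model, mean test accuracy:       {mean_test:.3f}")
```

Under these assumptions the selected validation score sits well above the test score, and the difference is purely an artifact of choosing a maximum over noisy estimates. Increasing `n_val` in the sketch narrows the difference, which is consistent with the point that validation set size matters for model selection.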
Abstract
Financial sentiment analysis enhances market understanding. However, standard Natural Language Processing (NLP) approaches encounter significant challenges when applied to small datasets. This study presents a comparative evaluation of embedding-based techniques for financial news sentiment classification in resource-constrained environments. Word2Vec, GloVe, and sentence transformer representations are evaluated in combination with gradient boosting on a manually labeled dataset of 349 financial news headlines. Experimental results identify a substantial gap between validation and test performance. Despite strong validation metrics, models underperform relative to trivial baselines. The analysis indicates that pretrained embeddings yield diminishing returns below a critical data sufficiency threshold. Small validation sets contribute to overfitting during model selection. Practical…
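To make the comparison against trivial baselines concrete, here is a minimal sketch of one common trivial baseline, majority-class prediction. The abstract does not specify which baselines were used, so this choice is an assumption, as are the function name `majority_baseline_accuracy` and the toy labels, which are not data from the paper.

```python
from collections import Counter


def majority_baseline_accuracy(train_labels, test_labels):
    """Predict the most frequent training label for every test example."""
    majority_label, _ = Counter(train_labels).most_common(1)[0]
    hits = sum(1 for label in test_labels if label == majority_label)
    return majority_label, hits / len(test_labels)


# Toy labels for illustration only (hypothetical, not from the paper's dataset).
train = ["neutral"] * 6 + ["positive"] * 3 + ["negative"] * 1
test = ["neutral", "neutral", "positive", "neutral", "negative"]

label, acc = majority_baseline_accuracy(train, test)
print(f"majority label: {label}, baseline accuracy: {acc:.2f}")
```

A learned model whose test accuracy does not exceed this figure has not demonstrated that it extracts usable signal from the text, regardless of its validation score.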
