SentiWords: Deriving a High Precision and High Coverage Lexicon for Sentiment Analysis
Lorenzo Gatti, Marco Guerini, Marco Turchi

TL;DR
This paper introduces SentiWords, a sentiment lexicon with both high coverage and high precision, built by blending several SentiWordNet-based methods in a learning framework (an ensemble method). Using it improves sentiment analysis performance over existing lexica.
Contribution
It presents a novel ensemble approach that combines multiple SentiWordNet techniques and manual lexica to produce a superior sentiment lexicon with extensive coverage and accuracy.
Findings
SentiWords contains approximately 155,000 words.
The ensemble method outperforms individual SentiWordNet approaches.
Using SentiWords improves sentiment analysis accuracy over existing lexica.
Abstract
Deriving prior polarity lexica for sentiment analysis, where positive or negative scores are associated with words out of context, is a challenging task. Usually, a trade-off between precision and coverage is hard to find, and it depends on the methodology used to build the lexicon. Manually annotated lexica provide high precision but lack coverage, whereas automatic derivation from pre-existing knowledge guarantees high coverage at the cost of lower precision. Since the automatic derivation of prior polarities is less time-consuming than manual annotation, these approaches have proliferated, in particular those based on the SentiWordNet resource. In this paper, we compare the most frequently used techniques based on SentiWordNet with newer ones and blend them in a learning framework (a so-called 'ensemble method'). By taking advantage of manually built prior polarity…
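The blending idea in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration, not the authors' implementation: the per-sense scores and gold values are invented toy data, the three scoring functions (first sense, plain mean, rank-weighted mean) are common ways to collapse SentiWordNet-style sense scores into one prior polarity and may differ from the techniques compared in the paper, and a plain linear model fitted by gradient descent stands in for the learning framework.

```python
from statistics import mean

# Hypothetical toy data: per-sense (positive, negative) scores in sense-rank
# order, in the style of SentiWordNet entries. All values are invented.
SENSES = {
    "good": [(0.75, 0.0), (0.5, 0.0), (0.625, 0.125)],
    "cold": [(0.0, 0.75), (0.125, 0.5), (0.0, 0.0)],
    "fine": [(0.5, 0.0), (0.0, 0.25)],
    "dull": [(0.0, 0.5), (0.0, 0.625)],
}
# Hypothetical gold polarities in [-1, 1], standing in for a manual lexicon.
GOLD = {"good": 0.8, "cold": -0.4, "fine": 0.5, "dull": -0.6}


def first_sense(senses):
    """Polarity of the most frequent (first-ranked) sense only."""
    pos, neg = senses[0]
    return pos - neg


def mean_sense(senses):
    """Unweighted mean of (pos - neg) over all senses."""
    return mean(p - n for p, n in senses)


def rank_weighted(senses):
    """Mean of (pos - neg) with weight 1/rank, so early senses count more."""
    weights = [1 / rank for rank in range(1, len(senses) + 1)]
    total = sum(w * (p - n) for w, (p, n) in zip(weights, senses))
    return total / sum(weights)


FEATURES = (first_sense, mean_sense, rank_weighted)


def featurize(word):
    """One feature per single-technique polarity estimate."""
    return [f(SENSES[word]) for f in FEATURES]


def fit_blend(rows, targets, lr=0.1, epochs=5000):
    """Fit linear weights and a bias by batch gradient descent on squared error."""
    n_feat = len(rows[0])
    weights, bias = [0.0] * n_feat, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_feat, 0.0
        for x, y in zip(rows, targets):
            err = bias + sum(w * v for w, v in zip(weights, x)) - y
            grad_b += err
            for i, v in enumerate(x):
                grad_w[i] += err * v
        bias -= lr * grad_b / len(rows)
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
    return weights, bias


def predict(model, x):
    weights, bias = model
    return bias + sum(w * v for w, v in zip(weights, x))


def mse(preds, targets):
    return mean((p - t) ** 2 for p, t in zip(preds, targets))


words = sorted(GOLD)
X = [featurize(w) for w in words]
y = [GOLD[w] for w in words]

model = fit_blend(X, y)
blend_mse = mse([predict(model, x) for x in X], y)
# Error of each single technique used directly as the prediction.
single_mse = [mse([x[i] for x in X], y) for i in range(len(FEATURES))]

print("single-technique MSE:", single_mse)
print("blended MSE:", blend_mse)
```

On this toy set the blended error is measured on the same four words used for fitting, so it is optimistic by construction. A real comparison would score the blend on held-out words from the manual lexicon.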
