Unsupervised Learning of Semantic Orientation from a Hundred-Billion-Word Corpus
Peter D. Turney (National Research Council of Canada), Michael L. Littman (Stowe Research)

TL;DR
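This paper presents an unsupervised method for determining the semantic orientation of words by issuing queries to a Web search engine and analysing the results with pointwise mutual information (PMI). It reaches 80% accuracy on 3,596 test words spanning adjectives, adverbs, nouns, and verbs.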
Contribution
It introduces a simple, scalable algorithm that leverages web search engine queries and PMI to learn semantic orientation without supervision.
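For reference, PMI and a PMI-based orientation score are commonly written as follows. The symbols P and N (sets of positive and negative reference words) and the name SO are notation assumed here for illustration; they do not appear in the summary above.

```latex
\[
\mathrm{PMI}(w_1, w_2) = \log_2 \frac{p(w_1, w_2)}{p(w_1)\, p(w_2)}
\]
\[
\mathrm{SO}(w) = \sum_{s \in P} \mathrm{PMI}(w, s) \;-\; \sum_{s \in N} \mathrm{PMI}(w, s)
\]
```

A word is then labelled positive when SO(w) > 0 and negative when SO(w) < 0.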
Findings
Achieved 80% accuracy on 3,596 test words (1,614 positive and 1,982 negative)
Effective across adjectives, adverbs, nouns, and verbs
Comparable to supervised methods for adjectives
Abstract
The evaluative character of a word is called its semantic orientation. A positive semantic orientation implies desirability (e.g., "honest", "intrepid") and a negative semantic orientation implies undesirability (e.g., "disturbing", "superfluous"). This paper introduces a simple algorithm for unsupervised learning of semantic orientation from extremely large corpora. The method involves issuing queries to a Web search engine and using pointwise mutual information to analyse the results. The algorithm is empirically evaluated using a training corpus of approximately one hundred billion words -- the subset of the Web that is indexed by the chosen search engine. Tested with 3,596 words (1,614 positive and 1,982 negative), the algorithm attains an accuracy of 80%. The 3,596 test words include adjectives, adverbs, nouns, and verbs. The accuracy is comparable with the results achieved by…
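The idea in the abstract, scoring a word by how strongly it co-occurs with positive versus negative reference words, can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the tiny in-memory corpus DOCS stands in for the search engine, and the seed words, the window size, the smoothing constant, and all function names are hypothetical choices made for this example.

```python
import math

# Hypothetical seed words and toy corpus, chosen only for illustration.
POSITIVE_SEEDS = ("good", "excellent")
NEGATIVE_SEEDS = ("bad", "poor")
DOCS = [
    "an honest and good answer",
    "honest work gives excellent results",
    "a disturbing and bad report",
    "poor planning was disturbing",
    "good food",
    "bad weather",
]


def make_counters(docs, window=10):
    """Return (hits, near_hits) functions that mimic search-engine hit counts."""
    tokenized = [doc.lower().split() for doc in docs]

    def hits(word):
        # Number of documents that contain the word.
        return sum(1 for tokens in tokenized if word in tokens)

    def near_hits(a, b):
        # Number of documents where a and b occur within `window` tokens.
        count = 0
        for tokens in tokenized:
            pos_a = [i for i, t in enumerate(tokens) if t == a]
            pos_b = [i for i, t in enumerate(tokens) if t == b]
            if any(abs(i - j) <= window for i in pos_a for j in pos_b):
                count += 1
        return count

    return hits, near_hits


def so_pmi(word, hits, near_hits, pos=POSITIVE_SEEDS, neg=NEGATIVE_SEEDS,
           smoothing=0.01):
    """Sum of PMI with positive seeds minus sum of PMI with negative seeds.

    With equally sized seed sets, the corpus size and hits(word) cancel,
    leaving only the ratios near_hits(word, seed) / hits(seed).
    The smoothing constant (an assumption) avoids log(0) and division by zero.
    """
    assert len(pos) == len(neg), "cancellation assumes equally sized seed sets"
    score = 0.0
    for seed in pos:
        score += math.log2((near_hits(word, seed) + smoothing) / (hits(seed) + smoothing))
    for seed in neg:
        score -= math.log2((near_hits(word, seed) + smoothing) / (hits(seed) + smoothing))
    return score


hits, near_hits = make_counters(DOCS)


def classify(word):
    """Label a word by the sign of its orientation score."""
    return "positive" if so_pmi(word, hits, near_hits) > 0 else "negative"


if __name__ == "__main__":
    for w in ("honest", "disturbing"):
        print(w, classify(w))
```

On this toy corpus "honest" co-occurs only with the positive seeds and "disturbing" only with the negative ones, so the signs of their scores differ. With a real search engine, hits and near_hits would be replaced by query hit counts; no particular search API is assumed here.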
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Advanced Text Analysis Techniques
