Unsupervised Learning of Semantic Orientation from a Hundred-Billion-Word Corpus

Peter D. Turney (National Research Council of Canada); Michael L. Littman (Stowe Research)

arXiv:cs/0212012·cs.LG·May 23, 2007·361 cites

TL;DR

This paper presents an unsupervised method for determining the semantic orientation of words by issuing queries to a Web search engine and analysing the results with pointwise mutual information (PMI). It achieves 80% accuracy on a test set of 3,596 adjectives, adverbs, nouns, and verbs.

Contribution

It introduces a simple, scalable algorithm that leverages web search engine queries and PMI to learn semantic orientation without supervision.

Findings

01. Achieved 80% accuracy on 3,596 words
02. Effective across adjectives, adverbs, nouns, and verbs
03. Comparable to supervised methods for adjectives

Abstract

The evaluative character of a word is called its semantic orientation. A positive semantic orientation implies desirability (e.g., "honest", "intrepid") and a negative semantic orientation implies undesirability (e.g., "disturbing", "superfluous"). This paper introduces a simple algorithm for unsupervised learning of semantic orientation from extremely large corpora. The method involves issuing queries to a Web search engine and using pointwise mutual information to analyse the results. The algorithm is empirically evaluated using a training corpus of approximately one hundred billion words -- the subset of the Web that is indexed by the chosen search engine. Tested with 3,596 words (1,614 positive and 1,982 negative), the algorithm attains an accuracy of 80%. The 3,596 test words include adjectives, adverbs, nouns, and verbs. The accuracy is comparable with the results achieved by…
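
As an illustration of the idea in the abstract, here is a minimal sketch of scoring a word's semantic orientation with pointwise mutual information. It is not the authors' implementation. The function names, the single seed word per polarity ("good", "bad"), the smoothing constant `eps`, and all counts are assumptions made up for this example; a dictionary of toy counts stands in for search engine results, and no real search API is called.

```python
import math

def pmi(hits_xy, hits_x, hits_y, total):
    """Pointwise mutual information in bits: log2(p(x, y) / (p(x) * p(y)))."""
    return math.log2((hits_xy * total) / (hits_x * hits_y))

def semantic_orientation(word, positive, negative, hits, cooc, total, eps=0.01):
    """Sum of PMI with positive seeds minus sum of PMI with negative seeds.

    hits:  word -> number of documents containing the word (toy data)
    cooc:  (word, seed) -> number of documents containing both (toy data)
    eps:   hypothetical smoothing term so a zero co-occurrence count
           does not produce log2(0)
    """
    def assoc(seed):
        joint = cooc.get((word, seed), 0) + eps
        return pmi(joint, hits[word], hits[seed], total)

    return sum(assoc(p) for p in positive) - sum(assoc(n) for n in negative)

# Invented counts, for illustration only.
TOTAL = 1_000_000
HITS = {"honest": 1000, "disturbing": 800, "good": 50000, "bad": 40000}
COOC = {
    ("honest", "good"): 200,
    ("honest", "bad"): 20,
    ("disturbing", "good"): 10,
    ("disturbing", "bad"): 160,
}

for w in ("honest", "disturbing"):
    score = semantic_orientation(w, ["good"], ["bad"], HITS, COOC, TOTAL)
    label = "positive" if score > 0 else "negative"
    print(f"{w}: {score:+.2f} ({label})")
```

With these toy counts, "honest" receives a positive score and "disturbing" a negative one, because each co-occurs more often with the seed of matching polarity than chance would predict. Classifying by the sign of the score is one simple decision rule; a real setup would use several seed words per polarity and counts drawn from an actual corpus or index.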

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Advanced Text Analysis Techniques