Contextual Salience for Fast and Accurate Sentence Vectors
Eric Zelikman, Richard Socher

TL;DR
The paper introduces CoSal, a novel measure of word importance based on contextual salience, enabling fast, interpretable, and accurate unsupervised sentence vectors that outperform many existing methods.
Contribution
It proposes CoSal, a new context-aware word importance measure that improves unsupervised sentence representations with minimal computation and training.
Findings
Outperforms SkipThought on most benchmarks.
Beats tf-idf on all benchmarks.
Competitive with state-of-the-art unsupervised methods.
Abstract
Unsupervised vector representations of sentences or documents are a major building block for many language tasks such as sentiment classification. However, current methods are uninterpretable and slow or require large training datasets. Recent word vector-based proposals implicitly assume that distances in a word embedding space are equally important, regardless of context. We introduce contextual salience (CoSal), a measure of word importance that uses the distribution of context vectors to normalize distances and weights. CoSal relies on the insight that unusual word vectors disproportionately affect phrase vectors. A bag-of-words model with CoSal-based weights produces accurate unsupervised sentence or document representations for classification, requiring little computation to evaluate and only a single covariance calculation to "train." CoSal supports small contexts, out-of…
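The abstract describes the idea only at a high level, so the following is a rough sketch rather than the authors' implementation. It assumes that "normalizing distances by the distribution of context vectors" can be approximated with a Mahalanobis distance from the context mean, and that these distances are used directly as bag-of-words weights. The function names (fit_context, salience, sentence_vector), the eps regularizer, and the normalization of weights to sum to one are all illustrative assumptions, not taken from the paper.

```python
import numpy as np


def fit_context(context_vectors, eps=1e-6):
    """'Train' on a context: one mean and one covariance calculation.

    context_vectors: array of shape (n_words, dim), n_words >= 2.
    eps is an assumed ridge term that keeps the covariance invertible.
    """
    X = np.asarray(context_vectors, dtype=float)
    mean = X.mean(axis=0)
    cov = np.cov(X, rowvar=False) + eps * np.eye(X.shape[1])
    return mean, np.linalg.inv(cov)


def salience(word_vectors, mean, inv_cov):
    """Assumed salience score: Mahalanobis distance from the context mean."""
    D = np.asarray(word_vectors, dtype=float) - mean
    sq = np.einsum("ij,jk,ik->i", D, inv_cov, D)
    return np.sqrt(np.maximum(sq, 0.0))  # guard against rounding below zero


def sentence_vector(word_vectors, mean, inv_cov):
    """Bag-of-words sentence vector: salience-weighted average of word vectors."""
    W = np.asarray(word_vectors, dtype=float)
    s = salience(W, mean, inv_cov)
    total = s.sum()
    if total == 0.0:
        return W.mean(axis=0)  # fall back to a plain average
    return (s / total) @ W
```

Under these assumptions, a word whose vector is unusual relative to the context receives a larger weight and pulls the sentence vector toward itself, which matches the stated insight that unusual word vectors disproportionately affect phrase vectors. Evaluation is a single matrix product per sentence once the inverse covariance has been computed.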
Taxonomy
Topics: Topic Modeling · Sentiment Analysis and Opinion Mining · Natural Language Processing Techniques
