Unsupervised Sentence Representations as Word Information Series: Revisiting TF-IDF
Ignacio Arroyo-Fernández, Carlos-Francisco Méndez-Cruz, Gerardo Sierra, Juan-Manuel Torres-Moreno, Grigori Sidorov

TL;DR
This paper introduces an unsupervised sentence embedding method that models sentences as weighted word embedding series using TF-IDF, achieving state-of-the-art results on semantic similarity benchmarks.
Contribution
The paper proposes a novel unsupervised sentence representation technique based on TF-IDF weighted word embeddings, with advantages like quick training and independence from external resources.
Findings
Outperformed existing methods on Semantic Textual Similarity benchmarks.
Achieved state-of-the-art performance compared to supervised and knowledge-based systems.
Model is adaptable to different data properties and languages.
Abstract
Sentence representation at the semantic level is a challenging task for Natural Language Processing and Artificial Intelligence. Despite the advances in word embeddings (i.e. word vector representations), capturing sentence meaning is an open question due to complexities of semantic interactions among words. In this paper, we present an embedding method, which is aimed at learning unsupervised sentence representations from unlabeled text. We propose an unsupervised method that models a sentence as a weighted series of word embeddings. The weights of the word embeddings are fitted by using Shannon's word entropies provided by the Term Frequency-Inverse Document Frequency (TF-IDF) transform. The hyperparameters of the model can be selected according to the properties of data (e.g. sentence length and textual genre). Hyperparameter selection involves word embedding methods and…
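Example
The core idea in the abstract, a sentence vector built as a TF-IDF-weighted sum of word vectors, can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes the common smoothed IDF formula ln((1 + N) / (1 + df)) + 1, and the toy corpus, the two-dimensional embeddings, and the function names are all hypothetical, chosen only to make the example self-contained.

```python
import math
from collections import Counter


def idf_table(corpus):
    """Smoothed IDF per word: ln((1 + N) / (1 + df)) + 1 (an assumed variant)."""
    n_docs = len(corpus)
    doc_freq = Counter(word for sent in corpus for word in set(sent))
    return {w: math.log((1 + n_docs) / (1 + df)) + 1.0 for w, df in doc_freq.items()}


def sentence_vector(sentence, idf, embeddings):
    """Sum the word vectors of a sentence, each scaled by its TF-IDF weight."""
    dim = len(next(iter(embeddings.values())))
    vec = [0.0] * dim
    for word, count in Counter(sentence).items():
        if word not in embeddings or word not in idf:
            continue  # skip out-of-vocabulary words
        weight = count * idf[word]  # raw term frequency times IDF
        for i, x in enumerate(embeddings[word]):
            vec[i] += weight * x
    return vec


def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical toy data: three tokenized sentences and hand-made 2-d embeddings.
corpus = [
    ["the", "cat", "sat"],
    ["the", "dog", "sat"],
    ["the", "stock", "fell"],
]
embeddings = {
    "the": [1.0, 1.0],
    "cat": [1.0, 0.0],
    "dog": [0.9, 0.1],
    "sat": [0.5, 0.5],
    "stock": [0.0, 1.0],
    "fell": [0.1, 0.9],
}

idf = idf_table(corpus)
vectors = [sentence_vector(sent, idf, embeddings) for sent in corpus]
sim_cat_dog = cosine(vectors[0], vectors[1])
sim_cat_stock = cosine(vectors[0], vectors[2])
```

In this toy setting the word "the" appears in every sentence and so receives the lowest weight, while rarer content words dominate each sentence vector. The two animal sentences therefore end up closer to each other than to the finance sentence. In practice the embeddings would come from a trained word embedding model rather than being written by hand.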
