Learning Probabilistic Sentence Representations from Paraphrases
Mingda Chen, Kevin Gimpel

TL;DR
This paper introduces probabilistic models for sentence representations trained on paraphrases, capturing sentence specificity and entailment, with the best model using linear transformations on Gaussian distributions.
Contribution
It proposes the first probabilistic sentence embedding models trained on paraphrases, capturing notions of specificity and entailment.
Findings
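The best-performing model treats each word as a linear transformation applied to a multivariate Gaussian distribution.
Qualitative analysis shows that the probabilistic model captures sentential entailment and sentence specificity.
Simpler architectures also represent specificity, through the norm of the sentence vector.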
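As a rough illustration of the last finding, the sketch below reads a specificity signal off the norm of a sentence vector. It assumes a word-averaging encoder, which is only one example of a "simpler architecture"; the names `sentence_vector`, `norm_score`, and `toy_embeddings` are hypothetical, and the embedding values are hand-picked placeholders rather than trained parameters or results from the paper.

```python
import numpy as np

def sentence_vector(tokens, embeddings):
    """Average the word vectors of a sentence (assumed encoder, not the paper's)."""
    return np.mean([embeddings[t] for t in tokens], axis=0)

def norm_score(tokens, embeddings):
    """L2 norm of the sentence vector, used here as a specificity proxy."""
    return float(np.linalg.norm(sentence_vector(tokens, embeddings)))

# Hand-picked placeholder vectors; they carry no empirical meaning.
toy_embeddings = {
    "something": np.array([0.1, 0.0]),
    "moved": np.array([0.2, 0.1]),
    "dog": np.array([1.0, 0.5]),
    "sprinted": np.array([0.8, 0.9]),
}

score_a = norm_score(["something", "moved"], toy_embeddings)
score_b = norm_score(["dog", "sprinted"], toy_embeddings)
```

With trained embeddings, comparing such scores across sentences is one way to check whether the norm tracks specificity; the toy values above only show the mechanics.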
Abstract
Probabilistic word embeddings have shown effectiveness in capturing notions of generality and entailment, but there is very little work on doing the analogous type of investigation for sentences. In this paper we define probabilistic models that produce distributions for sentences. Our best-performing model treats each word as a linear transformation operator applied to a multivariate Gaussian distribution. We train our models on paraphrases and demonstrate that they naturally capture sentence specificity. While our proposed model achieves the best performance overall, we also show that specificity is represented by simpler architectures via the norm of the sentence vectors. Qualitative analysis shows that our probabilistic model captures sentential entailment and provides ways to analyze the specificity and preciseness of individual words.
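The idea of treating each word as a linear transformation of a Gaussian can be sketched as follows. This is a minimal illustration, not the authors' implementation: it uses the standard fact that an affine map x -> Ax + b sends N(mu, Sigma) to N(A mu + b, A Sigma A^T). The starting distribution (standard normal), the inclusion of a bias term, the random operators, and the use of differential entropy as a summary of how spread out the sentence distribution is are all assumptions made for the example; `apply_word`, `encode`, and `gaussian_entropy` are hypothetical names.

```python
import numpy as np

def apply_word(mean, cov, A, b):
    """Push N(mean, cov) through the affine map x -> A x + b."""
    return A @ mean + b, A @ cov @ A.T

def encode(tokens, operators, dim):
    """Apply each word's operator in order, starting from N(0, I) (assumed start)."""
    mean, cov = np.zeros(dim), np.eye(dim)
    for tok in tokens:
        A, b = operators[tok]
        mean, cov = apply_word(mean, cov, A, b)
    return mean, cov

def gaussian_entropy(cov):
    """Differential entropy of a multivariate Gaussian with covariance cov."""
    d = cov.shape[0]
    _, logdet = np.linalg.slogdet(cov)
    return 0.5 * (d * (1.0 + np.log(2.0 * np.pi)) + logdet)

# Random placeholder operators; a real model would learn these from paraphrases.
rng = np.random.default_rng(0)
dim = 3
vocab = ["a", "dog", "runs"]
operators = {
    w: (np.eye(dim) + 0.1 * rng.standard_normal((dim, dim)),
        0.1 * rng.standard_normal(dim))
    for w in vocab
}

mean, cov = encode(["a", "dog", "runs"], operators, dim)
entropy = gaussian_entropy(cov)
```

Under this sketch, a word whose matrix shrinks the covariance lowers the entropy of the resulting sentence distribution, which gives one way to inspect how individual words affect the distribution.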
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques
