Towards Universal Paraphrastic Sentence Embeddings
John Wieting, Mohit Bansal, Kevin Gimpel, Karen Livescu

TL;DR
This paper compares six compositional architectures for learning universal paraphrastic sentence embeddings. It finds that simple word-averaging models are effective across domains, demonstrates their utility on three supervised NLP tasks, and releases resources for community use.
Contribution
The paper compares six compositional architectures for sentence embeddings and highlights the efficiency and effectiveness of simple averaging models across diverse NLP tasks and domains.
Findings
Simple averaging models outperform LSTMs out-of-domain.
LSTMs achieve state-of-the-art on sentiment classification.
Pretrained embeddings improve performance on similarity and entailment.
Abstract
We consider the problem of learning general-purpose, paraphrastic sentence embeddings based on supervision from the Paraphrase Database (Ganitkevitch et al., 2013). We compare six compositional architectures, evaluating them on annotated textual similarity datasets drawn both from the same distribution as the training data and from a wide range of other domains. We find that the most complex architectures, such as long short-term memory (LSTM) recurrent neural networks, perform best on the in-domain data. However, in out-of-domain scenarios, simple architectures such as word averaging vastly outperform LSTMs. Our simplest averaging model is even competitive with systems tuned for the particular tasks while also being extremely efficient and easy to use. In order to better understand how these architectures compare, we conduct further experiments on three supervised NLP tasks: sentence…
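The word-averaging idea mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the vectors in `TOY_VECTORS` are made-up three-dimensional values, the helper names `average_embedding` and `cosine` are hypothetical, and cosine similarity is assumed here as a common way to compare two sentence embeddings.

```python
import math

# Toy 3-dimensional word vectors. These values are invented for illustration
# and are NOT embeddings from the paper.
TOY_VECTORS = {
    "a": [0.1, 0.0, 0.2],
    "man": [0.9, 0.1, 0.3],
    "person": [0.8, 0.2, 0.3],
    "plays": [0.2, 0.9, 0.1],
    "strums": [0.3, 0.8, 0.2],
    "guitar": [0.1, 0.3, 0.9],
}


def average_embedding(tokens, vectors):
    """Return the element-wise mean of the vectors of all known tokens.

    Tokens missing from `vectors` are skipped (an assumption of this sketch).
    """
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        raise ValueError("no known tokens in sentence")
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Two paraphrase-like sentences, tokenized by whitespace.
s1 = average_embedding("a man plays a guitar".split(), TOY_VECTORS)
s2 = average_embedding("a person strums a guitar".split(), TOY_VECTORS)
similarity = cosine(s1, s2)
print(f"cosine similarity: {similarity:.4f}")
```

Because the sentence vector is just a mean over word vectors, it ignores word order and costs one pass over the tokens, which is consistent with the abstract's description of the averaging model as efficient and easy to use.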
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory
