TL;DR
This paper systematically compares methods for learning distributed sentence representations from unlabelled data. It shows that the right level of model complexity depends on the intended application, and it introduces new objectives that improve learning efficiency.
Contribution
It provides a systematic comparison of unsupervised sentence embedding models and proposes two new unsupervised learning objectives designed to optimize the trade-off between training time, domain portability and performance.
Findings
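Deeper, more complex models yield better representations for use in supervised systems.
Shallow log-linear models work best when the representation space is decoded with simple spatial distance metrics.
The two new objectives are designed to balance training time, domain portability and performance.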
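To make "decoding with a simple spatial distance metric" concrete, here is a minimal sketch. It assumes an additive sentence encoder, where a sentence vector is the sum of its word vectors, as a stand-in for a shallow log-linear model, and it uses cosine similarity as the metric. The word vectors are hypothetical toy values rather than trained embeddings, and the names `embed`, `cosine` and `nearest` are illustrative only, not taken from the paper.

```python
import math

# Hypothetical toy word vectors (illustration only, not trained embeddings).
WORD_VECS = {
    "a": [0.1, 0.0, 0.1],
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.2, 0.1],
    "sat": [0.1, 0.9, 0.2],
    "slept": [0.2, 0.8, 0.3],
    "stock": [0.0, 0.1, 0.9],
    "prices": [0.1, 0.0, 0.8],
    "fell": [0.2, 0.3, 0.7],
}
DIM = 3


def embed(sentence):
    """Additive (bag-of-words) sentence vector: sum of known word vectors.

    Unknown tokens contribute a zero vector.
    """
    vec = [0.0] * DIM
    for token in sentence.lower().split():
        for i, x in enumerate(WORD_VECS.get(token, [0.0] * DIM)):
            vec[i] += x
    return vec


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest(query, candidates):
    """Return the candidate sentence whose vector is most cosine-similar to the query."""
    q = embed(query)
    return max(candidates, key=lambda c: cosine(q, embed(c)))


if __name__ == "__main__":
    print(nearest("a cat sat", ["a dog slept", "stock prices fell"]))
```

No decoder is trained in this setting: the geometry of the space has to carry the meaning directly. In the supervised setting, a classifier is instead trained on top of the representations, which is where the findings above favour deeper models.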
Abstract
Unsupervised methods for learning distributed representations of words are ubiquitous in today's NLP research, but far less is known about the best ways to learn distributed phrase or sentence representations from unlabelled data. This paper is a systematic comparison of models that learn such representations. We find that the optimal approach depends critically on the intended application. Deeper, more complex models are preferable for representations to be used in supervised systems, but shallow log-linear models work best for building representation spaces that can be decoded with simple spatial distance metrics. We also propose two new unsupervised representation-learning objectives designed to optimise the trade-off between training time, domain portability and performance.
