Latte-Mix: Measuring Sentence Semantic Similarity with Latent Categorical Mixtures
M. Li, H. Bai, L. Tan, K. Xiong, M. Li, J. Lin

TL;DR
Latte-Mix introduces a Bayesian latent categorical mixture model for sentence semantic similarity, outperforming traditional mean pooling methods and enhancing zero-shot performance of pre-trained language models.
Contribution
The paper proposes a novel latent categorical mixture approach, Latte-Mix, with theoretical and empirical validation for improved sentence similarity measurement.
Findings
State-of-the-art zero-shot performance on STS datasets
Latte-Mix improves finetuned models' accuracy
Method is fast and memory-efficient
Abstract
Measuring sentence semantic similarity using pre-trained language models such as BERT generally yields unsatisfactory zero-shot performance, and one main reason is ineffective token aggregation methods such as mean pooling. In this paper, we demonstrate under a Bayesian framework that the distance between primitive statistics, such as the mean of word embeddings, is fundamentally flawed for capturing sentence-level semantic similarity. To remedy this issue, we propose to learn a categorical variational autoencoder (VAE) based on off-the-shelf pre-trained language models. We theoretically prove that measuring the distance between the latent categorical mixtures, namely Latte-Mix, can better reflect the true sentence semantic similarity. In addition, our Bayesian framework provides explanations for why models finetuned on labelled sentence pairs have better zero-shot performance. We also…
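The contrast the abstract draws between mean pooling and latent categorical mixtures can be illustrated with a toy sketch. This is not the paper's implementation. The token vectors and logits are made-up numbers; the logits stand in for the output of a hypothetical categorical VAE encoder. Averaging per-token distributions and comparing them with Jensen-Shannon divergence are assumptions chosen for illustration, since the abstract does not specify how the mixtures are built or compared.

```python
import math

def mean_pool(token_vecs):
    """Average token embeddings into a single sentence vector."""
    n, dim = len(token_vecs), len(token_vecs[0])
    return [sum(v[i] for v in token_vecs) / n for i in range(dim)]

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def softmax(logits):
    """Turn logits into a categorical distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sentence_mixture(token_logits):
    """Average per-token categorical distributions (assumed aggregation)."""
    dists = [softmax(row) for row in token_logits]
    k = len(dists[0])
    return [sum(d[i] for d in dists) / len(dists) for i in range(k)]

def js_divergence(p, q):
    """Jensen-Shannon divergence with natural log (assumed distance)."""
    mid = [(a + b) / 2 for a, b in zip(p, q)]
    def kl(x, y):
        return sum(a * math.log(a / b) for a, b in zip(x, y) if a > 0)
    return 0.5 * kl(p, mid) + 0.5 * kl(q, mid)

# Toy token embeddings: different tokens, identical mean.
sent_a = [[1.0, 0.0], [0.0, 1.0]]
sent_b = [[0.5, 0.5], [0.5, 0.5]]
sim = cosine(mean_pool(sent_a), mean_pool(sent_b))

# Toy per-token logits over 3 latent categories (hypothetical encoder output).
logits_a = [[4.0, 0.0, 0.0], [0.0, 4.0, 0.0]]
logits_b = [[0.0, 0.0, 4.0], [0.0, 0.0, 4.0]]
mix_a = sentence_mixture(logits_a)
mix_b = sentence_mixture(logits_b)
div = js_divergence(mix_a, mix_b)

print(f"mean-pooling cosine similarity: {sim:.3f}")
print(f"mixture JS divergence: {div:.3f}")
```

In this toy setup mean pooling cannot tell the two sentences apart because their token averages coincide, while the categorical mixtures still differ. Real embeddings and a trained encoder would be needed to say anything about actual STS performance.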
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
Methods: Linear Layer · Adam · Layer Normalization · Dense Connections · Multi-Head Attention · Dropout · Linear Warmup With Linear Decay · Attention Dropout · Weight Decay
