Can Cross Encoders Produce Useful Sentence Embeddings?
Haritha Ananthakrishnan, Julian Dolby, Harsha Kokel, Horst Samulowitz, Kavitha Srinivas

TL;DR
This paper shows that embeddings from the earlier layers of cross encoders can be used for information retrieval, and that they can be distilled into a lighter-weight dual encoder with faster inference.
Contribution
It reveals that earlier layer embeddings of cross encoders can be used for retrieval and introduces a method to distill a lightweight dual encoder from cross encoder representations.
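As a rough illustration of what distilling from CE representations could look like, the sketch below trains a toy student to reproduce a teacher embedding. Everything here is an assumption for illustration: the summary does not state the training objective, so mean squared error is a stand-in, the linear student is a placeholder for a real dual encoder, and the names `student_embed`, `mse`, and `sgd_step` are hypothetical.

```python
def student_embed(weights, features):
    """Toy student: a linear map from input features to an embedding."""
    return [sum(w * x for w, x in zip(row, features)) for row in weights]

def mse(student, teacher):
    """Mean squared error between student and teacher embeddings."""
    return sum((s - t) ** 2 for s, t in zip(student, teacher)) / len(teacher)

def sgd_step(weights, features, teacher, lr=0.1):
    """One gradient-descent step on the MSE loss for the linear student."""
    pred = student_embed(weights, features)
    n = len(teacher)
    return [
        [w - lr * 2.0 * (p - t) / n * x for w, x in zip(row, features)]
        for row, p, t in zip(weights, pred, teacher)
    ]

# Hypothetical values: in practice the teacher vector would come from an
# earlier layer of a trained cross encoder, not be written by hand.
features = [1.0, 2.0]
teacher_embedding = [0.5, -0.5]
weights = [[0.0, 0.0], [0.0, 0.0]]

loss_before = mse(student_embed(weights, features), teacher_embedding)
weights = sgd_step(weights, features, teacher_embedding)
loss_after = mse(student_embed(weights, features), teacher_embedding)
```

After one step the student's output moves toward the teacher's, so `loss_after` is smaller than `loss_before`. A real setup would replace the linear map with a smaller transformer and iterate over a corpus.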
Findings
Earlier layer embeddings of CEs are effective for retrieval.
Distilled DE achieves 5.15x faster inference.
CE embeddings can be repurposed beyond re-ranking.
Abstract
Cross encoders (CEs) are trained with sentence pairs to detect relatedness. As CEs require sentence pairs at inference, the prevailing view is that they can only be used as re-rankers in information retrieval pipelines. Dual encoders (DEs) are instead used to embed sentences, where sentence pairs are encoded by two separate encoders with shared weights at training, and a loss function ensures that the pair's embeddings lie close in vector space if the sentences are related. DEs, however, require much larger datasets to train, and are less accurate than CEs. We report a curious finding that embeddings from earlier layers of CEs can in fact be used within an information retrieval pipeline. We show how to exploit CEs to distill a lighter-weight DE, with a 5.15x speedup in inference time.
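To make the retrieval idea concrete, here is a minimal sketch, assuming each sentence is embedded by mean-pooling the token hidden states of one early CE layer and that candidates are ranked by cosine similarity. The pooling scheme, the similarity measure, the function names, and the toy vectors are all assumptions for illustration; the abstract does not specify them.

```python
import math

def mean_pool(token_vectors, attention_mask):
    """Average the token vectors whose mask entry is 1 (padding is skipped)."""
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
    if not kept:
        raise ValueError("attention mask selects no tokens")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rank(query_vec, doc_vecs):
    """Return document indices sorted by decreasing similarity to the query."""
    scores = [cosine(query_vec, d) for d in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)

# Toy stand-ins for hidden states from one early layer (hypothetical values).
query_tokens = [[1.0, 0.0], [0.8, 0.2], [9.0, 9.0]]  # last row is padding
query_mask = [1, 1, 0]
doc_vecs = [[0.0, 1.0], [1.0, 0.1], [-1.0, 0.0]]  # already pooled documents

query_vec = mean_pool(query_tokens, query_mask)
ranking = rank(query_vec, doc_vecs)
```

With a real model, `query_tokens` would be the hidden states of a chosen layer when a single sentence is passed through the CE. Which layer works best is an empirical question that the paper investigates.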
Taxonomy
Topics: Topic Modeling · Hate Speech and Cyberbullying Detection
