Transformer Based Language Models for Similar Text Retrieval and Ranking
Javed Qadrud-Din, Ashraf Bah Rabiou, Ryan Walker, Ravi Soni, Martin Gajek, Gabriel Pack, Akhil Rangaraj

TL;DR
This paper introduces transformer-based methods for similar text retrieval and ranking that do not rely on an initial bag-of-words step, so results can be retrieved and ranked accurately even when they share no non-stopwords with the query.
Contribution
The paper presents novel transformer-based techniques that retrieve and rank similar texts directly, without initial bag-of-words filtering, and that remain accurate when the query and a result have no non-stopwords in common.
Findings
Effective retrieval without bag-of-words step
Supervised and unsupervised BERT-based methods demonstrated
Improved accuracy in text ranking tasks
Abstract
Most approaches for similar text retrieval and ranking with long natural language queries rely at some level on queries and responses having words in common with each other. Recent applications of transformer-based neural language models to text retrieval and ranking problems have been very promising, but still involve a two-step process in which result candidates are first obtained through bag-of-words-based approaches, and then reranked by a neural transformer. In this paper, we introduce novel approaches for effectively applying neural transformer models to similar text retrieval and ranking without an initial bag-of-words-based step. By eliminating the bag-of-words-based step, our approach is able to accurately retrieve and rank results even when they have no non-stopwords in common with the query. We accomplish this by using bidirectional encoder representations from transformers…
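The idea of retrieving without a bag-of-words step can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each text has already been mapped to a fixed-length vector by some encoder, and it uses hand-written, hypothetical 3-dimensional vectors (`corpus_vecs`, `query_vec`) in place of real model embeddings. Ranking is done by cosine similarity, a common choice for dense retrieval that is assumed here rather than taken from the abstract.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_by_similarity(query_vec, corpus_vecs):
    """Return (doc_id, score) pairs sorted from most to least similar."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in corpus_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy embeddings standing in for encoder output (assumption).
# No word overlap is consulted anywhere: only the vectors are compared.
corpus_vecs = {
    "doc_paraphrase": [0.9, 0.1, 0.2],
    "doc_unrelated": [0.1, 0.9, 0.3],
    "doc_partial": [0.6, 0.5, 0.1],
}
query_vec = [1.0, 0.0, 0.1]

ranking = rank_by_similarity(query_vec, corpus_vecs)
for doc_id, score in ranking:
    print(f"{doc_id}: {score:.3f}")
```

Because the score depends only on the vectors, a document can rank first even if it shares no words with the query, provided the encoder places the two texts close together.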
Taxonomy
Topics: Topic Modeling · Advanced Text Analysis Techniques · Natural Language Processing Techniques
Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Residual Connection · Attention Dropout · Linear Warmup With Linear Decay · Weight Decay · Byte Pair Encoding · Dense Connections
