Improving Transformer-Kernel Ranking Model Using Conformer and Query Term Independence
Bhaskar Mitra, Sebastian Hofstätter, Hamed Zamani, Nick Craswell

TL;DR
This paper introduces a Conformer-enhanced Transformer-Kernel model with query term independence for document ranking. It improves retrieval quality over TKL on the TREC Deep Learning benchmark while remaining efficient.
Contribution
It proposes a novel Conformer layer to scale Transformer-Kernel (TK) models to longer input sequences, and incorporates query term independence and explicit term matching to extend them to the full retrieval setting.
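Query term independence (QTI) means the relevance score decomposes as a sum of per-term scores, score(q, d) = Σ over t in q of s(t, d). A minimal sketch, assuming the per-term scores s(t, d) have already been computed offline by some neural model, shows why this matters for full retrieval: the scores can be stored in an inverted index and summed at query time. The function names `build_index` and `retrieve` and the toy scores are hypothetical, not the authors' code.

```python
from collections import defaultdict
import heapq

def build_index(term_doc_scores):
    """Turn precomputed (term, doc_id) -> score entries into posting lists.

    Hypothetical helper: the scores are assumed to come from an offline
    neural model run once per (term, document) pair.
    """
    index = defaultdict(list)
    for (term, doc_id), score in term_doc_scores.items():
        index[term].append((doc_id, score))
    return index

def retrieve(index, query_terms, k):
    """Score every document that shares a term with the query by summing
    its per-term scores (the QTI assumption), then return the top k."""
    accumulators = defaultdict(float)
    for term in query_terms:
        # .get does not trigger the defaultdict factory, so unknown terms add nothing
        for doc_id, score in index.get(term, []):
            accumulators[doc_id] += score
    return heapq.nlargest(k, accumulators.items(), key=lambda item: item[1])

# Toy per-term scores standing in for a model's outputs (hypothetical values).
scores = {
    ("neural", "d1"): 1.5,
    ("neural", "d2"): 0.5,
    ("ranking", "d1"): 1.0,
    ("ranking", "d3"): 2.0,
}
index = build_index(scores)
print(retrieve(index, ["neural", "ranking"], k=2))
```

Only documents with at least one posting for a query term are ever scored, which is what lets a QTI model serve full retrieval from an index instead of only reranking a candidate list.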
Findings
Outperforms TKL in retrieval quality
Beats all non-neural baselines on NDCG@10
Surpasses two-thirds of pretrained Transformer models
Abstract
The Transformer-Kernel (TK) model has demonstrated strong reranking performance on the TREC Deep Learning benchmark, and can be considered an efficient (but slightly less effective) alternative to other Transformer-based architectures that employ (i) large-scale pretraining (high training cost), (ii) joint encoding of query and document (high inference cost), and (iii) a larger number of Transformer layers (both high training and high inference costs). Since then, a variant of the TK model, called TKL, has been developed that incorporates local self-attention to efficiently process longer input sequences in the context of document ranking. In this work, we propose a novel Conformer layer as an alternative approach to scale TK to longer input sequences. Furthermore, we incorporate query term independence and explicit term matching to extend the model to the full retrieval setting.…
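The "Kernel" in Transformer-Kernel refers to kernel pooling over a query-document similarity matrix. The following is a simplified sketch of that pooling step, assuming plain Gaussian (RBF) kernels over cosine similarities; the kernel means, width, and weights are hypothetical, and toy 2-d vectors stand in for the contextualized embeddings that TK's Transformer layers would produce. It also shows why the final score is a sum over query terms: once each query term's representation no longer depends on the rest of the query, each term's contribution can be computed per document independently.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def term_score(q_vec, doc_vecs, mus, sigma, weights):
    """Kernel-pooled score of one query term against one document.

    For each kernel k: soft_tf = sum_j exp(-(cos(q, d_j) - mu_k)^2 / (2 sigma^2)),
    then take log(soft_tf) and combine the per-kernel features with a weight.
    """
    sims = [cosine(q_vec, d) for d in doc_vecs]
    score = 0.0
    for mu, w in zip(mus, weights):
        soft_tf = sum(math.exp(-((s - mu) ** 2) / (2 * sigma ** 2)) for s in sims)
        score += w * math.log(max(soft_tf, 1e-10))  # clamp avoids log(0)
    return score

def doc_score(query_vecs, doc_vecs, mus, sigma, weights):
    """Document score as a sum of independent per-query-term scores."""
    return sum(term_score(q, doc_vecs, mus, sigma, weights) for q in query_vecs)

mus = [1.0, 0.5, 0.0]        # hypothetical kernel centres (1.0 ~ exact match)
weights = [1.0, 0.5, 0.1]    # hypothetical learned kernel weights
query = [[1.0, 0.0], [0.0, 1.0]]
doc = [[1.0, 0.0], [1.0, 1.0]]
print(doc_score(query, doc, mus, sigma=0.1, weights=weights))
```

The real TK model adds details this sketch omits (for example, the Transformer encoders and a length-normalized pooling variant), so it should be read as an illustration of kernel pooling, not as the paper's architecture.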
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Softmax · Layer Normalization · Label Smoothing · Residual Connection · Byte Pair Encoding
