Improving Neural Ranking Models with Traditional IR Methods
Anik Saha, Oktie Hassanzadeh, Alex Gittens, Jian Ni, Kavitha Srinivas, Bulent Yener

TL;DR
This paper demonstrates that combining traditional IR methods like TF-IDF with shallow embedding models offers a cost-effective alternative to large transformer models for document retrieval, maintaining competitive performance.
Contribution
It introduces a simple hybrid approach that combines TF-IDF with shallow embeddings, improving retrieval performance and reducing resource requirements.
Findings
A hybrid of TF-IDF and a shallow embedding model is competitive with large fine-tuned transformer models.
Adding TF-IDF scores also improves the performance of large fine-tuned neural models.
The low-resource method achieves strong results on three datasets.
Abstract
Neural ranking methods based on large transformer models have recently gained significant attention in the information retrieval community, and have been adopted by major commercial solutions. Nevertheless, they are computationally expensive to create, and require a great deal of labeled data for specialized corpora. In this paper, we explore a low-resource alternative, a bag-of-embedding model for document retrieval, and find that it is competitive with large transformer models fine-tuned on information retrieval tasks. Our results show that a simple combination of TF-IDF, a traditional keyword matching method, with a shallow embedding model provides a low-cost path to compete well with the performance of complex neural ranking models on 3 datasets. Furthermore, adding TF-IDF measures improves the performance of large-scale fine-tuned models on these tasks.
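To make the idea of combining TF-IDF with a shallow embedding model concrete, here is a minimal sketch in Python. It is not the authors' implementation: this page does not say how the two scores are combined, so the linear interpolation with weight `alpha`, the whitespace tokenizer, the smoothed IDF formula, and the hand-made two-dimensional embeddings in `EMB` are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a deliberate simplification)."""
    return text.lower().split()


def build_idf(docs_tokens):
    """Smoothed inverse document frequency for every term in the corpus."""
    n_docs = len(docs_tokens)
    doc_freq = Counter(term for tokens in docs_tokens for term in set(tokens))
    return {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}


def tfidf_vector(tokens, idf):
    """Sparse TF-IDF vector as a dict; out-of-vocabulary terms are dropped."""
    counts = Counter(t for t in tokens if t in idf)
    return {t: c * idf[t] for t, c in counts.items()}


def cosine_sparse(a, b):
    """Cosine similarity between two sparse dict vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def mean_embedding(tokens, emb):
    """Bag-of-embeddings: average the vectors of the tokens found in emb."""
    dim = len(next(iter(emb.values())))
    vectors = [emb[t] for t in tokens if t in emb]
    if not vectors:
        return [0.0] * dim
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def cosine_dense(a, b):
    """Cosine similarity between two dense list vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_rank(query, docs, emb, alpha=0.5):
    """Rank docs by alpha * TF-IDF cosine + (1 - alpha) * embedding cosine.

    The interpolation weight alpha is an assumption of this sketch.
    Returns (doc_index, score) pairs sorted by descending score.
    """
    docs_tokens = [tokenize(d) for d in docs]
    q_tokens = tokenize(query)
    idf = build_idf(docs_tokens)
    q_sparse = tfidf_vector(q_tokens, idf)
    q_dense = mean_embedding(q_tokens, emb)
    scored = []
    for i, tokens in enumerate(docs_tokens):
        lexical = cosine_sparse(q_sparse, tfidf_vector(tokens, idf))
        semantic = cosine_dense(q_dense, mean_embedding(tokens, emb))
        scored.append((i, alpha * lexical + (1 - alpha) * semantic))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy data: the embeddings are invented, not trained.
DOCS = ["automobile engine repair", "banana fruit salad", "car rental"]
EMB = {
    "car": [1.0, 0.0],
    "automobile": [0.9, 0.1],
    "engine": [0.8, 0.2],
    "banana": [0.0, 1.0],
    "fruit": [0.1, 0.9],
}

for doc_index, score in hybrid_rank("car engine", DOCS, EMB, alpha=0.5):
    print(f"{score:.3f}  {DOCS[doc_index]}")
```

In this toy setup the lexical term rewards exact keyword overlap ("car", "engine"), while the embedding term lets "automobile" contribute even though it never appears in the query. Setting `alpha=1.0` reduces the ranker to plain TF-IDF, and `alpha=0.0` to the bag-of-embeddings score alone.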
Taxonomy
Topics: Text and Document Classification Technologies · Topic Modeling · Advanced Graph Neural Networks
