Improving Neural Ranking Models with Traditional IR Methods

Anik Saha; Oktie Hassanzadeh; Alex Gittens; Jian Ni; Kavitha Srinivas; Bulent Yener

arXiv:2308.15027 · cs.IR · August 30, 2023

TL;DR

This paper demonstrates that combining traditional IR methods like TF-IDF with shallow embedding models offers a cost-effective alternative to large transformer models for document retrieval, maintaining competitive performance.

Contribution

It introduces a simple hybrid approach that combines TF-IDF with shallow embeddings, improving retrieval performance and reducing resource requirements.

Findings

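01. The hybrid of TF-IDF and a shallow embedding model is competitive with large transformer models.

02. Adding TF-IDF scores improves the performance of neural ranking models, including large fine-tuned ones.

03. The low-resource method achieves strong results on three datasets.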

Abstract

Neural ranking methods based on large transformer models have recently gained significant attention in the information retrieval community and have been adopted by major commercial solutions. Nevertheless, they are computationally expensive to create and require a great deal of labeled data for specialized corpora. In this paper, we explore a low-resource alternative, a bag-of-embedding model for document retrieval, and find that it is competitive with large transformer models fine-tuned on information retrieval tasks. Our results show that a simple combination of TF-IDF, a traditional keyword matching method, with a shallow embedding model provides a low-cost path to competing with complex neural ranking models on three datasets. Furthermore, adding TF-IDF measures improves the performance of large-scale fine-tuned models on these tasks.
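The idea of combining a keyword score with a shallow embedding score can be illustrated with a minimal sketch. This listing does not say how the paper merges the two signals, so the linear interpolation weight `alpha`, the smoothed IDF formula, the helper names, and the hand-written `TOY_EMBEDDINGS` table are all assumptions made for illustration. This is not the code from the aniksh/dual_encoder repository.

```python
import math
from collections import Counter

def tokenize(text):
    # Naive whitespace tokenizer; a real system would normalize more carefully.
    return text.lower().split()

def tfidf_vectors(docs_tokens):
    # Sparse TF-IDF dicts per document, using a smoothed IDF (an assumption).
    n = len(docs_tokens)
    df = Counter(t for toks in docs_tokens for t in set(toks))
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vecs = [{t: tf * idf[t] for t, tf in Counter(toks).items()}
            for toks in docs_tokens]
    return vecs, idf

def sparse_cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def mean_embedding(tokens, table, dim):
    # Bag-of-embeddings: average the vectors of the tokens found in the table.
    vecs = [table[t] for t in tokens if t in table]
    if not vecs:
        return [0.0] * dim
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

def dense_cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def hybrid_rank(query, docs, table, dim, alpha=0.5):
    # Score = alpha * lexical + (1 - alpha) * semantic (hypothetical weighting).
    docs_tokens = [tokenize(d) for d in docs]
    q_tokens = tokenize(query)
    doc_vecs, idf = tfidf_vectors(docs_tokens)
    q_vec = {t: tf * idf[t] for t, tf in Counter(q_tokens).items() if t in idf}
    q_emb = mean_embedding(q_tokens, table, dim)
    scored = []
    for i, toks in enumerate(docs_tokens):
        lexical = sparse_cosine(q_vec, doc_vecs[i])
        semantic = dense_cosine(q_emb, mean_embedding(toks, table, dim))
        scored.append((alpha * lexical + (1 - alpha) * semantic, i))
    # Highest combined score first; each entry is (score, document index).
    return sorted(scored, reverse=True)

# Hypothetical 2-d embeddings: axis 0 is "vehicles", axis 1 is "food".
TOY_EMBEDDINGS = {
    "car": [1.0, 0.0], "automobile": [0.9, 0.1], "engine": [0.8, 0.2],
    "fruit": [0.0, 1.0], "apple": [0.1, 0.9], "banana": [0.0, 1.0],
}
DOCS = ["automobile engine repair", "banana apple fruit salad", "car engine"]

ranking = hybrid_rank("car repair", DOCS, TOY_EMBEDDINGS, dim=2)
for score, idx in ranking:
    print(f"{score:.3f}  {DOCS[idx]}")
```

In this toy setup the fruit document shares no terms with the query, so its TF-IDF score is zero and only a small embedding similarity remains. Setting `alpha=1.0` reduces the ranker to pure TF-IDF, and `alpha=0.0` to pure bag-of-embeddings, which makes it easy to compare each signal against the combination.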

Code & Models

Repositories

aniksh/dual_encoder (tf, Official)

Taxonomy

Topics: Text and Document Classification Technologies · Topic Modeling · Advanced Graph Neural Networks