LRG at TREC 2020: Document Ranking with XLNet-Based Models
Abheesht Sharma, Harshit Pandey

TL;DR
This paper explores hybrid information retrieval models combining classical IR techniques with transformer-based models like XLNet to improve podcast segment relevance ranking, balancing accuracy and computational efficiency.
Contribution
It introduces a hybrid approach that filters with classical IR and re-ranks with XLNet-based models for better podcast retrieval performance.
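The filter-then-re-rank idea can be illustrated with a minimal sketch. It is not the authors' implementation. BM25 is assumed here as the classical IR stage because the summary does not name the technique used. The re-ranker is a hypothetical placeholder (`toy_rerank_score`, simple token overlap) standing in for an XLNet-based relevance model. All function names and parameters are illustrative.

```python
import math
from collections import Counter
from typing import Callable, List, Tuple


def tokenize(text: str) -> List[str]:
    # Naive whitespace tokenizer; a real system would use a proper analyzer.
    return text.lower().split()


def bm25_scores(query: str, docs: List[str], k1: float = 1.5, b: float = 0.75) -> List[float]:
    # Stage 1 (assumed): score every segment with BM25, a classical IR function.
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = (sum(len(t) for t in tokenized) / max(n_docs, 1)) or 1.0
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    query_terms = set(tokenize(query))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def toy_rerank_score(query: str, segment: str) -> float:
    # Hypothetical stand-in for a neural relevance model such as XLNet:
    # Jaccard overlap between query tokens and segment tokens.
    q, s = set(tokenize(query)), set(tokenize(segment))
    return len(q & s) / len(q | s) if (q | s) else 0.0


def hybrid_rank(
    query: str,
    segments: List[str],
    top_k: int = 2,
    rerank_score: Callable[[str, str], float] = toy_rerank_score,
) -> List[Tuple[int, str]]:
    # Cheap first pass over all segments, keeping only the top_k candidates.
    first_pass = bm25_scores(query, segments)
    candidates = sorted(range(len(segments)), key=lambda i: first_pass[i], reverse=True)[:top_k]
    # Expensive second pass runs only on the surviving candidates.
    reranked = sorted(candidates, key=lambda i: rerank_score(query, segments[i]), reverse=True)
    return [(i, segments[i]) for i in reranked]
```

Because the second-stage scorer is called only on `top_k` candidates rather than on every segment, the expensive model's cost scales with `top_k` instead of the collection size. Swapping `rerank_score` for a call into a fine-tuned transformer would leave the rest of the pipeline unchanged.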
Findings
Hybrid models outperform purely classical IR methods.
Re-ranking with XLNet improves relevance accuracy.
Hybrid approach reduces computational costs compared to full neural models.
Abstract
Establishing a good information retrieval system in popular media of entertainment is a quickly growing area of investigation for companies and researchers alike. We delve into the domain of information retrieval for podcasts. In Spotify's Podcast Challenge, we are given a user's query with a description and must find the most relevant short segment in a dataset containing all the podcasts. Previous approaches that rely solely on classical Information Retrieval (IR) techniques perform poorly when descriptive queries are presented. On the other hand, models that rely exclusively on large neural networks tend to perform better. The downside of this approach is that a considerable amount of time and computing power is required to infer the result. We experiment with two hybrid models which first filter out the best podcasts based on the user's query with a classical IR technique, and…
Taxonomy
Topics: Information Retrieval and Search Behavior · Web Data Mining and Analysis · Topic Modeling
