Beyond Questions: Leveraging ColBERT for Keyphrase Search
Jorge Gabín, Javier Parapar, Craig Macdonald

TL;DR
This paper enhances document ranking for keyphrase queries by adapting the ColBERT architecture, utilizing large language models to convert question-like queries into keyphrases, and training specialized encoders to improve retrieval performance.
Contribution
It introduces a novel keyphrase-based ColBERT model and explores cost-effective training methods for improved keyphrase search performance.
Findings
Late interaction architecture benefits keyphrase search
Language models can effectively convert questions to keyphrases
Training only the query encoder is feasible and efficient
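
To make the first finding concrete, here is a minimal sketch of the late-interaction (MaxSim) scoring that ColBERT-style rankers are known for: each query token embedding is matched against its most similar document token embedding, and those maxima are summed. The function names (`maxsim_score`, `rank`) and the 2-dimensional toy vectors are illustrative assumptions, not the paper's code; a real ColBERT model would produce the token embeddings with BERT-based encoders.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product between two equally sized vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_embs: List[Vector], doc_embs: List[Vector]) -> float:
    """Late-interaction score: for every query token, take the highest
    similarity to any document token, then sum over query tokens."""
    return sum(max(dot(q, d) for d in doc_embs) for q in query_embs)


def rank(query_embs: List[Vector],
         docs: Dict[str, List[Vector]]) -> List[Tuple[str, float]]:
    """Score each document and sort by descending MaxSim score."""
    scored = [(doc_id, maxsim_score(query_embs, embs))
              for doc_id, embs in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy token embeddings (made up for illustration only).
query = [[1.0, 0.0], [0.0, 1.0]]          # e.g. a two-term keyphrase query
docs = {
    "d1": [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]],
    "d2": [[0.5, 0.5], [0.5, 0.5]],
}

ranking = rank(query, docs)
print(ranking)
```

Because every query token contributes its own best match, a short keyphrase query is scored term by term rather than through a single pooled vector, which is one plausible reading of why late interaction suits keyphrase search.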
Abstract
While question-like queries are gaining popularity and search engines' users increasingly adopt them, keyphrase search has traditionally been the cornerstone of web search. This query type is also prevalent in specialised search tasks such as academic or professional search, where experts rely on keyphrases to articulate their information needs. However, current dense retrieval models often fail with keyphrase-like queries, primarily because they are mostly trained on question-like ones. This paper introduces a novel model that employs the ColBERT architecture to enhance document ranking for keyphrase queries. For that, given the lack of large keyphrase-based retrieval datasets, we first explore how Large Language Models can convert question-like queries into keyphrase format. Then, using those keyphrases, we train a keyphrase-based ColBERT ranker (ColBERTKP_QD) to improve the…
Taxonomy
Topics: Advanced Text Analysis Techniques
Methods: ADaptive gradient method with the OPTimal convergence rate
