WSDM Cup 2026 Multilingual Retrieval: A Low-Cost Multi-Stage Retrieval Pipeline

Chentong Hao; Minmao Wang

arXiv:2602.16989 · cs.IR · February 20, 2026

TL;DR

This paper introduces a low-cost, multi-stage multilingual retrieval system for the WSDM Cup 2026. It combines LLM-based query expansion, BM25 candidate retrieval, dense ranking, and re-ranking to retrieve relevant Chinese, Persian, and Russian news articles for English queries.

Contribution

It proposes a low-cost retrieval pipeline that integrates LLM-based query expansion with BM25 and dense ranking methods for multilingual document retrieval.

Findings

01 Achieved nDCG@20 of 0.403 on the official evaluation

02 Achieved Judged@20 of 0.95 on the official evaluation

03 Quantified the contribution of each pipeline stage through ablation experiments
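The two reported metrics can be illustrated with a minimal sketch. It assumes linear gains, a log2 rank discount, and Judged@k defined as the fraction of the top-k results that have any relevance judgment, divided by k; the page does not state which variant the official evaluation uses, and the function names `dcg_at_k`, `ndcg_at_k`, and `judged_at_k` are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def dcg_at_k(gains: Sequence[float], k: int) -> float:
    # Discounted cumulative gain with a log2 discount (rank 1 is undiscounted).
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))


def ndcg_at_k(ranking: List[str], qrels: Dict[str, int], k: int = 20) -> float:
    # Assumption: linear gain; unjudged documents count as non-relevant (gain 0).
    gains = [qrels.get(doc, 0) for doc in ranking]
    ideal = sorted(qrels.values(), reverse=True)
    idcg = dcg_at_k(ideal, k)
    return dcg_at_k(gains, k) / idcg if idcg > 0 else 0.0


def judged_at_k(ranking: List[str], qrels: Dict[str, int], k: int = 20) -> float:
    # Assumption: share of the top-k slots filled by a judged document.
    return sum(doc in qrels for doc in ranking[:k]) / k
```

Under this definition, a Judged@20 of 0.95 means that, on average, 19 of the top 20 returned documents had been assessed, so the nDCG@20 figure is only lightly affected by unjudged results being treated as non-relevant.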

Abstract

We present a low-cost retrieval system for the WSDM Cup 2026 multilingual retrieval task, where English queries are used to retrieve relevant documents from a collection of approximately ten million news articles in Chinese, Persian, and Russian, and to output the top-1000 ranked results for each query. We follow a four-stage pipeline that combines LLM-based GRF-style query expansion with BM25 candidate retrieval, dense ranking using long-text representations from jina-embeddings-v4, and pointwise re-ranking of the top-20 candidates using Qwen3-Reranker-4B while preserving the dense order for the remaining results. On the official evaluation, the system achieves nDCG@20 of 0.403 and Judged@20 of 0.95. We further conduct extensive ablation experiments to quantify the contribution of each stage and to analyze the effectiveness of query expansion, dense ranking, and top-$k$ reranking under…
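The control flow of the four stages can be sketched as follows. This is a minimal illustration, not the authors' implementation: the callables `expand`, `bm25_search`, `dense_score`, and `rerank_score` are hypothetical stand-ins for the LLM expansion, the BM25 index, jina-embeddings-v4 similarity, and Qwen3-Reranker-4B scoring, and the `n_candidates` parameter and the choice to score dense similarity against the original query are assumptions.

```python
from typing import Callable, List


def rerank_top_k(dense_ranked: List[str], query: str,
                 rerank_score: Callable[[str, str], float], k: int = 20) -> List[str]:
    # Re-score only the first k documents; the tail keeps its dense order.
    head, tail = dense_ranked[:k], dense_ranked[k:]
    scored = [(rerank_score(query, doc), pos, doc) for pos, doc in enumerate(head)]
    # Higher score first; ties fall back to the original dense position.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [doc for _, _, doc in scored] + tail


def run_pipeline(query: str,
                 expand: Callable[[str], str],
                 bm25_search: Callable[[str, int], List[str]],
                 dense_score: Callable[[str, str], float],
                 rerank_score: Callable[[str, str], float],
                 n_candidates: int = 1000, k: int = 20) -> List[str]:
    expanded = expand(query)                          # stage 1: query expansion
    candidates = bm25_search(expanded, n_candidates)  # stage 2: BM25 candidates
    dense_ranked = sorted(candidates,                 # stage 3: dense ranking
                          key=lambda doc: dense_score(query, doc), reverse=True)
    return rerank_top_k(dense_ranked, query, rerank_score, k)  # stage 4: top-k re-rank
```

Restricting the pointwise re-ranker to the top 20 keeps the number of expensive model calls per query fixed at 20, which is consistent with the low-cost goal, while the remaining positions of the top-1000 list are filled from the dense ranking.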

Taxonomy

Topics: Information Retrieval and Search Behavior · Topic Modeling · Biomedical Text Mining and Ontologies