Beyond Retrieval: A Multitask Benchmark and Model for Code Search

Siqiao Xue; Zihan Liao; Jin Qin; Ziyin Zhang; Yixiang Mu; Fan Zhou; Hang Yu

arXiv:2605.04615 · cs.SE · May 11, 2026

TL;DR

This paper introduces CoREB, a comprehensive code search benchmark, and a fine-tuned reranker, addressing limitations of existing datasets and evaluating models across multiple tasks and programming languages.

Contribution

It presents a contamination-limited, multitask benchmark with a fine-tuned reranker that improves the full code search pipeline beyond retrieval.

Findings

01

Code-specialized embeddings outperform general encoders in code-to-code retrieval.

02

Short keyword queries significantly reduce model effectiveness.

03

The fine-tuned CoREB-Reranker achieves consistent improvements across tasks.

Abstract

Code search has usually been evaluated as first-stage retrieval, even though production systems rely on broader pipelines with reranking and developer-style queries. Existing benchmarks also suffer from data contamination, label noise, and degenerate binary relevance. In this paper, we introduce CoREB, a contamination-limited, multitask code retrieval and reranking benchmark, together with a fine-tuned code reranker, that goes beyond retrieval to cover the full code search pipeline. CoREB is built from counterfactually rewritten LiveCodeBench problems in five programming languages and delivered as timed releases with graded relevance judgments. We benchmark eleven embedding models and five rerankers across three tasks: text-to-code, code-to-text, and code-to-code. Our experiments reveal that: (1) code-specialised…

Code & Models

Repositories

hq-bench/coreb
github
