Efficiency-Effectiveness Reranking FLOPs for LLM-based Rerankers

Zhiyuan Peng; Ting-ruen Wei; Tingyu Song; Yilun Zhao

arXiv:2507.06223 · cs.CL · October 10, 2025

TL;DR

This paper introduces FLOPs-based metrics RPP and QPP for evaluating LLM rerankers, providing a hardware-independent way to measure efficiency and effectiveness trade-offs in information retrieval tasks.

Contribution

The paper proposes two FLOPs-based metrics, RPP and QPP, and a FLOPs estimator for evaluating the efficiency-effectiveness tradeoff of LLM rerankers independently of hardware and implementation details.

Findings

01 RPP and QPP offer consistent efficiency measurements across different hardware.

02 The FLOPs estimator accurately predicts model FLOPs without running experiments.

03 Comprehensive experiments reveal insights into the efficiency-effectiveness tradeoff.

Abstract

Large Language Models (LLMs) have recently been applied to reranking tasks in information retrieval, achieving strong performance. However, their high computational demands often hinder practical deployment. Existing studies evaluate the efficiency of LLM-based rerankers using proxy metrics such as latency, the number of forward passes, input tokens, and output tokens. These metrics depend on hardware and runtime choices (e.g., parallel or sequential execution, batch size) and often fail to account for model size, making them difficult to interpret and obscuring the evaluation of the efficiency-effectiveness tradeoff. To address this issue, we propose Efficiency-Effectiveness Reranking FLOPs (code: https://github.com/zhiyuanpeng/EER-FLOPs) for LLM-based rerankers: RPP (ranking metrics per PetaFLOP), measuring how much ranking quality (e.g., NDCG or MRR) a method achieves per PetaFLOP, and QPP (queries per PetaFLOP),…
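As a rough illustration of the two metrics named in the abstract, the sketch below computes RPP and QPP from per-query ranking scores and per-query FLOPs counts. The function names `rpp` and `qpp`, the input lists, and the aggregation choices (a per-query average for RPP, total queries over total compute for QPP) are assumptions made for this example; the truncated abstract does not spell out the exact formulas, so the paper's definitions may differ.

```python
from statistics import fmean

PETA = 1e15  # number of FLOPs in one PetaFLOP


def _check_flops(flops_per_query):
    # Every query must have consumed a positive amount of compute.
    if not flops_per_query or any(f <= 0 for f in flops_per_query):
        raise ValueError("need at least one query, each with positive FLOPs")


def rpp(metric_per_query, flops_per_query):
    """Ranking metric per PetaFLOP (assumed: mean of per-query ratios)."""
    _check_flops(flops_per_query)
    if len(metric_per_query) != len(flops_per_query):
        raise ValueError("metric and FLOPs lists must have the same length")
    ratios = [m / (f / PETA) for m, f in zip(metric_per_query, flops_per_query)]
    return fmean(ratios)


def qpp(flops_per_query):
    """Queries per PetaFLOP (assumed: query count over total PetaFLOPs)."""
    _check_flops(flops_per_query)
    return len(flops_per_query) / (sum(flops_per_query) / PETA)


if __name__ == "__main__":
    # Hypothetical numbers: two queries with NDCG 0.5 and 1.0,
    # costing 1 and 2 PetaFLOPs respectively.
    ndcg = [0.5, 1.0]
    flops = [1e15, 2e15]
    print(f"RPP = {rpp(ndcg, flops):.3f}")
    print(f"QPP = {qpp(flops):.3f}")
```

Because both quantities are normalized by FLOPs rather than wall-clock time, the same inputs give the same values regardless of the GPU, batch size, or degree of parallelism used to run the reranker.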

Code & Models

Repositories

zhiyuanpeng/eer-flops (PyTorch, official)

Videos

Efficiency-Effectiveness Reranking FLOPs for LLM-based Rerankers · Underline

Taxonomy

Topics: Information Retrieval and Search Behavior · Natural Language Processing Techniques · Topic Modeling