SumRank: Aligning Summarization Models for Long-Document Listwise Reranking

Jincheng Feng; Wenhan Liu; Zhicheng Dou

arXiv:2603.24204·cs.IR·March 26, 2026

TL;DR

SumRank is a pointwise summarization model that compresses long documents into rank-aligned summaries before listwise reranking, reporting state-of-the-art ranking performance on TREC DL datasets while lowering reranking cost.

Contribution

We introduce SumRank, a summarization model trained with a three-stage pipeline (cold-start SFT, specialized RL data construction, and rank-driven RL alignment) that improves both the efficiency and the effectiveness of long-document reranking.

Findings

01. Achieves state-of-the-art ranking performance on TREC DL datasets.

02. Reduces summarization overhead and reranking complexity.

03. Improves efficiency without sacrificing accuracy.

Abstract

Large Language Models (LLMs) have demonstrated superior performance on the listwise passage reranking task. However, applying them directly to long-form documents raises both effectiveness and efficiency issues because the context length grows substantially. To address this challenge, we propose SumRank, a pointwise summarization model aligned with downstream listwise reranking, which compresses long-form documents into concise, rank-aligned summaries before the final listwise reranking stage. To train SumRank, we introduce a three-stage pipeline comprising cold-start Supervised Fine-Tuning (SFT), specialized RL data construction, and rank-driven alignment via Reinforcement Learning (RL). This paradigm aligns SumRank with downstream ranking objectives so that its summaries preserve relevance signals. We conduct extensive experiments on five benchmark datasets from the…
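
The summarize-then-rerank flow described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `toy_summarizer` and `toy_listwise_reranker` are hypothetical term-overlap stand-ins for the trained SumRank model and the LLM listwise reranker, and the function names and signatures are assumptions made for this example. It only shows the control flow: one pointwise summarization call per document, then a single listwise call over the much shorter summaries.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical signatures for illustration only; they do not come from the paper.
Summarizer = Callable[[str, str], str]                 # (query, document) -> summary
Reranker = Callable[[str, Sequence[str]], List[int]]   # (query, summaries) -> ranked indices


def toy_summarizer(query: str, document: str, max_sentences: int = 2) -> str:
    """Stand-in for a trained summarizer: keep the sentences that share
    the most terms with the query (ties keep document order)."""
    q_terms = set(query.lower().split())
    sentences = [s.strip() for s in document.split(".") if s.strip()]
    ranked = sorted(sentences, key=lambda s: -len(q_terms & set(s.lower().split())))
    return ". ".join(ranked[:max_sentences]) + "."


def toy_listwise_reranker(query: str, summaries: Sequence[str]) -> List[int]:
    """Stand-in for an LLM listwise reranker: sees all summaries at once
    and returns their indices ordered by query-term overlap."""
    q_terms = set(query.lower().split())
    scores = [len(q_terms & set(s.lower().replace(".", " ").split())) for s in summaries]
    return sorted(range(len(summaries)), key=lambda i: -scores[i])


def summarize_then_rerank(
    query: str,
    documents: Sequence[str],
    summarizer: Summarizer = toy_summarizer,
    reranker: Reranker = toy_listwise_reranker,
) -> Tuple[List[int], List[str]]:
    # Pointwise stage: each document is compressed independently of the others.
    summaries = [summarizer(query, doc) for doc in documents]
    # Listwise stage: one call over all summaries, so the reranker's input
    # length scales with summary length rather than full document length.
    order = reranker(query, summaries)
    return order, summaries
```

Because the summarizer runs per document, that stage can be batched or cached independently of the candidate list; only the final call needs all candidates in one context.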

Taxonomy

Topics: Topic Modeling · Text and Document Classification Technologies · Information Retrieval and Search Behavior