Multi2: Multi-Agent Test-Time Scalable Framework for Multi-Document Processing

Juntai Cao; Xiang Zhang; Raymond Li; Chuyuan Li; Chenyu You; Shafiq Joty; Giuseppe Carenini

arXiv:2502.20592·cs.CL·May 21, 2025

TL;DR

This paper introduces Multi2, a scalable framework for multi-document summarization that uses test-time prompt ensembles and novel LLM-based metrics to improve summary quality and to characterize the limits of scaling.

Contribution

The paper presents a new test-time scaling approach for multi-document summarization (MDS) based on prompt ensembles, and introduces two metrics for evaluating summaries.

Findings

01  Enhanced summary quality through prompt ensemble methods.

02  New metrics (CAP and LLM-ACU) effectively evaluate summary consistency.

03  Identified practical scaling boundaries for multi-document summarization.

Abstract

Recent advances in test-time scaling have shown promising results in improving Large Language Model (LLM) performance through strategic computation allocation during inference. While this approach has demonstrated strong improvements in logical and mathematical reasoning tasks, its application to natural language generation (NLG), particularly summarization, remains unexplored. Multi-Document Summarization (MDS), a fundamental task in NLG, presents unique challenges by requiring models to extract and synthesize essential information across multiple lengthy documents. Unlike reasoning tasks, MDS demands a more nuanced approach to prompt design and ensemble methods, as no single "best" prompt can satisfy diverse summarization requirements. We propose a novel framework leveraging test-time scaling for MDS. Our approach employs prompt ensemble techniques to generate multiple candidate…
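The candidate-generation step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the prompt templates, the `generate_candidates` helper, and the `toy_llm` stand-in are all hypothetical names introduced here, and the sketch stops at producing candidates because the abstract is truncated before it describes what happens to them.

```python
from typing import Callable, List, Sequence

# Hypothetical prompt templates for illustration only; the paper's actual
# prompts are not shown in this listing.
PROMPT_TEMPLATES = (
    "Summarize the key points shared across these documents:\n{docs}",
    "Write a concise summary covering each document's main claim:\n{docs}",
    "State the essential facts from these documents in one short paragraph:\n{docs}",
)


def generate_candidates(
    documents: Sequence[str],
    llm: Callable[[str], str],
    templates: Sequence[str] = PROMPT_TEMPLATES,
) -> List[str]:
    """Return one candidate summary per prompt template.

    `llm` is any callable mapping a prompt string to generated text
    (an assumption; swap in a real model client as needed).
    """
    joined = "\n\n".join(documents)
    return [llm(template.format(docs=joined)) for template in templates]


def toy_llm(prompt: str) -> str:
    """Stand-in model that echoes the instruction line of the prompt."""
    return "summary for: " + prompt.splitlines()[0]


if __name__ == "__main__":
    docs = ["First report text.", "Second report text."]
    for candidate in generate_candidates(docs, toy_llm):
        print(candidate)
```

Passing the model as a plain callable keeps the sketch independent of any particular LLM client, and varying only the prompt (not the documents) is what makes this an ensemble over prompts.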
