BRIGHT+: Upgrading the BRIGHT Benchmark with MARCUS, a Multi-Agent RAG Clean-Up Suite

Liyang Chen; Yujun Cai; Jieqiong Dong; Yiwei Wang

arXiv:2506.07116·cs.AI·June 10, 2025

TL;DR

This paper introduces MARCUS, a multi-agent system that enhances the BRIGHT benchmark by cleaning and restructuring its corpus, leading to better retrieval accuracy and reasoning performance in RAG systems.

Contribution

The paper presents MARCUS, a novel multi-agent pipeline that systematically improves the quality of the BRIGHT benchmark corpus, addressing web-crawled artifacts and enhancing its utility for complex retrieval tasks.

Findings

01. BRIGHT-Plus improves retrieval accuracy across multiple retrievers.

02. Enhanced corpus leads to better multi-hop reasoning performance.

03. MARCUS effectively removes structural noise and semantic discontinuities.

Abstract

Retrieval-Augmented Generation (RAG) systems require corpora that are both structurally clean and semantically coherent. BRIGHT is a recent and influential benchmark designed to evaluate complex multi-hop retrieval across diverse, high-reasoning domains. However, its practical effectiveness is limited by common web-crawled artifacts - such as content redundancy and semantic discontinuity - that impair retrieval accuracy and downstream reasoning. Notably, we find that such issues are concentrated in seven StackExchange-derived subdomains, while other domains (e.g., Coding and Theorem-based content) remain relatively clean. In this study, we present MARCUS, a multi-agent pipeline that leverages large language models (LLMs) to systematically clean and re-chunk BRIGHT into a higher-quality corpus: BRIGHT-Plus. MARCUS applies dedicated agents for structural noise removal and semantic…
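
The clean-then-re-chunk flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: MARCUS uses LLM agents, whereas the two stages below are simple rule-based stand-ins, and every name here (`remove_structural_noise`, `rechunk`, `clean_and_rechunk`, `max_chars`) is hypothetical and chosen only for illustration.

```python
import re
from typing import List


def remove_structural_noise(text: str) -> str:
    """Rule-based stand-in for a cleaning agent (assumption, not MARCUS itself).

    Drops blank lines, lines that consist only of a URL, and exact
    duplicate lines, keeping the first occurrence of each line.
    """
    seen = set()
    kept = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if re.fullmatch(r"https?://\S+", stripped):
            continue
        if stripped in seen:
            continue
        seen.add(stripped)
        kept.append(stripped)
    return "\n".join(kept)


def rechunk(text: str, max_chars: int = 200) -> List[str]:
    """Rule-based stand-in for a chunking agent (assumption, not MARCUS itself).

    Packs whole sentences into chunks of at most max_chars characters,
    so no chunk boundary falls in the middle of a sentence. A single
    sentence longer than max_chars becomes its own chunk.
    """
    flat = text.replace("\n", " ")
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", flat) if s]
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def clean_and_rechunk(doc: str, max_chars: int = 200) -> List[str]:
    """Run the two stand-in stages in sequence: clean first, then re-chunk."""
    return rechunk(remove_structural_noise(doc), max_chars)
```

In a pipeline closer to the one the paper describes, each stand-in would presumably be replaced by a call to an LLM agent; the sketch only shows the order of the stages and the shape of the data passed between them.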

Code & Models

Datasets

Helios1208/BRIGHT-Plus
dataset · 47 downloads

Taxonomy

Topics: Information Retrieval and Search Behavior · Topic Modeling · Multimodal Machine Learning Applications