Resolving Evidence Sparsity: Agentic Context Engineering for Long-Document Understanding

Keliang Liu; Zizhi Chen; Mingcheng Li; Jingqun Tang; Dingkang Yang; Lihua Zhang

arXiv:2511.22850 · cs.CV · December 1, 2025



TL;DR

SLEUTH is a multi-agent framework that enhances long-document understanding by selectively extracting and synthesizing key multimodal clues, significantly improving performance on benchmark tasks.

Contribution

It introduces a scalable, model-agnostic hierarchical approach that orchestrates multiple agents to refine and distill evidence from long documents for better comprehension.

Findings

01. Achieves state-of-the-art results on multiple benchmarks
02. Effectively filters relevant visual and textual clues
03. Hierarchical refinement improves understanding accuracy

Abstract

Document understanding is a long-standing practical task. Vision-Language Models (VLMs) have gradually become a primary approach in this domain, demonstrating effective performance on single-page tasks. However, their effectiveness diminishes when handling long documents. In such scenarios, clues are often scattered across multiple pages and modalities, and redundancy from lengthy inputs can impair the model's judgment. While retrieval-augmented generation mitigates this issue by filtering for question-relevant content, the retrieved results still contain substantial redundancy. To address these limitations, we propose SLEUTH, a multi-agent framework. Concretely, SLEUTH orchestrates a retriever and four collaborative agents in a coarse-to-fine process. The framework identifies key textual and visual clues within the retrieved pages, filters for salient visual evidence such as tables and…
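The abstract describes the pipeline only at a high level (a retriever followed by four collaborative agents working coarse to fine), and the remaining details are truncated. To illustrate the general coarse-to-fine idea only, the Python sketch below ranks pages, keeps the question-relevant clues within the top pages, and assembles a compact context. It is not SLEUTH's implementation: the token-overlap `relevance` function is a toy stand-in for the retriever and VLM agents, and every name here (`Clue`, `Page`, `retrieve_pages`, `filter_clues`, `build_context`, `answer_context`, `top_k`, `threshold`) is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Clue:
    page: int
    modality: str  # hypothetical labels, e.g. "text", "table", "chart"
    content: str


@dataclass
class Page:
    number: int
    clues: list[Clue] = field(default_factory=list)


def relevance(question: str, text: str) -> float:
    """Toy score: fraction of question tokens that also appear in `text`.
    Assumption: stands in for a learned retriever or VLM judgment."""
    q_tokens = set(question.lower().split())
    t_tokens = set(text.lower().split())
    return len(q_tokens & t_tokens) / len(q_tokens) if q_tokens else 0.0


def retrieve_pages(question: str, pages: list[Page], top_k: int) -> list[Page]:
    """Coarse stage: rank pages by their best-matching clue and keep top_k."""
    def page_score(page: Page) -> float:
        return max((relevance(question, c.content) for c in page.clues), default=0.0)
    return sorted(pages, key=page_score, reverse=True)[:top_k]


def filter_clues(question: str, pages: list[Page], threshold: float) -> list[Clue]:
    """Fine stage: within the retrieved pages, drop clues below the threshold."""
    return [c for p in pages for c in p.clues
            if relevance(question, c.content) >= threshold]


def build_context(clues: list[Clue]) -> str:
    """Distill surviving clues into a compact, page-ordered context string."""
    ordered = sorted(clues, key=lambda c: c.page)
    return "\n".join(f"[p{c.page} {c.modality}] {c.content}" for c in ordered)


def answer_context(question: str, pages: list[Page], top_k: int = 2,
                   threshold: float = 0.5) -> str:
    """Run coarse retrieval, then fine clue filtering, then distillation."""
    retrieved = retrieve_pages(question, pages, top_k)
    return build_context(filter_clues(question, retrieved, threshold))


if __name__ == "__main__":
    doc = [
        Page(1, [Clue(1, "text", "company overview and mission")]),
        Page(7, [Clue(7, "table", "revenue by region 2023"),
                 Clue(7, "text", "unrelated footnote")]),
        Page(12, [Clue(12, "chart", "revenue growth 2023 vs 2022")]),
    ]
    print(answer_context("revenue 2023", doc))
```

In this toy run, page 1 is dropped at the coarse stage and the footnote on page 7 is dropped at the fine stage, so only the table and chart clues reach the final context. Splitting page-level retrieval from clue-level filtering mirrors the abstract's point that retrieved pages still carry redundancy that a second, finer pass can remove.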


Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Information Retrieval and Search Behavior