Resolving Evidence Sparsity: Agentic Context Engineering for Long-Document Understanding
Keliang Liu, Zizhi Chen, Mingcheng Li, Jingqun Tang, Dingkang Yang, Lihua Zhang

TL;DR
SLEUTH is a multi-agent framework for long-document understanding. It selectively extracts and synthesizes key multimodal clues from retrieved pages, which improves performance on benchmark tasks.
Contribution
The paper introduces a scalable, model-agnostic hierarchical approach that orchestrates multiple agents to refine and distill evidence from long documents.
Findings
Achieves state-of-the-art results on multiple benchmarks
Effectively filters relevant visual and textual clues
Hierarchical refinement improves understanding accuracy
Abstract
Document understanding is a long-standing practical task. Vision-Language Models (VLMs) have gradually become a primary approach in this domain, demonstrating effective performance on single-page tasks. However, their effectiveness diminishes when handling long documents. In such scenarios, clues are often scattered across multiple pages and modalities, and redundancy from lengthy inputs can impair the model's judgment. While retrieval-augmented generation mitigates this issue by filtering for question-relevant content, the retrieved results still contain substantial redundancy. To address these limitations, we propose SLEUTH, a multi-agent framework. Concretely, SLEUTH orchestrates a retriever and four collaborative agents in a coarse-to-fine process. The framework identifies key textual and visual clues within the retrieved pages, filters for salient visual evidence such as tables and…
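The coarse-to-fine idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract is truncated and does not name the four agents, so only the stages it mentions (page retrieval, clue identification, visual-evidence filtering) are shown. Every name here (`Page`, `retrieve`, `identify_clues`, `filter_visuals`, `build_context`) is hypothetical, and simple word overlap stands in for the retriever and the VLM-based agents.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Page:
    """Hypothetical page record: page number, text, and captions of visuals."""
    number: int
    text: str
    visuals: list[str] = field(default_factory=list)


def tokens(s: str) -> set[str]:
    """Lowercased words with surrounding punctuation stripped."""
    stripped = (w.strip(".,?!:;").lower() for w in s.split())
    return {w for w in stripped if w}


def overlap(question: str, passage: str) -> int:
    """Toy relevance score: number of shared words (assumption, not the paper's scorer)."""
    return len(tokens(question) & tokens(passage))


def retrieve(question: str, pages: list[Page], k: int = 2) -> list[Page]:
    """Coarse stage: keep the k pages that best match the question."""
    ranked = sorted(pages, key=lambda p: overlap(question, p.text), reverse=True)
    return ranked[:k]


def identify_clues(question: str, pages: list[Page]) -> list[tuple[int, str]]:
    """Finer stage: keep only sentences that share words with the question."""
    clues = []
    for page in pages:
        for sentence in page.text.split(". "):
            if overlap(question, sentence) > 0:
                clues.append((page.number, sentence.strip()))
    return clues


def filter_visuals(question: str, pages: list[Page]) -> list[tuple[int, str]]:
    """Finer stage: keep only visuals whose caption matches the question."""
    return [
        (page.number, caption)
        for page in pages
        for caption in page.visuals
        if overlap(question, caption) > 0
    ]


def build_context(question: str, pages: list[Page], k: int = 2) -> dict:
    """Run the stages in order and return a distilled evidence bundle."""
    retrieved = retrieve(question, pages, k)
    return {
        "pages": [p.number for p in retrieved],
        "text_clues": identify_clues(question, retrieved),
        "visual_clues": filter_visuals(question, retrieved),
    }
```

In a real system each stage would presumably be a model call and not a word-overlap heuristic. The sketch shows only the shape of the pipeline: each step passes a smaller, more question-focused set of evidence to the next.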
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Information Retrieval and Search Behavior
