SCAN: Semantic Document Layout Analysis for Textual and Visual Retrieval-Augmented Generation
Nobuhiro Ueda, Yuyang Dong, Krisztián Boros, Daiki Ito, Takuya Sera, Masafumi Oyamada

TL;DR
SCAN is a semantic document layout analysis method that improves retrieval-augmented generation (RAG) systems by efficiently identifying meaningful document regions, which raises performance on visually rich documents.
Contribution
The paper introduces SCAN, a VLM-friendly approach that segments documents into semantically coherent regions, improving RAG performance on visually rich documents.
Findings
Improves textual RAG performance by up to 9.4 points
Enhances visual RAG performance by up to 10.4 points
Outperforms conventional and commercial document processing methods
Abstract
With the increasing adoption of Large Language Models (LLMs) and Vision-Language Models (VLMs), rich document analysis technologies for applications like Retrieval-Augmented Generation (RAG) and visual RAG are gaining significant attention. Recent research indicates that using VLMs yields better RAG performance, but processing rich documents remains a challenge since a single page contains large amounts of information. In this paper, we present SCAN (SemantiC Document Layout ANalysis), a novel approach that enhances both textual and visual Retrieval-Augmented Generation (RAG) systems that work with visually rich documents. It is a VLM-friendly approach that identifies document components with appropriate semantic granularity, balancing context preservation with processing efficiency. SCAN uses a coarse-grained semantic approach that divides documents into coherent regions covering…
Taxonomy
Topics: Image Retrieval and Classification Techniques · Handwritten Text Recognition Techniques · Natural Language Processing Techniques
Methods: Attention Is All You Need · Linear Warmup With Linear Decay · Softmax · Attention Dropout · WordPiece · Linear Layer · Residual Connection · Byte Pair Encoding · Weight Decay
