Client-Driven Content Extraction Associated with Table
K.C. Santosh (LORIA), Abdel Belaïd (LORIA)

TL;DR
This paper presents a client-driven method for extracting table content from document images by representing key fields as attributed relational graphs and mining similar graphs to identify table items.
Contribution
It introduces a novel approach using attributed relational graphs to extract table content based on client-identified key fields, validated on real-world industrial data.
Findings
Effective extraction of table content demonstrated on real-world data
Graph-based method accurately identifies table items
Approach tailored to client-specified key fields
Abstract
The goal of the project is to extract the content of tables in document images based on learnt patterns. Real-world users, i.e., clients, first provide a set of key fields within the table that they consider important. These key fields are used to build a graph in which nodes are labelled with semantics and other features, and edges are attributed with relations. This attributed relational graph (ARG) is then employed to mine similar graphs from a document image. Each mined graph represents an item within the table, and hence a set of such graphs composes a table. We have validated the concept on a real-world industrial problem.
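The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the field labels, the four coarse spatial relations, the row tolerance, and the exact-match criterion are all assumptions made for illustration, since the abstract does not specify node features or how graph similarity is measured.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Node:
    label: str  # semantic label of a field (hypothetical names below)
    x: float    # centre of the field's bounding box
    y: float


def relation(a, b, row_tol=5.0):
    """Coarse spatial relation from a to b (an assumed relation set)."""
    if abs(a.y - b.y) <= row_tol:
        return "left_of" if a.x < b.x else "right_of"
    return "above" if a.y < b.y else "below"


def build_arg(nodes):
    """Build an ARG as (nodes, edges); edges map index pairs to relations."""
    edges = {
        (i, j): relation(nodes[i], nodes[j])
        for i in range(len(nodes))
        for j in range(len(nodes))
        if i != j
    }
    return nodes, edges


def mine(pattern, candidates):
    """Return candidate tuples whose labels and relations equal the pattern's.

    Exact matching is a simplification; a real system would likely
    tolerate noise in labels and layout.
    """
    p_nodes, p_edges = build_arg(pattern)
    matches = []
    for combo in permutations(candidates, len(p_nodes)):
        if any(c.label != p.label for c, p in zip(combo, p_nodes)):
            continue
        _, c_edges = build_arg(list(combo))
        if c_edges == p_edges:
            matches.append(combo)
    return matches


# Key fields selected by the client on one table row (hypothetical values).
pattern = [Node("reference", 10, 100), Node("quantity", 200, 100),
           Node("price", 300, 100)]

# Labelled fields found elsewhere in the document image (hypothetical values).
candidates = [
    Node("reference", 10, 150), Node("quantity", 200, 150), Node("price", 300, 150),
    Node("reference", 10, 180), Node("quantity", 200, 180), Node("price", 300, 180),
    Node("price", 300, 400),  # e.g. a total outside the table rows
]

items = mine(pattern, candidates)
for item in items:
    print([(n.label, n.y) for n in item])
```

Under these assumptions each match corresponds to one table item, and the list of matches stands in for the table. The brute-force search over permutations is only practical for a handful of fields; it is used here to keep the sketch short.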
Taxonomy
Topics: Handwritten Text Recognition Techniques · Image Retrieval and Classification Techniques · Text and Document Classification Technologies
