Client-Driven Content Extraction Associated with Table

K.C. Santosh (LORIA); Abdel Belaïd (LORIA)

arXiv:1304.1930·cs.CV·April 9, 2013

TL;DR

This paper presents a client-driven method for extracting table content from document images by representing key fields as attributed relational graphs and mining similar graphs to identify table items.

Contribution

It introduces a novel approach using attributed relational graphs to extract table content based on client-identified key fields, validated on real-world industrial data.

Findings

01. Effective extraction of table content demonstrated on real-world data

02. Graph-based method accurately identifies table items

03. Approach tailored to client-specified key fields

Abstract

The goal of the project is to extract the content of tables in document images based on learnt patterns. Real-world users, i.e., clients, first provide a set of key fields within the table that they consider important. These fields are used to build a graph whose nodes are labelled with semantics and other features, and whose edges are attributed with relations. This attributed relational graph (ARG) is then employed to mine similar graphs from a document image. Each mined graph represents an item within the table, and hence a set of such graphs composes a table. We have validated the concept on a real-world industrial problem.
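The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the node features, the relation vocabulary, or the mining algorithm, so the `Field` type, the semantic labels (`DATE`, `TEXT`, `AMOUNT`), the coarse spatial relations, and the brute-force matcher below are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from itertools import permutations

@dataclass(frozen=True)
class Field:
    text: str
    label: str  # semantic label (hypothetical label set)
    x: float    # position of the field in the image (assumed feature)
    y: float

def relation(a, b, tol=5.0):
    """Coarse spatial relation from a to b (an assumed relation vocabulary)."""
    if abs(a.y - b.y) <= tol:
        return "left_of" if a.x < b.x else "right_of"
    return "above" if a.y < b.y else "below"

def build_arg(fields):
    """Build an ARG as (node labels, {(i, j): relation}) over all ordered pairs."""
    labels = [f.label for f in fields]
    edges = {(i, j): relation(fields[i], fields[j])
             for i in range(len(fields))
             for j in range(len(fields)) if i != j}
    return labels, edges

def mine(pattern, candidates):
    """Brute-force search for field tuples whose labels and relations match the pattern."""
    labels, edges = pattern
    matches = []
    for combo in permutations(candidates, len(labels)):
        if any(f.label != lab for f, lab in zip(combo, labels)):
            continue
        if all(relation(combo[i], combo[j]) == r for (i, j), r in edges.items()):
            matches.append(combo)
    return matches

# Key fields a client might mark on one table row (made-up values).
key_fields = [
    Field("2013-04-01", "DATE", 10, 100),
    Field("Widget", "TEXT", 80, 100),
    Field("12.50", "AMOUNT", 200, 100),
]
pattern = build_arg(key_fields)

# Fields detected in a document image (made-up values).
tokens = [
    Field("Invoice", "TEXT", 10, 20),
    Field("2013-04-01", "DATE", 10, 100),
    Field("Widget", "TEXT", 80, 100),
    Field("12.50", "AMOUNT", 200, 100),
    Field("2013-04-02", "DATE", 10, 120),
    Field("Gadget", "TEXT", 80, 120),
    Field("7.25", "AMOUNT", 200, 120),
    Field("2013-04-03", "DATE", 10, 140),
    Field("Gizmo", "TEXT", 80, 140),
    Field("15.25", "AMOUNT", 200, 140),
    Field("35.00", "AMOUNT", 200, 180),
]

items = mine(pattern, tokens)
for item in items:
    print([f.text for f in item])
```

Each tuple in `items` corresponds to one mined graph, that is, one table item; the header and the total line are rejected because their relations to the other fields differ from the pattern. The exhaustive search is only practical for small inputs and stands in for whatever graph mining procedure the paper actually uses.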

Taxonomy

Topics: Handwritten Text Recognition Techniques · Image Retrieval and Classification Techniques · Text and Document Classification Technologies