TableZa -- A classical Computer Vision approach to Tabular Extraction

Saumya Banthia; Anantha Sharma; Ravi Mangipudi

arXiv:2105.09137 · cs.CL · May 20, 2021

TL;DR

This paper presents TableZa, a classical computer vision method for extracting tabular data from images or from vector PDFs converted to images. It targets a core difficulty of document comprehension: the extracted data must be both spectrally and spatially sane.

Contribution

The paper introduces a novel computer-vision-based approach designed to handle the diverse tabular formats found across documents, with the aim of improving extraction accuracy.

Findings

01. Effective extraction of tabular data from images and PDFs.

02. Attention to both spectral and spatial sanity of the extracted data.

03. Applicability to various tabular formats.

Abstract

Computer-aided tabular data extraction has always been a challenging and error-prone task because it demands both spectral and spatial sanity of the data. In this paper we discuss an approach to tabular data extraction in the realm of document comprehension. Given the many kinds of tabular formats found across documents, we present a novel computer vision approach for extracting tabular data from images, or from vector PDFs converted to images.

Taxonomy

Topics: Handwritten Text Recognition Techniques · Image Processing and 3D Reconstruction · Currency Recognition and Detection