Web Table Classification based on Visual Features
Babette Bühler, Heiko Paulheim

TL;DR
This paper introduces a novel web table classification method using visual features extracted from table images via CNNs, achieving high accuracy and surpassing existing HTML-based feature methods.
Contribution
The paper presents a CNN-based approach for web table classification that relies solely on visual appearance, eliminating the need for explicit HTML feature engineering.
Findings
A ResNet50-based CNN classifier achieves an F1 score of 93.29%.
Combining visual and explicit features yields an F-measure of 93.70%.
The proposed method outperforms state-of-the-art techniques based on HTML features.
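For readers unfamiliar with the metric reported above, the following is a minimal sketch of how an F1 score is computed from a classifier's confusion counts. The function name and the example counts are hypothetical and chosen only for illustration; they are not taken from the paper, and the summary does not say which averaging scheme the authors used.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1) for one class from raw confusion counts."""
    # Precision: share of predicted positives that are truly positive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: share of true positives that the classifier found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts for the "genuine table" class (not from the paper).
precision, recall, f1 = precision_recall_f1(tp=90, fp=10, fn=10)
print(f"precision={precision:.4f} recall={recall:.4f} f1={f1:.4f}")
```

Because F1 is a harmonic mean, it stays low whenever either precision or recall is low, which makes it a common choice when genuine tables are only a small share of all tables.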
Abstract
Tables on the web constitute a valuable data source for many applications, such as factual search and knowledge base augmentation. However, as genuine tables containing relational knowledge account for only a small proportion of tables on the web, reliable genuine web table classification is a crucial first step of table extraction. Previous works usually rely on explicit feature construction from the HTML code. In contrast, we propose an approach for web table classification that exploits the full visual appearance of a table and works purely by applying a convolutional neural network to the rendered image of the web table. Since these visual features can be extracted automatically, our approach circumvents the need for explicit feature construction. A new hand-labeled gold standard dataset containing HTML source code and images for 13,112 tables was generated for this task. Transfer…
