Pooling Hybrid Representations for Web Structured Data Annotation
Luciano Barbosa, Breno W. Carvalho, Bianca Zadrozny

TL;DR
This paper introduces a hybrid deep learning model that combines a CNN and an RNN to classify the attributes of web structured data from their values. It needs no pre-processing or handcrafted features and outperforms previous methods.
Contribution
A hybrid deep learning network that learns sequence features directly from attribute values for data annotation, without pre-processing or handcrafted features.
Findings
Outperforms previous approaches in four web domains
Effectively captures short- and long-distance dependencies in data sequences
Eliminates need for pre-processing or handcrafted features
Abstract
Automatically identifying data types of web structured data is a key step in the process of web data integration. Web structured data is usually associated with entities or objects in a particular domain. In this paper, we aim to map attributes of an entity in a given domain to pre-specified classes of attributes in the same domain based on their values. To perform this task, we propose a hybrid deep learning network that relies on the format of the attributes' values. It does so without any pre-processing or using pre-defined hand-crafted features. The hybrid network combines sequence-based neural networks, namely convolutional neural networks (CNN) and recurrent neural networks (RNN), to learn the sequence structure of attributes' values. The CNN captures short-distance dependencies in these sequences through a sliding window approach, and the RNN captures long-distance dependencies…
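To make the division of labour described in the abstract concrete, here is a minimal pure-Python sketch, not the authors' implementation. It assumes character-level input, one convolution layer with max-pooling over positions, a plain Elman recurrence standing in for whatever RNN variant the paper uses, and simple concatenation of the two feature vectors. All names, sizes, and the untrained random weights are hypothetical.

```python
import math
import random

random.seed(0)  # fixed seed so the untrained toy weights are reproducible

# Illustrative sizes only; none of these values come from the paper.
EMB, FILTERS, WINDOW, HIDDEN = 4, 3, 3, 5


def rand_matrix(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def char_vector(ch):
    """Deterministic toy embedding of one raw character (no cleaning, no tokenizing)."""
    rng = random.Random(ord(ch))
    return [rng.uniform(-0.5, 0.5) for _ in range(EMB)]


def cnn_features(seq, filters):
    """Slide a WINDOW-wide filter over the sequence, then max-pool over positions."""
    padded = seq + [[0.0] * EMB for _ in range(WINDOW - 1)]  # right zero-padding
    pooled = []
    for f in filters:  # each filter is a flat list of WINDOW * EMB weights
        acts = []
        for i in range(len(padded) - WINDOW + 1):
            window = [x for vec in padded[i:i + WINDOW] for x in vec]
            acts.append(max(0.0, sum(w * x for w, x in zip(f, window))))  # ReLU
        pooled.append(max(acts, default=0.0))
    return pooled


def rnn_features(seq, w_in, w_rec):
    """Elman recurrence: the final hidden state depends on every earlier character."""
    h = [0.0] * HIDDEN
    for x in seq:
        h = [math.tanh(sum(w_in[j][k] * x[k] for k in range(EMB))
                       + sum(w_rec[j][k] * h[k] for k in range(HIDDEN)))
             for j in range(HIDDEN)]
    return h


def hybrid_representation(value, params):
    """Concatenate the local (CNN) and sequence-wide (RNN) features of one value."""
    seq = [char_vector(ch) for ch in value]
    return (cnn_features(seq, params["filters"])
            + rnn_features(seq, params["w_in"], params["w_rec"]))


params = {
    "filters": rand_matrix(FILTERS, WINDOW * EMB),
    "w_in": rand_matrix(HIDDEN, EMB),
    "w_rec": rand_matrix(HIDDEN, HIDDEN),
}

# Hypothetical attribute values of three different types.
for value in ["(555) 123-4567", "2019-03-14", "$1,299.00"]:
    print(value, len(hybrid_representation(value, params)))
```

In a trained model, a classification layer over the concatenated vector would predict the attribute class. Here the weights are random, so the sketch only shows how the two kinds of features are computed and combined.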
Taxonomy
Topics: Web Data Mining and Analysis · Semantic Web and Ontologies · Data Quality and Management
