Pooling Hybrid Representations for Web Structured Data Annotation

Luciano Barbosa; Breno W. Carvalho; Bianca Zadrozny

arXiv:1610.00493·cs.DB·October 4, 2016

TL;DR

This paper introduces a hybrid deep learning model that combines a CNN and an RNN to classify attributes of web structured data based on their values. The model needs no pre-processing and outperforms previous approaches.

Contribution

A hybrid deep learning network that learns the sequence structure of attribute values for data annotation, without pre-processing or handcrafted features.

Findings

01. Outperforms previous approaches in four web domains

02. Effectively captures short- and long-distance dependencies in data sequences

03. Eliminates need for pre-processing or handcrafted features

Abstract

Automatically identifying the data types of web structured data is a key step in the process of web data integration. Web structured data is usually associated with entities or objects in a particular domain. In this paper, we aim to map attributes of an entity in a given domain to pre-specified classes of attributes in the same domain based on their values. To perform this task, we propose a hybrid deep learning network that relies on the format of the attributes' values. It does so without any pre-processing or pre-defined hand-crafted features. The hybrid network combines sequence-based neural networks, namely convolutional neural networks (CNN) and recurrent neural networks (RNN), to learn the sequence structure of attributes' values. The CNN captures short-distance dependencies in these sequences through a sliding window approach, and the RNN captures long-distance dependencies…
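The idea in the abstract can be illustrated with a minimal sketch. It assumes a character-level encoding of each attribute value, a single convolution layer with max-over-time pooling, a plain (Elman-style) recurrent layer, and a concatenation of the two feature vectors. None of these details are stated in the abstract, so the vocabulary, layer sizes, class names, and all function names below are hypothetical. The weights are random and untrained, so the predicted probabilities are meaningless; the sketch only shows how the two views of the same sequence could be combined.

```python
import math
import random

# All constants below are illustrative assumptions, not values from the paper.
VOCAB = "abcdefghijklmnopqrstuvwxyz0123456789 -./$,:()"
EMBED_DIM, N_FILTERS, WINDOW, HIDDEN = 8, 6, 3, 5
CLASSES = ["price", "date", "phone"]  # hypothetical attribute classes

rng = random.Random(0)

def rand_matrix(rows, cols):
    """Small random weights; a real model would learn these by training."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

EMBED = rand_matrix(len(VOCAB) + 1, EMBED_DIM)   # last row: unknown characters
CONV_W = rand_matrix(N_FILTERS, WINDOW * EMBED_DIM)
RNN_WX = rand_matrix(HIDDEN, EMBED_DIM)
RNN_WH = rand_matrix(HIDDEN, HIDDEN)
OUT_W = rand_matrix(len(CLASSES), N_FILTERS + HIDDEN)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def embed(value):
    """Map each character of the raw value to an embedding vector."""
    indices = [VOCAB.find(ch) for ch in value.lower()]
    return [EMBED[i if i >= 0 else len(VOCAB)] for i in indices]

def cnn_features(seq):
    """Sliding-window convolution, ReLU, then max-over-time pooling."""
    # Zero-pad so values shorter than the window still yield one window.
    seq = seq + [[0.0] * EMBED_DIM] * max(0, WINDOW - len(seq))
    windows = [sum(seq[i:i + WINDOW], []) for i in range(len(seq) - WINDOW + 1)]
    return [max(max(0.0, dot(w, win)) for win in windows) for w in CONV_W]

def rnn_features(seq):
    """Simple recurrent pass; the final hidden state summarizes the sequence."""
    h = [0.0] * HIDDEN
    for x in seq:
        h = [math.tanh(dot(wx, x) + dot(wh, h)) for wx, wh in zip(RNN_WX, RNN_WH)]
    return h

def hybrid_representation(value):
    """Concatenate the local (CNN) and global (RNN) views of the value."""
    seq = embed(value)
    return cnn_features(seq) + rnn_features(seq)

def predict(value):
    """Softmax over the hypothetical attribute classes."""
    rep = hybrid_representation(value)
    scores = [dot(w, rep) for w in OUT_W]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return {c: e / total for c, e in zip(CLASSES, exps)}

if __name__ == "__main__":
    print(predict("$1,299.00"))
```

The raw string goes in unchanged, which mirrors the "no pre-processing" claim: the convolution sees only `WINDOW` characters at a time, while the recurrent state carries information across the whole value.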

Taxonomy

Topics: Web Data Mining and Analysis · Semantic Web and Ontologies · Data Quality and Management