Warehousing Web Data
Jérôme Darmont (ERIC), Omar Boussaïd (ERIC), Fadila Bentayeb (ERIC)

TL;DR
This paper presents a comprehensive process for integrating web data into data warehouses, emphasizing modeling, metadata, and transformation techniques to facilitate multidimensional analysis.
Contribution
It introduces a modeling process for heterogeneous web data integration into data warehouses using UML, XML schemas, and Java prototypes.
Findings
Effective modeling of multiform web data for warehousing
XML schema-based transformation of heterogeneous data
Prototype implementation demonstrating data integration workflow
Abstract
In a data warehousing process, mastering the data preparation phase allows substantial gains in terms of time and performance when performing multidimensional analysis or using data mining algorithms. Furthermore, a data warehouse can require external data. The web is a prevalent data source in this context. In this paper, we propose a modeling process for integrating diverse and heterogeneous (so-called multiform) data into a unified format. Furthermore, the very schema definition provides first-rate metadata in our data warehousing context. At the conceptual level, a complex object is represented in UML. Our logical model is an XML schema that can be described with a DTD or the XML-Schema language. Finally, we have designed a Java prototype that transforms our multiform input data into XML documents representing our physical model. Then, the XML documents we obtain are mapped into…
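As a rough illustration of the transformation step the abstract describes (multiform input data turned into XML documents), the following minimal Java sketch serializes one complex object into a single XML document. It is not the authors' prototype. The class name `ComplexObjectToXml`, the element names `complexObject` and `subDocument`, the `name` and `type` attributes, and the sample values are all hypothetical, chosen only for illustration; the paper's actual schema is not reproduced here.

```java
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class ComplexObjectToXml {

    // Builds one XML document for a complex object made of typed parts.
    // Element and attribute names are assumptions, not the paper's schema.
    static String toXml(String name, Map<String, String> parts) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .newDocument();

        Element root = doc.createElement("complexObject");
        root.setAttribute("name", name);
        doc.appendChild(root);

        // One child element per part; the key records the part's kind
        // (e.g. text, image), the value its content or a reference to it.
        for (Map.Entry<String, String> part : parts.entrySet()) {
            Element sub = doc.createElement("subDocument");
            sub.setAttribute("type", part.getKey());
            sub.setTextContent(part.getValue());
            root.appendChild(sub);
        }

        // Serialize the DOM tree to a string without the XML declaration.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> parts = new LinkedHashMap<>();
        parts.put("text", "Quarterly sales report");
        parts.put("image", "chart.png");
        System.out.println(toXml("report-2003", parts));
    }
}
```

Using the standard DOM and transformer APIs rather than string concatenation keeps the output well-formed even when part contents contain characters such as `<` or `&`. Validating the result against a DTD or XML Schema would be a separate step not shown here.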
Taxonomy
Topics: Advanced Database Systems and Queries · Semantic Web and Ontologies · Data Quality and Management
