Data Mining-based Fragmentation of XML Data Warehouses

Hadj Mahboubi (ERIC); Jérôme Darmont (ERIC)

arXiv:0811.0741·cs.DB·November 6, 2008

Open Access

TL;DR

This paper introduces a k-means-based fragmentation method for XML data warehouses. Its k parameter controls the number of fragments, and the paper reports better performance than classical fragmentation algorithms, targeting the data volume and response time limits of XML-native database systems.

Contribution

The paper adapts k-means clustering to XML data warehouse fragmentation, which gives direct control over the number of fragments, and reports better efficiency than classical derived horizontal fragmentation algorithms.

Findings

1. k-means-based fragmentation outperforms classical algorithms
2. Controlled number of fragments via the k parameter
3. Improved response times and data management

Abstract

With the multiplication of XML data sources, many XML data warehouse models have been proposed to handle data heterogeneity and complexity in a way relational data warehouses fail to achieve. However, XML-native database systems currently suffer from limited performance, both in terms of manageable data volume and response time. Fragmentation helps address both of these issues. Derived horizontal fragmentation is typically used in relational data warehouses and can be adapted to the XML context. However, the number of fragments produced by classical algorithms is difficult to control. In this paper, we propose a k-means-based fragmentation approach that allows the number of fragments to be controlled through its $k$ parameter. We experimentally compare its efficiency to classical derived horizontal fragmentation algorithms adapted to XML data warehouses and show its…
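
To illustrate how a k parameter can fix the number of fragments in advance, here is a minimal Python sketch. It is not the authors' implementation: the abstract does not say what is clustered, so the sketch assumes that selection predicates from a query workload are each described by a binary usage vector (one entry per query) and that predicates used by similar queries should land in the same fragment. The predicate strings, the `USAGE` table, and the function names are all hypothetical.

```python
def kmeans_labels(points, k, max_iter=100):
    """Lloyd's k-means; returns one cluster index per point."""
    # Deterministic start: the first k distinct points are the initial centroids.
    centroids = []
    for p in points:
        if list(p) not in centroids:
            centroids.append(list(p))
        if len(centroids) == k:
            break
    if len(centroids) < k:
        raise ValueError("need at least k distinct points")

    def sq_dist(p, c):
        return sum((x - y) ** 2 for x, y in zip(p, c))

    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def fragment_predicates(usage, k):
    """Group predicates into k clusters; each cluster defines one fragment."""
    names = list(usage)
    labels = kmeans_labels([usage[n] for n in names], k)
    fragments = {c: [] for c in range(k)}
    for name, lab in zip(names, labels):
        fragments[lab].append(name)
    return fragments


# Hypothetical workload: rows are predicates, columns are queries q1..q4
# (1 means the query uses the predicate).
USAGE = {
    "year = 2007":     (1, 1, 0, 0),
    "year = 2008":     (1, 1, 0, 0),
    "region = 'EU'":   (0, 0, 1, 1),
    "region = 'US'":   (0, 0, 1, 1),
    "month = 'Jan'":   (1, 0, 0, 0),
    "segment = 'B2B'": (0, 0, 0, 1),
}

fragments = fragment_predicates(USAGE, k=2)
print(fragments)
```

With k=2 the six predicates always fall into at most two groups, whatever the workload looks like, which is the kind of control over fragment count that the abstract contrasts with classical algorithms. Turning each predicate group into actual XML fragments (for example through XQuery selections) is outside this sketch.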

Taxonomy

Topics: Advanced Database Systems and Queries · Data Management and Algorithms · Data Mining Algorithms and Applications