XML content warehousing: Improving sociological studies of mailing lists and web data
Benjamin Nguyen (PRISM, INRIA Rocquencourt), Antoine Vion (LEST), François-Xavier Dudouet (IRISES), Dario Colazzo (LRI, INRIA Saclay - Ile de France), Ioana Manolescu (LRI, INRIA Saclay - Ile de France), Pierre Senellart

TL;DR
This paper introduces an XML-based data warehousing approach for sociological analysis of web data like mailing lists, enabling flexible, comprehensive, and evolvable data storage and analysis.
Contribution
It proposes an XML schema for mailing list data warehousing, demonstrating its advantages over traditional methods and providing a practical implementation and case study.
Findings
XML warehousing supports structural evolution and complex queries.
It simplifies handling hidden or complex data sources.
Exporting to sociological analysis software is feasible.
Abstract
In this paper, we present the guidelines for an XML-based approach for the sociological study of Web data such as the analysis of mailing lists or databases available online. The use of an XML warehouse is a flexible solution for storing and processing this kind of data. We propose an implemented solution and show possible applications with our case study of profiles of experts involved in W3C standard-setting activity. We illustrate the sociological use of semi-structured databases by presenting our XML Schema for mailing-list warehousing. An XML Schema allows many adjunctions or crossings of data sources, without modifying existing data sets, while allowing possible structural evolution. We also show that the existence of hidden data implies increased complexity for traditional SQL users. XML content warehousing allows altogether exhaustive warehousing and recursive queries through…
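As a rough illustration of the two properties the abstract emphasizes (structural evolution and recursive queries), the following minimal sketch stores a few mailing-list messages as XML and follows reply links recursively to recover a discussion thread. The element and attribute names (`warehouse`, `message`, `inReplyTo`, `affiliation`), the addresses, and the helper functions are hypothetical and chosen only for this example; they are not the XML Schema proposed in the paper.

```python
import xml.etree.ElementTree as ET

# Hypothetical warehouse fragment: element and attribute names are
# illustrative assumptions, not the schema proposed in the paper.
WAREHOUSE_XML = """
<warehouse>
  <message id="m1" list="example-list">
    <from>alice@example.org</from>
    <subject>Draft proposal</subject>
  </message>
  <message id="m2" list="example-list" inReplyTo="m1">
    <from>bob@example.org</from>
    <subject>Re: Draft proposal</subject>
  </message>
  <message id="m3" list="example-list" inReplyTo="m2">
    <from>alice@example.org</from>
    <subject>Re: Draft proposal</subject>
    <!-- Extra element added later; older messages are left unchanged. -->
    <affiliation>Example Corp</affiliation>
  </message>
  <message id="m4" list="example-list">
    <from>carol@example.org</from>
    <subject>Unrelated topic</subject>
  </message>
</warehouse>
"""


def thread_ids(root, start_id):
    """Return start_id followed by the ids of all direct and indirect replies."""
    ids = [start_id]
    for msg in root.findall(f"message[@inReplyTo='{start_id}']"):
        # Recurse into each reply to collect replies-to-replies.
        ids.extend(thread_ids(root, msg.get("id")))
    return ids


def participants(root, ids):
    """Return the sorted, de-duplicated senders of the given messages."""
    wanted = set(ids)
    return sorted(
        {m.findtext("from") for m in root.iter("message") if m.get("id") in wanted}
    )


root = ET.fromstring(WAREHOUSE_XML)
thread = thread_ids(root, "m1")
authors = participants(root, thread)
print(thread)
print(authors)
```

Only the third message carries an `affiliation` element, which loosely mirrors the idea that new information can be attached to some records without restructuring the existing ones. In a flat relational table the thread traversal would typically require a recursive query or repeated self-joins.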
Taxonomy
Topics: Advanced Database Systems and Queries · Web Data Mining and Analysis · Peer-to-Peer Network Technologies
