Graph integration of structured, semistructured and unstructured data for data journalism
Oana Balalau (CEDAR), Catarina Conceição (INESC-ID, IST), Helena Galhardas (INESC-ID, IST), Ioana Manolescu (CEDAR), Tayeb Merabti (CEDAR), Jingmao You (CEDAR, IP Paris), Youssr Youssef (CEDAR, ENSAE, IP Paris)

TL;DR
This paper presents a comprehensive system for integrating diverse data sources like structured, semi-structured, graph, and text data to support data journalism, enabling non-expert users to analyze heterogeneous datasets effectively.
Contribution
It introduces a complete approach and system, ConnectionLens, for scalable integration of heterogeneous data sources tailored for data journalism applications.
Findings
Successful implementation within the ConnectionLens system
Effective handling of dynamic and diverse data sources
Validated through experimental results
Abstract
Nowadays, journalism is facilitated by the existence of large amounts of digital data sources, including many Open Data ones. Such data sources are extremely heterogeneous, ranging from highly structured (relational databases), semi-structured (JSON, XML, HTML), graphs (e.g., RDF), and text. Journalists (and other classes of users lacking advanced IT expertise, such as most non-governmental organizations, or small public administrations) need to be able to make sense of such heterogeneous corpora, even if they lack the ability to define and deploy custom extract-transform-load workflows. These are difficult to set up not only for arbitrary heterogeneous inputs, but also given that users may want to add (or remove) datasets to (from) the corpus. We describe a complete approach for integrating dynamic sets of heterogeneous data sources along the lines described above: the challenges we…
Taxonomy
Topics: Semantic Web and Ontologies · Web Data Mining and Analysis · Advanced Database Systems and Queries
