Empowering Investigative Journalism with Graph-based Heterogeneous Data Management
Angelos-Christos Anadiotis, Oana Balalau, Theo Bouganim, Francesco Chimienti, Helena Galhardas, Mhd Yamen Haddad, Stephane Horel, Ioana Manolescu, Youssr Youssef

TL;DR
This paper presents ConnectionLens, a system that integrates heterogeneous data sources into a graph for investigative journalism, introducing scalable techniques for data processing and fast keyword search to support complex queries.
Contribution
It introduces novel scalable methods for constructing and querying heterogeneous data graphs, enhancing investigative journalism applications.
Findings
Reduced Information Extraction costs during graph construction.
Achieved significant speed-up in keyword search performance.
Validated effectiveness on real-world investigative journalism data.
Abstract
Investigative Journalism (IJ, in short) is a staple of modern democratic societies. IJ often requires working with large, dynamic sets of heterogeneous, schema-less data sources, which can be structured, semi-structured, or textual; this limits the applicability of classical data integration approaches. In prior work, we developed ConnectionLens, a system that integrates such sources into a single heterogeneous graph by leveraging Information Extraction (IE) techniques. Users can then query the graph by means of keywords, and explore query results and their neighborhood through an interactive GUI. Our keyword search problem is complicated by the graph's heterogeneity, and by the lack of a result score function that would allow pruning part of the search space. In this work, we describe an actual IJ application studying conflicts of interest in the biomedical domain, and we show…
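The abstract does not spell out ConnectionLens's search algorithm. To make the problem it describes more concrete, here is a minimal Python sketch of keyword search over a small labeled graph. It is a naive, exhaustive baseline, not ConnectionLens's method: for each keyword it runs a multi-source BFS from every node whose label contains that keyword, then picks the node that reaches one match per keyword with the fewest total edges (a star-shaped connecting tree). The functions `build_graph`, `keyword_search`, and the toy labels and edges are all hypothetical. The sketch also shows why a missing score function hurts: with nothing to bound the search, every keyword's BFS may visit the whole graph.

```python
from collections import deque


def build_graph(edges):
    """Build an undirected adjacency map from (u, v) pairs (hypothetical helper)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def bfs_from_matches(adj, sources):
    """Multi-source BFS: distance to, and parent toward, the nearest source node."""
    dist = {s: 0 for s in sources}
    parent = {s: None for s in sources}
    queue = deque(sources)
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                parent[v] = u
                queue.append(v)
    return dist, parent


def path_to_source(parent, node):
    """Follow parent pointers from node back to the keyword match it came from."""
    path = [node]
    while parent[node] is not None:
        node = parent[node]
        path.append(node)
    return path


def keyword_search(labels, adj, keywords):
    """Return (root, paths): a node connected to a match for every keyword,
    minimizing total path length, plus one path per keyword. None if no answer.
    Naive baseline: no score-based pruning, so the whole graph may be explored."""
    per_keyword = []
    for kw in keywords:
        matches = [n for n, lab in labels.items() if kw.lower() in lab.lower()]
        if not matches:
            return None
        per_keyword.append(bfs_from_matches(adj, matches))
    best = None
    for node in labels:  # ties are broken by insertion order of labels
        if all(node in dist for dist, _ in per_keyword):
            cost = sum(dist[node] for dist, _ in per_keyword)
            if best is None or cost < best[0]:
                best = (cost, node)
    if best is None:
        return None
    root = best[1]
    return root, [path_to_source(parent, root) for _, parent in per_keyword]


# Hypothetical toy graph mixing nodes from different kinds of sources.
labels = {
    "a": "Alice (doctor)",
    "d": "conflict of interest declaration",
    "c": "ACME Pharma",
    "t": "tweet mentioning Alice",
}
edges = [("a", "d"), ("d", "c"), ("t", "a")]
adj = build_graph(edges)
print(keyword_search(labels, adj, ["alice", "acme"]))
# ('a', [['a'], ['a', 'd', 'c']])
```

On this toy graph, the answer connects the "Alice" node to "ACME Pharma" through the declaration node. A real system would need a richer answer shape (arbitrary trees rather than stars) and a smarter search strategy to stay fast on large graphs.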
Taxonomy
Topics: Web Data Mining and Analysis · Scientific Computing and Data Management · Complex Network Analysis Techniques
