New/s/leak 2.0 - Multilingual Information Extraction and Visualization for Investigative Journalism
Gregor Wiedemann, Seid Muhie Yimam, Chris Biemann

TL;DR
new/s/leak 2.0 is an open-source tool that enhances multilingual investigative journalism by enabling automatic language detection, language-specific information extraction, visualization, and decentralized analysis of large, confidential text datasets.
Contribution
It introduces three novel features: automatic language detection with language-dependent information extraction for 40 languages, entity and keyword visualization for efficient exploration, and decentralized deployment for analyzing confidential data.
Findings
Supports 40 languages for extraction and analysis
Enables visualization of entities and keywords
Facilitates decentralized analysis of confidential data
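The first feature, detecting a document's language and then applying language-specific extraction, can be illustrated with a minimal sketch. This is not the new/s/leak 2.0 implementation: the stopword lists, the function names (`detect_language`, `extract_keywords`, `process`), and the frequency-based keyword scoring are all hypothetical stand-ins chosen so that the example runs with the Python standard library only. A production system would use a trained language identifier and per-language NLP models.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword profiles (assumption for illustration;
# a real pipeline would cover many more languages with proper models).
STOPWORDS = {
    "en": {"the", "and", "of", "to", "in", "is", "with"},
    "de": {"der", "die", "das", "und", "ist", "mit", "von"},
    "es": {"el", "la", "los", "y", "es", "con", "de"},
}

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())

def detect_language(text):
    """Pick the language whose stopwords occur most often; 'unknown' if none match."""
    tokens = tokenize(text)
    scores = {
        lang: sum(1 for tok in tokens if tok in words)
        for lang, words in STOPWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"

def extract_keywords(text, lang, top_n=3):
    """Return the most frequent tokens after removing that language's stopwords."""
    stop = STOPWORDS.get(lang, set())
    counts = Counter(tok for tok in tokenize(text) if tok not in stop)
    return [word for word, _ in counts.most_common(top_n)]

def process(text):
    """Detect the language first, then run the language-dependent extraction step."""
    lang = detect_language(text)
    return {"lang": lang, "keywords": extract_keywords(text, lang)}

if __name__ == "__main__":
    print(process("The bank and the minister: the bank paid the minister of the bank."))
    print(process("Der Minister und die Bank ist mit dem Geld von Berlin."))
```

The design point is the routing: language detection runs once per document, and its result selects which language-specific resources the extraction step uses, so adding a language means adding resources rather than changing the pipeline.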
Abstract
Investigative journalism in recent years is confronted with two major challenges: 1) vast amounts of unstructured data originating from large text collections such as leaks or answers to Freedom of Information requests, and 2) multi-lingual data due to intensified global cooperation and communication in politics, business and civil society. Faced with these challenges, journalists are increasingly cooperating in international networks. To support such collaborations, we present the new version of new/s/leak 2.0, our open-source software for content-based searching of leaks. It includes three novel main features: 1) automatic language detection and language-dependent information extraction for 40 languages, 2) entity and keyword visualization for efficient exploration, and 3) decentral deployment for analysis of confidential data from various formats. We illustrate the new analysis…
Taxonomy
Topics: Data Visualization and Analytics · Web Data Mining and Analysis · Advanced Text Analysis Techniques
