Mining News Events from Comparable News Corpora: A Multi-Attribute Proximity Network Modeling Approach
Hyungsul Kim, Ahmed El-Kishky, Xiang Ren, Jiawei Han

TL;DR
ProxiModel is a framework for extracting structured news events from large, noisy news data. It models event correlations both within and across documents using a proximity network, which makes event mining scalable and interpretable.
Contribution
The paper introduces the proximity network, a space-efficient data structure, together with a probabilistic modeling approach for scalable, high-quality news event extraction from large corpora.
Findings
Effective in generating high-quality event descriptors and attributes
Robust across different news corpora
Applicable to news summarization and event tracking
Abstract
We present ProxiModel, a novel event mining framework for extracting high-quality structured event knowledge from large, redundant, and noisy news data sources. The proposed model differentiates itself from other approaches by modeling both the event correlation within each individual document as well as across the corpus. To facilitate this, we introduce the concept of a proximity network, a novel space-efficient data structure to facilitate scalable event mining. This proximity network captures the corpus-level co-occurrence statistics for candidate event descriptors, event attributes, as well as their connections. We probabilistically model the proximity network as a generative process with sparsity-inducing regularization. This allows us to efficiently and effectively extract high-quality and interpretable news events. Experiments on three different news corpora demonstrate that the…
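The abstract describes the proximity network only at a high level, so the following is a rough sketch of the general idea of aggregating corpus-level co-occurrence statistics into a weighted graph, not the authors' construction. The function name `build_proximity_network`, the `window` parameter, and the inverse-distance weighting are all assumptions introduced here for illustration. The sketch also treats every token as a node, whereas the paper works with candidate event descriptors and event attributes.

```python
from collections import defaultdict


def build_proximity_network(docs, window=5):
    """Aggregate term co-occurrence over a corpus into weighted edges.

    docs: list of token lists (one list per document).
    window: hypothetical cutoff; only terms at most `window` tokens apart are linked.
    Returns a dict mapping an alphabetically ordered (term_a, term_b) pair to a weight.
    """
    edges = defaultdict(float)
    for tokens in docs:
        for i, a in enumerate(tokens):
            # Look ahead at most `window` tokens from position i.
            for j in range(i + 1, min(i + 1 + window, len(tokens))):
                b = tokens[j]
                if a == b:
                    continue
                key = (a, b) if a < b else (b, a)
                # Assumed weighting: closer terms contribute more (1 / distance).
                edges[key] += 1.0 / (j - i)
    return dict(edges)


docs = [
    ["earthquake", "hits", "japan", "tsunami", "warning"],
    ["japan", "earthquake", "triggers", "tsunami"],
]
network = build_proximity_network(docs, window=3)

# Strongest links first: terms that repeatedly appear close together.
for pair, weight in sorted(network.items(), key=lambda kv: -kv[1])[:3]:
    print(pair, round(weight, 3))
```

Because each term pair is stored once no matter how many documents mention it, the structure grows with the number of distinct pairs, not with corpus size. This is one plausible reading of the "space-efficient" property mentioned in the abstract.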
Taxonomy
Topics: Topic Modeling · Advanced Text Analysis Techniques · Web Data Mining and Analysis
