The Powered Dirichlet-Hawkes Process as a Flexible Prior for Temporal Text Clustering
Gaël Poux-Médard, Julien Velcin, Sabine Loudcher

TL;DR
This paper introduces the Powered Dirichlet-Hawkes process (PDHP), a flexible clustering method that jointly models textual content and publication time, outperforming existing models especially when data is weakly informative.
Contribution
The paper presents PDHP, a novel model that generalizes previous approaches and effectively handles cases with weak or uncorrelated textual and temporal information.
Findings
PDHP outperforms state-of-the-art models when the temporal information or the textual content is weakly informative.
PDHP generalizes previous models such as the Dirichlet-Hawkes process (DHP) and the Uniform process (UP).
Application to Reddit data demonstrates practical utility.
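The generalization claim can be illustrated with a minimal sketch of a powered temporal prior. This summary does not give the exact form of the PDHP prior, so everything below is an assumption made for illustration: each existing cluster is weighted by its Hawkes intensity raised to a power `r`, a new cluster gets a constant weight `lambda0`, and the intensity uses a simple exponential kernel. The names `exp_kernel_intensity`, `powered_prior`, `r`, `lambda0`, and `decay` are hypothetical and not taken from the paper.

```python
import math


def exp_kernel_intensity(t, past_times, decay=1.0):
    """Hawkes-style intensity of one cluster at time t.

    Assumption: an exponential kernel with unit weight per past event.
    """
    return sum(math.exp(-decay * (t - s)) for s in past_times if s < t)


def powered_prior(t, clusters, r=1.0, lambda0=0.1, decay=1.0):
    """Prior probability of assigning a document published at time t.

    clusters maps a cluster id to the timestamps of its past documents.
    Assumed form: weight(c) = intensity_c(t) ** r, weight("new") = lambda0.
    """
    weights = {
        c: exp_kernel_intensity(t, times, decay) ** r
        for c, times in clusters.items()
    }
    weights["new"] = lambda0  # weight for opening a new cluster (assumed)
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}


clusters = {"a": [0.0, 0.5, 0.9], "b": [0.2]}
for r in (0.0, 1.0, 2.0):
    print(r, powered_prior(1.0, clusters, r=r))
```

Under these assumptions, `r = 1` weights clusters in proportion to their intensity (a DHP-like prior), `r = 0` gives every existing cluster the same weight regardless of its history (a uniform prior), and `r > 1` sharpens the preference for recently active clusters. Lower values of `r` let the textual likelihood dominate the cluster assignment, which is one way to read the claim that textual content and temporal dynamics need not be perfectly correlated.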
Abstract
The textual content of a document and its publication date are intertwined. For example, the publication of a news article on a topic is influenced by previous publications on similar issues, according to underlying temporal dynamics. However, it can be challenging to retrieve meaningful information when the text conveys little. Furthermore, the textual content of a document is not always correlated with its temporal dynamics. We develop a method to create clusters of textual documents according to both their content and publication time, the Powered Dirichlet-Hawkes process (PDHP). PDHP yields significantly better results than state-of-the-art models when temporal information or textual content is weakly informative. PDHP also relaxes the assumption that textual content and temporal dynamics are perfectly correlated. We demonstrate that PDHP generalizes previous work --such…
Taxonomy
Topics: Point processes and geometric inequalities · Diffusion and Search Dynamics · Bayesian Methods and Mixture Models
