Who Started It? Identifying Root Sources in Textual Conversation Threads
Wei Zhang, Fan Bu, Derek Owens-Oas, Katherine Heller, and Xiaojin Zhu

TL;DR
This paper introduces a generative model based on marked multivariate Hawkes processes to identify root sources in textual conversation threads, especially when the reply structure is missing, and evaluates it on synthetic and real data.
Contribution
It proposes a probabilistic framework built around a new quantity, the "root source probability", and a dynamic-programming-based algorithm for identifying root sources in social media conversations when the reply structure is unavailable.
Findings
Accurately identifies root sources matching ground truth.
Effectively handles missing reply structure data.
Demonstrates strong performance on real-world datasets.
Abstract
In textual conversation threads, as found on many popular social media platforms, each user comment either originates a new thread of discussion or replies to a previous comment. An individual who makes an original comment, termed the "root source", is a topic initiator or even an information source, and identifying such individuals is of particular interest. The reply structure of comments is not always available (e.g. in the proliferation of a news event), and thus identifying root sources is a nontrivial task. In this paper, we develop a generative model based on marked multivariate Hawkes processes, and introduce a novel concept, "root source probability", to quantify the uncertainty in attributing possible root sources to each comment. A dynamic-programming-based algorithm is then derived to efficiently compute root source probabilities. Experiments on…
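The idea of a root source probability computed by dynamic programming can be illustrated with a much simpler model than the one in the paper. The sketch below is not the authors' algorithm. It assumes a univariate Hawkes process with an exponential kernel, ignores the text marks and the per-user dimensions, and treats the parameters `mu`, `alpha`, and `beta` as known. The function names are hypothetical. Under these assumptions, each comment is either an "immigrant" that starts a thread or a reply to one earlier comment, and the probability that comment i traces back to comment k follows from a recursion over the earlier comments.

```python
import math


def parent_probabilities(times, mu, alpha, beta):
    """P[i][i] = P(comment i starts a thread); P[i][j] = P(i replies to j), j < i.

    Assumes `times` is sorted ascending and an exponential kernel
    alpha * beta * exp(-beta * dt) (an illustrative choice, not from the paper).
    """
    n = len(times)
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Contribution of each earlier comment to the intensity at time t_i.
        weights = [
            alpha * beta * math.exp(-beta * (times[i] - times[j]))
            for j in range(i)
        ]
        total = mu + sum(weights)
        P[i][i] = mu / total
        for j in range(i):
            P[i][j] = weights[j] / total
    return P


def root_source_probabilities(P):
    """R[i][k] = P(the thread containing comment i was started by comment k).

    Dynamic programming over comments in time order:
    R[i][i] = P[i][i], and for k < i, R[i][k] = sum_j P[i][j] * R[j][k].
    """
    n = len(P)
    R = [[0.0] * n for _ in range(n)]
    for i in range(n):
        R[i][i] = P[i][i]
        for j in range(i):
            for k in range(j + 1):
                R[i][k] += P[i][j] * R[j][k]
    return R


if __name__ == "__main__":
    times = [0.0, 0.5, 3.0]  # made-up comment timestamps
    R = root_source_probabilities(parent_probabilities(times, 0.2, 0.5, 1.0))
    for row in R:
        print([round(p, 3) for p in row])
```

Each row of `R` is a probability distribution over candidate roots for one comment, which is what makes the quantity usable as a measure of attribution uncertainty. The recursion reuses the rows already computed for earlier comments rather than enumerating every possible reply tree.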
Taxonomy
Topics: Cognitive Science and Education Research · Advanced Text Analysis Techniques · Topic Modeling
