PDC -- a probabilistic distributional clustering algorithm: a case study on suicide articles in PubMed
Rezarta Islamaj, Lana Yeganova, Won Kim, Natalie Xie, W. John Wilbur, Zhiyong Lu

TL;DR
This paper introduces PDC, a probabilistic clustering algorithm that organizes large document collections into disjoint topic groups, demonstrated on PubMed articles about suicide to aid mental health research.
Contribution
The paper presents a novel probabilistic distributional clustering algorithm and an environment for visualization and retrieval of related articles, applied specifically to suicide literature in PubMed.
Findings
PDC partitions the terms in PubMed articles on suicide into disjoint topic groups.
The visualization environment lets users explore the computed topics in the term space and retrieve the most related articles for each group.
A web portal helps researchers understand the scope of suicide-related research.
Abstract
As the volume of information keeps growing, it becomes crucial to organize large collections in a way that facilitates human comprehension. In this work, we present PDC (probabilistic distributional clustering), a novel algorithm that, given a document collection, computes disjoint term sets representing topics in the collection. The algorithm relies on probabilities of word co-occurrences to partition the set of terms appearing in the collection into disjoint groups of related terms. We also present an environment to visualize the computed topics in the term space and to retrieve the most related PubMed articles for each group of terms. We illustrate the algorithm by applying it to PubMed documents on the topic of suicide. Suicide is a major public health problem, identified as the tenth leading cause of death in the US. In this application, our goal is…
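The abstract does not give PDC's actual procedure, so the following is only a minimal sketch of the general idea it describes: estimating word co-occurrence probabilities from documents and using them to split the vocabulary into disjoint term groups. The lift score, the `min_lift` threshold, the union-find merging, and the function names are all assumptions made for illustration; they are not the authors' method.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_probs(docs):
    """Estimate P(term) and P(term_a, term_b) as fractions of documents."""
    n = len(docs)
    single = defaultdict(int)
    pair = defaultdict(int)
    for doc in docs:
        terms = sorted(set(doc))  # count each term once per document
        for t in terms:
            single[t] += 1
        for a, b in combinations(terms, 2):
            pair[(a, b)] += 1
    p_single = {t: c / n for t, c in single.items()}
    p_pair = {k: c / n for k, c in pair.items()}
    return p_single, p_pair


def partition_terms(docs, min_lift=1.5):
    """Split the vocabulary into disjoint groups of co-occurring terms.

    Assumption (not from the paper): two terms are linked when their lift,
    P(a, b) / (P(a) * P(b)), reaches min_lift; the groups are the connected
    components of the resulting graph, found with union-find.
    """
    p_single, p_pair = cooccurrence_probs(docs)
    parent = {t: t for t in p_single}

    def find(t):
        while parent[t] != t:
            parent[t] = parent[parent[t]]  # path halving
            t = parent[t]
        return t

    for (a, b), p_ab in p_pair.items():
        lift = p_ab / (p_single[a] * p_single[b])
        if lift >= min_lift:
            parent[find(a)] = find(b)

    groups = defaultdict(set)
    for t in p_single:
        groups[find(t)].add(t)
    return sorted(sorted(g) for g in groups.values())


# Toy corpus invented for illustration; each document is a list of terms.
toy_docs = [
    ["suicide", "prevention", "screening"],
    ["prevention", "screening"],
    ["lithium", "bipolar"],
    ["lithium", "bipolar", "suicide"],
]
print(partition_terms(toy_docs))
```

On this toy corpus, "prevention" and "screening" form one group and "bipolar" and "lithium" another, while "suicide" stays alone because it co-occurs with every other term at chance level. Every term lands in exactly one group, which mirrors the disjointness property the abstract emphasizes.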
Taxonomy
Topics: Biomedical Text Mining and Ontologies · Advanced Text Analysis Techniques · Data-Driven Disease Surveillance
