Biomedical Document Clustering and Visualization based on the Concepts of Diseases
Setu Shah, Xiao Luo

TL;DR
This paper presents a novel biomedical document clustering method that uses disease concepts and their associations to improve clustering quality and visualization, aiding better search and analysis in biomedical corpora.
Contribution
It introduces a vector representation of disease concepts with a new weighting scheme and employs a Self-Organizing Map for clustering and visualization, addressing limitations of ontology-based approaches.
Findings
Generated meaningful disease-based clusters
Enhanced visualization of cluster relationships
Improved clustering accuracy over existing methods
Abstract
Document clustering is a text mining technique used to improve document search and browsing in digital libraries and online corpora. Much of the research on biomedical document clustering relies on existing ontologies. However, ontologies do not represent the associations and co-occurrences of medical concepts well. This research proposes a vector representation of disease concepts and a similarity measure between concepts, which together identify the closest disease concepts in the context of a corpus. Each document is represented using the vector space model. A weighting scheme is proposed that considers both local content and associations between concepts. A Self-Organizing Map (SOM) is used as the document clustering algorithm. The vector projection and visualization features of SOM enable visualization and analysis of the cluster distributions and…
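To make the pipeline in the abstract concrete, the following is a minimal sketch of clustering document vectors with a small SOM. It is an illustration under stated assumptions, not the authors' implementation: the toy document vectors, the grid size, the decay schedules, and the function names `train_som` and `best_matching_unit` are all hypothetical, and the raw counts used here stand in for the paper's proposed weighting scheme, which this summary does not specify.

```python
import numpy as np

def best_matching_unit(weights, x):
    """Return the (row, col) grid position of the unit closest to vector x."""
    distances = np.linalg.norm(weights - x, axis=-1)
    row, col = np.unravel_index(np.argmin(distances), distances.shape)
    return int(row), int(col)

def train_som(data, rows=3, cols=3, epochs=50, lr0=0.5, sigma0=1.0, seed=0):
    """Train a small rectangular SOM; returns weights of shape (rows, cols, dim)."""
    rng = np.random.default_rng(seed)
    weights = rng.random((rows, cols, data.shape[1]))
    # Grid coordinates of every unit, used by the neighborhood function.
    grid = np.stack(
        np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1
    )
    n_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in data[rng.permutation(len(data))]:
            frac = step / n_steps
            lr = lr0 * (1.0 - frac)                 # linearly decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 1e-3    # shrinking neighborhood radius
            bmu = np.array(best_matching_unit(weights, x))
            grid_dist2 = np.sum((grid - bmu) ** 2, axis=-1)
            h = np.exp(-grid_dist2 / (2.0 * sigma ** 2))  # Gaussian neighborhood
            # Pull the winning unit and its grid neighbors toward x.
            weights += lr * h[..., None] * (x - weights)
            step += 1
    return weights

# Hypothetical toy corpus: each row is a document, each column a disease concept.
# Raw counts are an assumption standing in for the paper's weighting scheme.
docs = np.array([
    [3.0, 2.0, 0.0, 0.0],
    [2.0, 3.0, 0.0, 0.0],
    [0.0, 0.0, 3.0, 2.0],
    [0.0, 0.0, 2.0, 3.0],
])
weights = train_som(docs)
assignments = [best_matching_unit(weights, d) for d in docs]
print(assignments)
```

Each document is assigned to the grid position of its best matching unit, so documents that land on the same or neighboring units can be read as belonging to related clusters. That grid layout is what makes a SOM usable for visualization as well as clustering.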
Taxonomy
Topics: Biomedical Text Mining and Ontologies · Topic Modeling · Advanced Text Analysis Techniques
