Biomedical Document Clustering and Visualization based on the Concepts of Diseases

Setu Shah; Xiao Luo

arXiv:1810.09597 · cs.CL · October 24, 2018 · 1 cites

Open Access

TL;DR

This paper presents a novel biomedical document clustering method that uses disease concepts and their associations to improve clustering quality and visualization, aiding better search and analysis in biomedical corpora.

Contribution

It introduces a vector representation of disease concepts with a new weighting scheme and employs a Self-Organizing Map (SOM) for clustering and visualization, addressing limitations of ontology-based approaches.
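
To make the idea of weighting documents by both local content and concept associations concrete, here is a minimal sketch. It is not the paper's weighting scheme, which this page does not spell out. It assumes each disease concept already has a vector, measures association by cosine similarity, and adds a similarity-scaled share of each concept's count to its related concepts. The function names, the `alpha` parameter, and the toy data are all hypothetical.

```python
import numpy as np

def cosine_similarity_matrix(concept_vectors):
    """Pairwise cosine similarity between concept vectors (one row per concept)."""
    norms = np.linalg.norm(concept_vectors, axis=1, keepdims=True)
    unit = concept_vectors / np.clip(norms, 1e-12, None)  # avoid division by zero
    return unit @ unit.T

def association_weighted_docs(counts, concept_vectors, alpha=0.5):
    """Hypothetical weighting: local concept counts plus an association term.

    counts[d, c] is how often concept c occurs in document d.
    alpha (an assumed parameter) controls how much weight related concepts add.
    """
    sim = cosine_similarity_matrix(concept_vectors)
    np.fill_diagonal(sim, 0.0)  # a concept should not reinforce itself
    return counts + alpha * counts @ sim

# Toy data: concepts 0 and 1 are closely associated, concept 2 is unrelated.
concept_vectors = np.array([[1.0, 0.0],
                            [0.9, 0.1],
                            [0.0, 1.0]])
counts = np.array([[2.0, 0.0, 0.0],   # document 0 mentions only concept 0
                   [0.0, 0.0, 3.0]])  # document 1 mentions only concept 2
weighted = association_weighted_docs(counts, concept_vectors)
print(weighted)
```

In this toy setup, document 0 gains weight on concept 1 even though it never mentions it, because concept 1 is close to concept 0. That is the kind of association a purely count-based vector would miss.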

Findings

01

Generated meaningful disease-based clusters

02

Enhanced visualization of cluster relationships

03

Improved clustering accuracy over existing methods

Abstract

Document clustering is a text mining technique that supports document search and browsing in digital libraries and online corpora. Much of the research on biomedical document clustering relies on existing ontologies. However, an ontology does not represent associations and co-occurrences of medical concepts well. This research proposes a vector representation of disease concepts and a similarity measure between concepts, which together identify the closest disease concepts in the context of a corpus. Each document is represented using the vector space model. A weighting scheme is proposed that considers both local content and associations between concepts. A Self-Organizing Map (SOM) is used as the document clustering algorithm. The vector projection and visualization features of SOM enable visualization and analysis of the cluster distributions and…
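
As a rough illustration of how a Self-Organizing Map clusters document vectors onto a 2D grid, here is a minimal NumPy sketch. It is a generic textbook SOM, not the authors' implementation. The grid size, learning-rate and neighborhood schedules, function names, and toy documents are all assumptions made for the example.

```python
import numpy as np

def best_matching_unit(weights, x):
    """Grid coordinates (row, col) of the unit whose weight vector is closest to x."""
    distances = np.linalg.norm(weights - x, axis=-1)
    row, col = np.unravel_index(np.argmin(distances), distances.shape)
    return int(row), int(col)

def train_som(data, rows=3, cols=3, epochs=50, lr0=0.5, sigma0=1.5, seed=0):
    """Train a small SOM with linearly decaying learning rate and neighborhood width."""
    rng = np.random.default_rng(seed)
    weights = rng.random((rows, cols, data.shape[1]))
    # grid[r, c] holds the coordinates (r, c) of each unit on the map
    grid = np.stack(np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1)
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in data[rng.permutation(len(data))]:
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)
            sigma = sigma0 * (1.0 - frac) + 1e-3  # keep the width strictly positive
            bmu = np.array(best_matching_unit(weights, x))
            grid_dist2 = np.sum((grid - bmu) ** 2, axis=-1)
            neighborhood = np.exp(-grid_dist2 / (2.0 * sigma ** 2))
            # Pull the BMU and its grid neighbors toward the sample
            weights += lr * neighborhood[..., None] * (x - weights)
            step += 1
    return weights

# Toy document vectors: the first two and the last two form separate groups.
docs = np.array([[1.0, 0.0, 0.0],
                 [0.9, 0.1, 0.0],
                 [0.0, 0.0, 1.0],
                 [0.0, 0.1, 0.9]])
som_weights = train_som(docs)
bmus = [best_matching_unit(som_weights, d) for d in docs]
print(bmus)
```

Each document is assigned to the map unit it lands on. Because neighboring units are trained together, positions on the grid can be read as a rough picture of how clusters relate to one another. That is the property the abstract points to when it mentions SOM's projection and visualization features.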

Taxonomy

Topics: Biomedical Text Mining and Ontologies · Topic Modeling · Advanced Text Analysis Techniques