DiviK: Divisive intelligent K-Means for hands-free unsupervised clustering in big biological data
Grzegorz Mrukwa (1, 2), Joanna Polanska (1) ((1) Silesian University of Technology, (2) Netguru)

TL;DR
DiviK is a scalable clustering algorithm with local feature space adaptation, designed for high-dimensional biological data. It removes the need to predefine the number of clusters and handles heterogeneity in mass spectrometry imaging datasets.
Contribution
It introduces a novel stepwise clustering method with local data-driven feature adaptation, improving robustness and usability in high-dimensional biological data analysis.
Findings
Validated on high-throughput mass spectrometry imaging datasets
Provides a balance between heterogeneity detection and biological plausibility
Does not require prior knowledge of the number of structures
Abstract
Investigating molecular heterogeneity provides insights into tumor origin and metabolomics. The growing volume of gathered data makes manual analysis infeasible, so automated unsupervised learning approaches are used to discover heterogeneity. However, these approaches require considerable experience in setting hyperparameters and usually prior knowledge of the expected number of substructures. Moreover, the large number of measured molecules requires an additional feature engineering step to provide valuable results. In this work, we propose DiviK: a scalable stepwise algorithm with local data-driven feature space adaptation for the segmentation of high-dimensional datasets. A combination of three quality indices (Dice Index, Rand Index, and EXIMS score) is used to assess the quality of unsupervised analyses in 3D space. DiviK was validated on…
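As an illustration of two of the quality indices named in the abstract, the sketch below computes the Dice Index and the Rand Index from their standard textbook definitions. It is not taken from the DiviK code base: the function names `dice_index` and `rand_index` are hypothetical, the EXIMS score is not covered, and how the paper combines the three indices is not stated in the text above.

```python
from itertools import combinations
from typing import Hashable, Sequence, Set


def dice_index(a: Set[Hashable], b: Set[Hashable]) -> float:
    """Dice index between two sets: 2 * |A & B| / (|A| + |B|)."""
    if not a and not b:
        # Convention assumed here: two empty sets count as identical.
        return 1.0
    return 2.0 * len(a & b) / (len(a) + len(b))


def rand_index(labels_a: Sequence[Hashable], labels_b: Sequence[Hashable]) -> float:
    """Fraction of sample pairs on which two partitions agree.

    A pair agrees when both partitions put the two samples in the same
    cluster, or both put them in different clusters. This loops over all
    pairs, so it is quadratic in the number of samples.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("label sequences must have equal length")
    total = 0
    agree = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        agree += same_a == same_b
        total += 1
    # With fewer than two samples there are no pairs to disagree on.
    return agree / total if total else 1.0


# Pixels (by index) in a reference region versus a discovered segment.
reference_region = {0, 1, 2, 3}
discovered_segment = {2, 3, 4}
dice = dice_index(reference_region, discovered_segment)  # 2 * 2 / (4 + 3)

# Cluster names do not matter for the Rand index, only the grouping.
ri_relabelled = rand_index([0, 0, 1, 1], [1, 1, 0, 0])
ri_partial = rand_index([0, 0, 1, 1], [0, 1, 1, 1])
```

The Dice Index compares one region against one segment, while the Rand Index compares two complete partitions and is unaffected by how clusters are named, which is why the two are complementary when scoring an unsupervised segmentation against annotated structures.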
Taxonomy
Topics: Metabolomics and Mass Spectrometry Studies · Advanced Proteomics Techniques and Applications · Gene expression and cancer classification
