Random Similarity Forests
Maciej Piernik, Dariusz Brzezinski, Pawel Zawadzki

TL;DR
The paper introduces Random Similarity Forests, a classification method that handles complex, multi-type data by combining multiple domain-specific similarity measures. It performs on par with Random Forests on numerical data and outperforms traditional methods on complex and mixed data.
Contribution
It presents a new algorithm combining Random Forests with similarity measures to handle arbitrary data types without data simplification or omission.
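The idea of combining tree splits with per-feature similarity measures can be illustrated with a minimal sketch of a single node split. This is an illustration under stated assumptions, not the authors' implementation: the function names (`gini`, `similarity_split`), the choice of two reference objects from different classes, the projection `sim(x, q) - sim(x, p)`, the Gini-based threshold search, and both example similarity functions are hypothetical choices made here for clarity.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty or pure list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def similarity_split(rows, labels, similarities, rng):
    """Sketch of one node split (hypothetical helper, not the paper's code).

    rows         -- list of tuples, one value per feature (any data type)
    labels       -- class label for each row
    similarities -- one similarity function per feature, sim(a, b) -> float
    rng          -- a random.Random instance
    """
    # Assumption: pick one feature at random for this node.
    f = rng.randrange(len(similarities))
    sim = similarities[f]

    # Assumption: pick two reference objects from different classes.
    i = rng.randrange(len(rows))
    others = [k for k, y in enumerate(labels) if y != labels[i]]
    if not others:
        return None  # node is already pure
    j = rng.choice(others)
    p, q = rows[i][f], rows[j][f]

    # Project every object onto a single number using only similarities.
    proj = [sim(r[f], q) - sim(r[f], p) for r in rows]

    # Search midpoints between distinct projected values for the lowest
    # weighted Gini impurity.
    values = sorted(set(proj))
    best = None
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [y for v, y in zip(proj, labels) if v <= t]
        right = [y for v, y in zip(proj, labels) if v > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if best is None or score < best[0]:
            best = (score, t)
    if best is None:
        return None
    return {"feature": f, "p": p, "q": q, "threshold": best[1], "impurity": best[0]}


# Two example similarity functions (assumptions for this sketch).
def numeric_similarity(a, b):
    return -abs(a - b)


def jaccard_similarity(a, b):
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb)


# Toy mixed-type data: one numeric feature and one string feature.
rows = [(1.0, "aaab"), (1.2, "aaba"), (5.0, "xyzz"), (5.3, "zzyx")]
labels = [0, 0, 1, 1]
sims = [numeric_similarity, jaccard_similarity]

split = similarity_split(rows, labels, sims, random.Random(0))
print(split)
```

Because the split only ever calls `sim(a, b)`, the numeric and string features are handled by the same code path without converting either one to a shared format. A full forest would repeat this step recursively over bootstrap samples.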
Findings
Performs on par with Random Forests on numerical data.
Outperforms traditional methods on complex and mixed data domains.
Effective on noisy, multi-source datasets in life sciences.
Abstract
The wealth of data being gathered about humans and their surroundings drives new machine learning applications in various fields. Consequently, more and more often, classifiers are trained using not only numerical data but also complex data objects. For example, multi-omics analyses attempt to combine numerical descriptions with distributions, time series data, discrete sequences, and graphs. Such integration of data from different domains requires either omitting some of the data, creating separate models for different formats, or simplifying some of the data to adhere to a shared scale and format, all of which can hinder predictive performance. In this paper, we propose a classification method capable of handling datasets with features of arbitrary data types while retaining each feature's characteristic. The proposed algorithm, called Random Similarity Forest, uses multiple…
Taxonomy
Topics: Metabolomics and Mass Spectrometry Studies · Gene expression and cancer classification · Bioinformatics and Genomic Networks
