Random Similarity Forests

Maciej Piernik; Dariusz Brzezinski; Pawel Zawadzki

arXiv:2204.05389 · cs.LG · April 13, 2022


TL;DR

The paper introduces Random Similarity Forests, a novel classification method that effectively handles complex, multi-type data by integrating multiple domain-specific similarity measures, outperforming traditional methods on diverse datasets.

Contribution

The paper presents a new algorithm that combines Random Forests with similarity measures to handle arbitrary data types without simplifying or omitting data.
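This summary does not spell out how a tree node uses a similarity measure, so the following is only a minimal sketch of one plausible reading. It assumes a similarity-forest-style split: two reference objects from different classes are drawn at random, every object is projected onto the difference of its similarities to them, and a threshold is chosen by weighted Gini impurity. The names `similarity_split`, `gini`, and `jaccard`, and the toy set-valued feature, are hypothetical and not taken from the paper.

```python
import random


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def similarity_split(values, labels, similarity, rng):
    """Assumed split rule (not confirmed by the summary): project each object
    onto sim(x, p) - sim(x, q) for two random references p and q from
    different classes, then pick the threshold with the lowest weighted Gini.
    Returns (impurity, threshold, p, q), or None if no split is possible."""
    idx_p = rng.randrange(len(values))
    others = [i for i, y in enumerate(labels) if y != labels[idx_p]]
    if not others:
        return None  # node is already pure
    idx_q = rng.choice(others)
    p, q = values[idx_p], values[idx_q]

    # One-dimensional projection of every object in the node.
    proj = [similarity(x, p) - similarity(x, q) for x in values]
    order = sorted(range(len(values)), key=lambda i: proj[i])

    best = None
    for k in range(1, len(order)):
        lo, hi = proj[order[k - 1]], proj[order[k]]
        if lo == hi:
            continue  # no threshold fits between equal projections
        left = [labels[i] for i in order[:k]]
        right = [labels[i] for i in order[k:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(order)
        if best is None or score < best[0]:
            best = (score, (lo + hi) / 2.0, p, q)
    return best


def jaccard(a, b):
    """Jaccard similarity between two sets (a domain-specific measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Toy set-valued feature: no conversion to numbers is needed.
values = [{"a", "b"}, {"a", "b", "c"}, {"x", "y"}, {"x", "y", "z"}]
labels = [0, 0, 1, 1]
result = similarity_split(values, labels, jaccard, random.Random(0))
```

The point of the sketch is that the split only ever calls `similarity`, so a different measure (for example, one for graphs or time series) could be swapped in per feature without changing the tree logic.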

Findings

01. Performs on par with Random Forests on numerical data.

02. Outperforms traditional methods on complex and mixed data domains.

03. Effective on noisy, multi-source datasets in life sciences.

Abstract

The wealth of data being gathered about humans and their surroundings drives new machine learning applications in various fields. Consequently, more and more often, classifiers are trained using not only numerical data but also complex data objects. For example, multi-omics analyses attempt to combine numerical descriptions with distributions, time series data, discrete sequences, and graphs. Such integration of data from different domains requires either omitting some of the data, creating separate models for different formats, or simplifying some of the data to adhere to a shared scale and format, all of which can hinder predictive performance. In this paper, we propose a classification method capable of handling datasets with features of arbitrary data types while retaining each feature's characteristic. The proposed algorithm, called Random Similarity Forest, uses multiple…


Taxonomy

Topics: Metabolomics and Mass Spectrometry Studies · Gene expression and cancer classification · Bioinformatics and Genomic Networks