Randomness control and reproducibility study of random forest algorithm in R and Python

Louisa Camadini; Yanis Bouzid; Maeva Merlet; Léopold Carron

arXiv:2408.12184·cs.AI·August 23, 2024

TL;DR

This study examines the reproducibility of random forest algorithms across R and Python implementations, focusing on controlling randomness to ensure consistent results in toxicological assessments.

Contribution

It compares four random forest packages and identifies key parameters and randomness sources affecting reproducibility across implementations.

Findings

01. Reproducibility depends on consistent PRNG and parameter settings.

02. Differences in implementations can lead to variability in results.

03. Guidelines for ensuring reproducibility across R and Python are proposed.
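
To make the first finding concrete, here is a minimal sketch, not taken from the paper, of why a fixed PRNG seed makes the random draws behind a forest repeatable. It uses only Python's standard library; the function name `draw_tree_randomness` and the values of `n_rows`, `n_features`, and `mtry` are hypothetical choices for illustration. For simplicity the feature subset is drawn once per tree, whereas actual random forest implementations typically redraw it at each split.

```python
import random


def draw_tree_randomness(seed, n_rows=30, n_features=8, mtry=3):
    """Draw the random ingredients of one (simplified) tree.

    Hypothetical helper for illustration only: returns a bootstrap
    sample of row indices and a subset of candidate feature indices.
    """
    # A private generator avoids depending on hidden global PRNG state.
    rng = random.Random(seed)
    # Bootstrap: sample row indices with replacement.
    bootstrap_rows = [rng.randrange(n_rows) for _ in range(n_rows)]
    # Candidate features: sample mtry indices without replacement.
    candidate_features = sorted(rng.sample(range(n_features), mtry))
    return bootstrap_rows, candidate_features


same_a = draw_tree_randomness(seed=42)
same_b = draw_tree_randomness(seed=42)
other = draw_tree_randomness(seed=7)

print("same seed, same draws:", same_a == same_b)
print("different seed, same draws:", same_a == other)
```

The same seed reproduces the draws only within one generator and one order of calls. The same integer seed is not guaranteed to give the same stream in another language or library, which is one reason matching results across R and Python packages takes more than setting a seed.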

Abstract

When it comes to the safety of cosmetic products, compliance with regulatory standards is crucial to guarantee consumer protection against the risks of skin irritation. Toxicologists must therefore be fully conversant with all risks. This applies not only to their day-to-day work, but also to all the algorithms they integrate into their routines. Recognizing this, ensuring the reproducibility of algorithms becomes one of the most crucial aspects to address. However, how can we prove the robustness of an algorithm such as the random forest, that relies heavily on randomness? In this report, we will discuss the strategy of integrating random forest into ocular tolerance assessment for toxicologists. We will compare four packages: randomForest and Ranger (R packages), adapted in Python via the SKRanger package, and the widely used Scikit-Learn with the RandomForestClassifier() function. Our goal is…

Taxonomy

Topics: Remote Sensing and LiDAR Applications