A statistical machine learning approach for benchmarking in the presence of complex contextual factors and peer groups
Daniel W. Kennedy, Jessica Cameron, Paul P.-Y. Wu, Kerrie Mengersen

TL;DR
This paper introduces a random forest-based benchmarking method that effectively accounts for complex nonlinear relationships and contextual factors, enabling fairer comparisons among organizations, validated through a high-noise case study.
Contribution
It proposes a novel application of random forests for benchmarking, improving upon linear regression methods by handling nonlinear relationships and providing interpretable visualizations.
Findings
Random forest models improve fairness in benchmarking with complex data.
The approach effectively adjusts for nonlinear relationships between measures and covariates.
Bootstrapping estimates uncertainty in rankings and measures.
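To make the bootstrapping finding concrete, here is a minimal sketch of how resampling can turn point rankings into rank intervals. It is not the paper's procedure: the organisation names, the data, the choice to resample each organisation's own observations, and the 90% percentile interval are all assumptions made for illustration.

```python
import random
from statistics import mean

def rank_desc(scores):
    """Map each name to its rank, where rank 1 is the highest score."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {name: i + 1 for i, name in enumerate(ordered)}

def bootstrap_rank_intervals(observations, n_boot=2000, level=0.90, seed=0):
    """Percentile intervals for each organisation's rank.

    observations: dict of name -> list of adjusted measures (hypothetical data).
    """
    rng = random.Random(seed)
    draws = {name: [] for name in observations}
    for _ in range(n_boot):
        # Resample each organisation's observations with replacement,
        # then rank organisations by the resampled mean.
        scores = {name: mean(rng.choices(vals, k=len(vals)))
                  for name, vals in observations.items()}
        for name, r in rank_desc(scores).items():
            draws[name].append(r)
    lo_q = (1 - level) / 2
    intervals = {}
    for name, ranks in draws.items():
        ranks.sort()
        lo = ranks[int(lo_q * (n_boot - 1))]
        hi = ranks[int((1 - lo_q) * (n_boot - 1))]
        intervals[name] = (lo, hi)
    return intervals

# Hypothetical adjusted measures for three organisations.
observations = {
    "org_a": [0.9, 1.1, 1.0, 1.2, 0.8],
    "org_b": [0.2, 0.1, 0.3, 0.0, 0.4],
    "org_c": [0.15, 0.35, 0.25, 0.05, 0.3],
}
intervals = bootstrap_rank_intervals(observations)
for name, (lo, hi) in intervals.items():
    print(f"{name}: rank between {lo} and {hi}")
```

In this toy data, org_a is clearly separated from the others, while org_b and org_c overlap, so their rank intervals are wider. That is the kind of uncertainty a single ranked list would hide.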
Abstract
The ability to compare individuals or organisations fairly is important for the development of robust and meaningful quantitative benchmarks. To make fair comparisons, contextual factors must be taken into account, and comparisons should only be made between similar organisations, such as peer groups. Previous benchmarking methods have used linear regression to adjust for contextual factors; however, linear regression is known to be sub-optimal when nonlinear relationships exist between the comparative measure and covariates. In this paper we propose a random forest model for benchmarking that can adjust for these potential nonlinear relationships, and validate the approach in a case study of high-noise data. We provide new visualisations and numerical summaries of the fitted models and comparative measures to facilitate interpretation by both analysts and non-technical audiences.…
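The adjustment idea in the abstract can be sketched as follows: fit a model of the comparative measure on the contextual covariates, then compare each unit's observed value with the value expected given its context. This is a hedged illustration on simulated data, not the authors' implementation. The sine-shaped relationship, the `org_effect` variable, the use of out-of-bag predictions as the expected value, and the `min_samples_leaf` setting are all assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 400

# Hypothetical data: one contextual covariate with a nonlinear effect,
# plus an organisation-level effect that benchmarking should recover.
x = rng.uniform(-3, 3, size=(n, 1))
org_effect = rng.normal(0, 0.5, size=n)
y = 2 * np.sin(x[:, 0]) + org_effect + rng.normal(0, 0.1, size=n)

# Random forest adjustment: expected measure given context.
# Out-of-bag predictions avoid scoring a unit with trees that saw it.
forest = RandomForestRegressor(
    n_estimators=200, min_samples_leaf=10, oob_score=True, random_state=0
)
forest.fit(x, y)
adjusted = y - forest.oob_prediction_  # observed minus expected

# Linear adjustment for comparison (straight-line fit).
slope, intercept = np.polyfit(x[:, 0], y, 1)
linear_resid = y - (slope * x[:, 0] + intercept)

print("corr(adjusted, org_effect), forest:",
      round(float(np.corrcoef(adjusted, org_effect)[0, 1]), 2))
print("corr(adjusted, org_effect), linear:",
      round(float(np.corrcoef(linear_resid, org_effect)[0, 1]), 2))
```

In this simulation the linear residuals still carry the unmodelled curvature, so they track the organisation effect less closely than the forest-adjusted values. Real benchmarking data would need more covariates and a check that the forest is not absorbing the organisation effect itself.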
Taxonomy
Topics: Advanced Statistical Methods and Models · Forecasting Techniques and Applications · Data Analysis with R
