On the Assessment of Benchmark Suites for Algorithm Comparison

David Issa Mattos; Lucas Ruud; Jan Bosch; Helena Holmström Olsson

arXiv:2104.07381·cs.NE·April 16, 2021

TL;DR

This paper introduces a statistical method based on item response theory to evaluate the effectiveness of benchmark suites in algorithm comparison, focusing on difficulty, discrimination, and informativeness.

Contribution

It applies a Bayesian IRT model to assess benchmark functions, providing a new way to evaluate and improve benchmark suite design for algorithm testing.

Findings

01

BBOB functions are generally difficult and discriminate poorly between algorithms.

02

PBO functions are easier and discriminate better between algorithms.

03

IRT can guide the development of more effective benchmark suites.

Abstract

Benchmark suites, i.e., collections of benchmark functions, are widely used to compare black-box optimization algorithms. Over the years, research has identified many desired qualities for benchmark suites, such as diverse topology, different difficulties, scalability, and representativeness of real-world problems, among others. However, while the topology characteristics have been studied previously, no study has statistically evaluated the difficulty level of benchmark functions, how well they discriminate optimization algorithms, and how suitable a benchmark suite is for algorithm comparison. In this paper, we propose the use of an item response theory (IRT) model, the Bayesian two-parameter logistic model for multiple attempts, to statistically evaluate these aspects with respect to the empirical success rate of algorithms. With this model, we can assess…
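
To make the quantities named in the abstract concrete, the following is a minimal sketch of the standard two-parameter logistic (2PL) item response function with a binomial likelihood over repeated runs. It is not the authors' implementation: the function and parameter names are hypothetical, the parameterization is the textbook one and is assumed rather than taken from the paper, and the Bayesian part (priors and posterior inference) is left out entirely.

```python
import math


def success_probability(ability, difficulty, discrimination):
    """Textbook 2PL curve: P(success) = logistic(a * (theta - b)).

    ability (theta): latent skill of an algorithm (assumed name).
    difficulty (b): latent difficulty of a benchmark function.
    discrimination (a): how sharply the function separates algorithms.
    """
    return 1.0 / (1.0 + math.exp(-discrimination * (ability - difficulty)))


def binomial_log_likelihood(successes, attempts, p):
    """Log-probability of `successes` out of `attempts` independent runs,
    each succeeding with probability p (0 < p < 1)."""
    log_choose = (
        math.lgamma(attempts + 1)
        - math.lgamma(successes + 1)
        - math.lgamma(attempts - successes + 1)
    )
    return (
        log_choose
        + successes * math.log(p)
        + (attempts - successes) * math.log(1.0 - p)
    )


def item_information(ability, difficulty, discrimination):
    """Per-attempt Fisher information of a 2PL item: a^2 * p * (1 - p)."""
    p = success_probability(ability, difficulty, discrimination)
    return discrimination ** 2 * p * (1.0 - p)


if __name__ == "__main__":
    # Illustrative values only, not results from the paper.
    p = success_probability(ability=0.5, difficulty=1.0, discrimination=2.0)
    print("success probability:", p)
    print("log-likelihood of 3/10 successes:", binomial_log_likelihood(3, 10, p))
    print("item information:", item_information(0.5, 1.0, 2.0))
```

Under this parameterization, a benchmark function is most informative for algorithms whose ability is close to its difficulty, and a larger discrimination makes the information curve taller and narrower around that point.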

Taxonomy

Topics: Advanced Multi-Objective Optimization Algorithms · Machine Learning and Data Classification · Sports Analytics and Performance