Codabench: Flexible, Easy-to-Use and Reproducible Benchmarking Platform
Zhen Xu, Sergio Escalera, Isabelle Guyon, Adrien Pavão, Magali Richard, Wei-Wei Tu, Quanming Yao, Huan Zhao

TL;DR
Codabench is an open-source platform that simplifies and standardizes benchmarking in data science, enabling fair, flexible, and reproducible comparisons of algorithms across diverse applications.
Contribution
It introduces a versatile, community-driven benchmarking platform with unique features for flexible, reproducible, and fair evaluation of algorithms in data science.
Findings
Over 130 users and 2500 submissions on the platform
Used in diverse applications such as Graph ML, Cancer Heterogeneity, Clinical Diagnosis, and Reinforcement Learning
Facilitates fair comparison with customizable protocols
Abstract
Obtaining standardized crowdsourced benchmarks of computational methods is a major issue in data science communities. Dedicated frameworks enabling fair benchmarking in a unified environment are yet to be developed. Here we introduce Codabench, an open-source, community-driven platform for benchmarking algorithms or software agents versus datasets or tasks. A public instance of Codabench (https://www.codabench.org/) is open to everyone, free of charge, and allows benchmark organizers to compare submissions fairly, under the same setting (software, hardware, data, algorithms), with custom protocols and data formats. Codabench has unique features that make organizing benchmarks flexible, easy, and reproducible, such as the possibility of re-using benchmark templates and supplying compute resources on demand. Codabench has been used internally and externally on various…
Taxonomy
Topics: Manufacturing Process and Optimization
