AExGym: Benchmarks and Environments for Adaptive Experimentation
Jimmy Wang, Ethan Che, Daniel R. Jiang, Hongseok Namkoong

TL;DR
This paper introduces AExGym, a benchmark and open-source library for adaptive experimentation that addresses real-world challenges, aiming to improve practical robustness and facilitate methodological advancements in adaptive trial designs.
Contribution
The paper provides a benchmark built on real-world datasets that exposes practical challenges for adaptive experimentation, along with a modular library for developing and testing adaptive algorithms.
Findings
Highlights practical challenges like non-stationarity and delayed feedback
Provides a benchmark to evaluate adaptive methods in real-world scenarios
Offers an extensible library for developing custom adaptive experimentation environments
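To make the batched/delayed-feedback and non-stationarity challenges concrete, the following is a minimal, self-contained sketch of a simulated experiment in which the allocation can only be updated between batches. It is not AExGym's API: the function name `run_batched_experiment`, its parameters, and the Bernoulli outcome model are all hypothetical choices made for illustration, and batched Thompson sampling is used here only as one familiar example of an adaptive policy.

```python
import random


def run_batched_experiment(policy, means, n_batches=10, batch_size=100,
                           drift=None, seed=0):
    """Simulate a Bernoulli experiment with batched (delayed) feedback.

    Hypothetical illustration, not part of AExGym.
    policy: "uniform" (static A/B-style split) or "thompson" (adaptive).
    means:  initial success probability of each arm.
    drift:  optional per-arm shift added to the means after every batch,
            a crude stand-in for non-stationarity.
    Returns (pulls per arm, total reward).
    """
    rng = random.Random(seed)
    k = len(means)
    drift = drift if drift is not None else [0.0] * k
    successes = [0] * k   # outcomes observed in completed batches only
    failures = [0] * k
    pulls = [0] * k
    total_reward = 0

    for b in range(n_batches):
        # Non-stationarity: arm means move between batches, clipped to [0, 1].
        current = [min(1.0, max(0.0, m + d * b)) for m, d in zip(means, drift)]

        # The allocation is fixed once per batch, using earlier batches only.
        if policy == "uniform":
            weights = [1.0 / k] * k
        else:
            # Monte Carlo estimate of the probability that each arm is best
            # under independent Beta(1 + successes, 1 + failures) posteriors.
            n_draws = 200
            wins = [0] * k
            for _ in range(n_draws):
                draws = [rng.betavariate(1 + successes[a], 1 + failures[a])
                         for a in range(k)]
                wins[draws.index(max(draws))] += 1
            weights = [w / n_draws for w in wins]

        # Outcomes inside the batch are buffered: feedback is delayed.
        batch_s = [0] * k
        batch_f = [0] * k
        for _ in range(batch_size):
            arm = rng.choices(range(k), weights=weights)[0]
            reward = 1 if rng.random() < current[arm] else 0
            pulls[arm] += 1
            total_reward += reward
            batch_s[arm] += reward
            batch_f[arm] += 1 - reward

        # Only now does the policy get to see the batch's results.
        for a in range(k):
            successes[a] += batch_s[a]
            failures[a] += batch_f[a]

    return pulls, total_reward


if __name__ == "__main__":
    for name in ("uniform", "thompson"):
        pulls, reward = run_batched_experiment(name, [0.2, 0.8])
        print(name, pulls, reward)
```

Passing a nonzero `drift` (for example `drift=[0.08, -0.08]`, so that the arms swap rank partway through) is one simple way to see why a policy that commits early to one arm can be fragile when the environment changes.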
Abstract
Innovations across science and industry are evaluated using randomized trials (a.k.a. A/B tests). While simple and robust, such static designs are inefficient or infeasible for testing many hypotheses. Adaptive designs can greatly improve statistical power in theory, but they have seen limited adoption due to their fragility in practice. We present a benchmark for adaptive experimentation based on real-world datasets, highlighting prominent practical challenges to operationalizing adaptivity: non-stationarity, batched/delayed feedback, multiple outcomes and objectives, and external validity. Our benchmark aims to spur methodological development that puts practical performance (e.g., robustness) as a central concern, rather than mathematical guarantees on contrived instances. We release an open source library, AExGym, which is designed with modularity and extensibility in mind to allow…
Taxonomy
Topics: Simulation Techniques and Applications · Gaussian Processes and Bayesian Inference · Scientific Computing and Data Management
