Towards Using Multiple Iterated, Reproduced, and Replicated Experiments with Robots (MIRRER) for Evaluation and Benchmarking
Adam Norton, Brian Flynn (New England Robotics Validation and Experimentation (NERVE) Center, University of Massachusetts Lowell)

TL;DR
This paper introduces MIRRER, an initial conceptual framework for evaluating and benchmarking robot capabilities through multiple iterated, reproduced, and replicated experiments, with the goal of making robotics research comparable and supporting the evaluation of reproducibility and generalizability.
Contribution
It proposes an initial conceptual framework that unites performance evaluation, benchmarking, and reproduced/replicated experimentation, addressing the field's lack of formalized definitions and frameworks for evaluating generalizability and reproducibility.
Findings
Outlines MIRRER as an initial conceptual framework for robotics evaluation and benchmarking.
Presents several open issues with applying the framework.
Aims to facilitate comparable robotics research through reproduced and replicated experimentation.
Abstract
The robotics research field lacks formalized definitions and frameworks for evaluating advanced capabilities including generalizability (the ability for robots to perform tasks under varied contexts) and reproducibility (the performance of a reproduced robot capability in different labs under the same experimental conditions). This paper presents an initial conceptual framework, MIRRER, that unites the concepts of performance evaluation, benchmarking, and reproduced/replicated experimentation in order to facilitate comparable robotics research. Several open issues with the application of the framework are also presented.
Taxonomy
Topics: Robot Manipulation and Learning · Reinforcement Learning in Robotics · Modular Robots and Swarm Intelligence
