Policy Testing with MDPFuzz (Replicability Study)
Quentin Mazouni (1), Helge Spieker (1), Arnaud Gotlieb (1), Mathieu Acher (2) ((1) Simula Research Laboratory, Oslo, Norway, (2) Univ Rennes, CNRS, Inria, IRISA, Institut Universitaire de France (IUF), Rennes, France)

TL;DR
This study replicates and extends the evaluation of MDPFuzz, a black-box fuzz testing framework for Markov decision processes, revealing that its coverage guidance may not outperform simpler methods in fault detection.
Contribution
The study reproduces and extends the original MDPFuzz evaluation with new use cases and a parameter analysis, and it challenges the original conclusions about the method's effectiveness.
Findings
Ablated Fuzzer often outperforms MDPFuzz in fault detection
Coverage guidance does not significantly improve fault finding
Replication highlights limitations of the original GMM-based approach
Abstract
In recent years, following tremendous achievements in Reinforcement Learning, a great deal of interest has been devoted to ML models for sequential decision-making. Together with these scientific breakthroughs, research has been conducted to develop automated functional testing methods for finding faults in black-box Markov decision processes. Pang et al. (ISSTA 2022) presented a black-box fuzz testing framework called MDPFuzz. The method consists of a fuzzer whose main feature is to use Gaussian Mixture Models (GMMs) to compute the coverage of test inputs as the likelihood of having already observed their results. This coverage-based guidance aims to favor novelty during testing and fault discovery in the decision model. Pang et al. evaluated their work on four use cases, comparing the number of failures found after twelve-hour testing campaigns with or…
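The coverage idea in the abstract (scoring a test by how likely its results are to have been observed already) can be illustrated with a minimal sketch. This is not the MDPFuzz implementation: it assumes one-dimensional outcomes, a mixture whose parameters are set by hand rather than fitted, and a hypothetical density threshold. All function names and values below are illustrative only.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a 1-D normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def mixture_density(x, components):
    """Density of a 1-D Gaussian mixture.

    components: list of (weight, mean, std) tuples; weights sum to 1.
    """
    return sum(w * gaussian_pdf(x, m, s) for w, m, s in components)


def is_novel(x, components, threshold):
    """Treat an outcome as novel when the mixture assigns it low density."""
    return mixture_density(x, components) < threshold


# Hypothetical mixture summarizing outcomes seen so far (assumption:
# hand-picked parameters; a real fuzzer would estimate them from data).
observed = [(0.5, 0.0, 1.0), (0.5, 5.0, 1.0)]
threshold = 0.01  # hypothetical cut-off, not taken from the paper

familiar = mixture_density(0.0, observed)   # near a mixture mode
unseen = mixture_density(12.0, observed)    # far from both modes

print(is_novel(0.0, observed, threshold))   # outcome close to past results
print(is_novel(12.0, observed, threshold))  # outcome unlike past results
```

Under these assumptions, a guided fuzzer would prioritize inputs whose outcomes fall in low-density regions, while an unguided baseline would ignore the density altogether. Whether that guidance actually helps fault detection is the question the replication examines.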
