Policy Testing with MDPFuzz (Replicability Study)

Quentin Mazouni (1), Helge Spieker (1), Arnaud Gotlieb (1), Mathieu Acher (2) ((1) Simula Research Laboratory, Oslo, Norway; (2) Univ Rennes, CNRS, Inria, IRISA, Institut Universitaire de France (IUF), Rennes, France)

arXiv:2502.19116 · cs.SE · February 27, 2025

TL;DR

This study replicates and extends the evaluation of MDPFuzz, a black-box fuzz testing framework for Markov decision processes, revealing that its coverage guidance may not outperform simpler methods in fault detection.

Contribution

It reproduces and extends the original MDPFuzz evaluation, including new use cases and parameter analysis, challenging previous conclusions about its effectiveness.

Findings

01. Ablated Fuzzer often outperforms MDPFuzz in fault detection

02. Coverage guidance does not significantly improve fault finding

03. Replication highlights limitations of the original GMM-based approach

Abstract

In recent years, following tremendous achievements in Reinforcement Learning, a great deal of interest has been devoted to ML models for sequential decision-making. Alongside these scientific advances, research has been conducted to develop automated functional testing methods for finding faults in black-box Markov decision processes. Pang et al. (ISSTA 2022) presented a black-box fuzz testing framework called MDPFuzz. The method consists of a fuzzer whose main feature is the use of Gaussian Mixture Models (GMMs) to compute the coverage of test inputs as the likelihood of having already observed their results. This coverage-based guidance aims to favor novelty during testing and fault discovery in the decision model. Pang et al. evaluated their work on four use cases, comparing the number of failures found after twelve-hour testing campaigns with or…
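
The idea of measuring coverage as the likelihood of having already observed a result can be illustrated with a minimal sketch. This is not the MDPFuzz implementation: the one-dimensional mixture, its parameters, the threshold, and the function names below are all hypothetical, chosen only to show how a low density under a Gaussian mixture can flag a result as novel.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a 1-D normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def mixture_density(x, components):
    """Density of a Gaussian mixture at x.

    components: list of (weight, mean, std) tuples whose weights sum to 1.
    """
    return sum(w * gaussian_pdf(x, m, s) for w, m, s in components)


def is_novel(x, components, threshold):
    """Treat a result as novel when the mixture assigns it a low density."""
    return mixture_density(x, components) < threshold


# Hypothetical mixture standing in for results observed so far.
observed = [(0.5, 0.0, 1.0), (0.5, 5.0, 1.0)]
# Hypothetical density threshold below which a result counts as novel.
THRESHOLD = 0.01

print(is_novel(0.2, observed, THRESHOLD))   # close to an existing mode
print(is_novel(12.0, observed, THRESHOLD))  # far from both modes
```

In this sketch, a result near one of the mixture modes receives a high density and is treated as already covered, while a result far from every mode receives a density below the threshold and would be prioritized as novel.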

Code & Models

Repositories

quentinmaz/mdpfuzz_replicability_study_artifact (PyTorch, official)
