Deep Actor-Critics with Tight Risk Certificates

Bahareh Tasdighi; Manuel Haussmann; Yi-Shan Wu; Andres R. Masegosa; Melih Kandemir

arXiv:2505.19682·cs.LG·November 27, 2025

Open Access

TL;DR

This paper develops tight risk certificates for deep actor-critic algorithms using minimal validation data and recursive PAC-Bayes bounds, enabling better generalization guarantees for real-world deployment.

Contribution

It introduces a novel method combining minimal evaluation data with recursive PAC-Bayes bounds to produce practical risk certificates for deep actor-critic algorithms.

Findings

01. Risk certificates are tight enough for practical use.

02. Minimal evaluation data suffices for accurate risk estimation.

03. Method applies across various tasks and policy levels.

Abstract

Deep actor-critic algorithms have reached a level where they influence everyday life. They are a driving force behind continual improvement of large language models through user feedback. However, their deployment in physical systems is not yet widely adopted, mainly because no validation scheme fully quantifies their risk of malfunction. We demonstrate that it is possible to develop tight risk certificates for deep actor-critic algorithms that predict generalization performance from validation-time observations. Our key insight centers on the effectiveness of minimal evaluation data. A small feasible set of evaluation roll-outs collected from a pretrained policy suffices to produce accurate risk certificates when combined with a simple adaptation of PAC-Bayes theory. Specifically, we adopt a recently introduced recursive PAC-Bayes approach, which splits validation data into portions…
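To make the idea of a risk certificate concrete, the following is a minimal sketch of the classical, non-recursive PAC-Bayes-kl bound (Seeger/Maurer form), not the recursive variant the abstract refers to. It assumes the loss of each evaluation roll-out has been normalized to [0, 1], that the n roll-outs are i.i.d., and that the prior was chosen independently of them. The function names and the example numbers are illustrative assumptions, not taken from the paper.

```python
import math


def binary_kl(q, p):
    """KL divergence between Bernoulli(q) and Bernoulli(p)."""
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)  # keep the logarithms finite
    total = 0.0
    if q > 0.0:
        total += q * math.log(q / p)
    if q < 1.0:
        total += (1.0 - q) * math.log((1.0 - q) / (1.0 - p))
    return total


def kl_inverse_upper(q, budget, tol=1e-9):
    """Largest p in [q, 1] with binary_kl(q, p) <= budget, via bisection."""
    lo, hi = q, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if binary_kl(q, mid) <= budget:
            lo = mid  # mid is still feasible, move the lower end up
        else:
            hi = mid  # mid violates the budget, move the upper end down
    return hi


def pac_bayes_kl_certificate(empirical_risk, kl_posterior_prior, n, delta=0.05):
    """Upper bound on the expected risk, holding with probability >= 1 - delta.

    Standard PAC-Bayes-kl bound (assumed here, not the paper's recursive one):
        kl(empirical_risk || true_risk)
            <= (KL(posterior || prior) + ln(2 * sqrt(n) / delta)) / n
    """
    budget = (kl_posterior_prior + math.log(2.0 * math.sqrt(n) / delta)) / n
    return kl_inverse_upper(empirical_risk, budget)


# Hypothetical numbers: 10% empirical risk, KL of 5 nats, 1000 roll-outs.
certificate = pac_bayes_kl_certificate(0.1, 5.0, 1000)
```

The certificate shrinks toward the empirical risk as n grows or as the posterior stays closer to the prior, which is the quantity a recursive scheme would try to keep small by reusing earlier data portions to build better priors.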

Taxonomy

Topics: Philosophy and History of Science · Computability, Logic, AI Algorithms · Cybernetics and Technology in Society

Methods: ADaptive gradient method with the OPTimal convergence rate