Relative Entropy Regularized Reinforcement Learning for Efficient Encrypted Policy Synthesis

Jihoon Suh; Yeongjun Jang; Kaoru Teranishi; Takashi Tanaka

arXiv:2506.12358·cs.LG·June 17, 2025

TL;DR

This paper combines fully homomorphic encryption (FHE) with relative-entropy-regularized reinforcement learning (RERL). Because value iteration in this framework is linear and min-free, policies can be synthesized efficiently on encrypted data, with convergence and error bounds.

Contribution

The paper presents an encrypted model-based reinforcement learning framework in which value iteration is linear and min-free, and it analyzes convergence and error bounds under encryption-induced errors such as quantization and bootstrapping.

Findings

01 Effective integration of FHE with RL for encrypted policies

02 Theoretical convergence and error bounds established

03 Numerical simulations validate the approach

Abstract

We propose an efficient encrypted policy synthesis to develop privacy-preserving model-based reinforcement learning. We first demonstrate that the relative-entropy-regularized reinforcement learning (RERL) framework offers a computationally convenient linear and "min-free" structure for value iteration, enabling a direct and efficient integration of fully homomorphic encryption (FHE) with bootstrapping into policy synthesis. We then analyze convergence and error bounds, since encrypted policy synthesis propagates encryption-induced errors, including those from quantization and bootstrapping. The theoretical analysis is validated by numerical simulations. The results demonstrate the effectiveness of the RERL framework in integrating FHE for encrypted policy synthesis.
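To make the "linear and min-free" structure concrete, here is a minimal plaintext sketch. It is not the paper's algorithm: it assumes a first-exit problem in the style of linearly solvable MDPs, where the control cost is the relative entropy between the controlled and passive transition distributions. Under that assumption the Bellman recursion, written in terms of z = exp(-V), needs only additions and multiplications, which are the operations homomorphic schemes support natively. The 4-state chain, the cost values, the names `desirability_iteration` and `optimal_policy`, and the `scale` rounding (a crude stand-in for quantization error, not a model of FHE) are all hypothetical choices for illustration.

```python
import numpy as np

def desirability_iteration(P, q, terminal, iters=500, scale=None):
    """Iterate z <- exp(-q) * (P @ z), a recursion with no min over actions.

    P        : (n, n) passive transition matrix, rows sum to 1 (assumed known).
    q        : (n,) nonnegative state costs.
    terminal : (n,) boolean mask of absorbing goal states.
    scale    : if set, round z to multiples of 1/scale after every sweep
               (hypothetical stand-in for quantization error).
    """
    g = np.exp(-q)                      # per-state multiplicative factor
    z = np.where(terminal, g, 1.0)      # boundary condition at terminal states
    for _ in range(iters):
        z = np.where(terminal, g, g * (P @ z))   # linear update
        if scale is not None:
            z = np.round(z * scale) / scale      # inject rounding error
    return z

def optimal_policy(P, z):
    """Controlled transitions are the passive ones reweighted by z."""
    W = P * z[None, :]
    return W / W.sum(axis=1, keepdims=True)

# Hypothetical 4-state chain; state 3 is the absorbing goal.
P = np.array([[0.5, 0.5, 0.0, 0.0],
              [0.5, 0.0, 0.5, 0.0],
              [0.0, 0.5, 0.0, 0.5],
              [0.0, 0.0, 0.0, 1.0]])
q = np.array([0.1, 0.1, 0.1, 0.0])
terminal = np.array([False, False, False, True])

z = desirability_iteration(P, q, terminal)                   # exact arithmetic
z_q = desirability_iteration(P, q, terminal, scale=2**16)    # with rounding
pi = optimal_policy(P, z)
V = -np.log(z)                                               # value function
```

In this sketch the only nonlinear steps, the exponential of the cost and the final normalization, happen once outside the loop. Comparing `z` with `z_q` gives a rough feel for how per-sweep rounding accumulates, which is the kind of effect the paper's error bounds address for quantization and bootstrapping.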
