On-Policy Optimization of ANFIS Policies Using Proximal Policy Optimization

Kaaustaaub Shankar; Wilhelm Louw; Kelly Cohen

arXiv:2507.01039·cs.LG·July 8, 2025

TL;DR

This paper trains neuro-fuzzy (ANFIS) controllers with Proximal Policy Optimization (PPO). On CartPole-v1, the resulting agents converge faster and more stably than previous DQN-based methods.

Contribution

It presents the first PPO-based framework for training ANFIS policies, demonstrating improved stability and convergence over prior DQN-based neuro-fuzzy methods.
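
To make the idea of an ANFIS policy concrete, here is a minimal NumPy sketch of a fuzzy actor for a CartPole-like task (4 state inputs, 2 discrete actions). It is an illustration under stated assumptions, not the authors' architecture: the class name `AnfisPolicy`, the Gaussian membership functions, the product t-norm, the first-order Takagi-Sugeno consequents, and the rule count of 8 are all hypothetical choices that the summary above does not specify.

```python
import numpy as np


class AnfisPolicy:
    """Hypothetical first-order Takagi-Sugeno fuzzy actor (illustrative only)."""

    def __init__(self, n_inputs=4, n_rules=8, n_actions=2, seed=0):
        rng = np.random.default_rng(seed)
        # Assumption: one Gaussian membership function per (rule, input) pair.
        self.centers = rng.normal(0.0, 1.0, size=(n_rules, n_inputs))
        self.log_sigmas = np.zeros((n_rules, n_inputs))
        # Assumption: each rule has a linear consequent producing action logits.
        self.weights = rng.normal(0.0, 0.1, size=(n_rules, n_actions, n_inputs))
        self.biases = np.zeros((n_rules, n_actions))

    def firing_strengths(self, state):
        """Normalized rule activations using a product t-norm."""
        state = np.asarray(state, dtype=float)
        sigmas = np.exp(self.log_sigmas)
        # Product of Gaussian memberships, computed in log space for stability.
        log_w = -0.5 * np.sum(((state - self.centers) / sigmas) ** 2, axis=1)
        w = np.exp(log_w - log_w.max())
        return w / w.sum()

    def action_probs(self, state):
        """Blend per-rule logits by firing strength, then apply softmax."""
        state = np.asarray(state, dtype=float)
        w = self.firing_strengths(state)                   # (n_rules,)
        rule_logits = self.weights @ state + self.biases   # (n_rules, n_actions)
        logits = w @ rule_logits                           # (n_actions,)
        z = np.exp(logits - logits.max())
        return z / z.sum()


policy = AnfisPolicy()
probs = policy.action_probs([0.01, -0.02, 0.03, 0.04])
```

Because every rule is a readable "if state is near this center, then prefer these logits" statement, the firing strengths returned by `firing_strengths` can be inspected directly to see which rules drive a given action.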

Findings

01  PPO-trained fuzzy agents consistently reached the maximum return of 500 on CartPole-v1.

02  Return variance across seeds dropped to zero after 20,000 updates.

03  PPO-trained agents outperformed ANFIS-DQN baselines in both stability and convergence speed.

Abstract

We present a reinforcement learning method for training neuro-fuzzy controllers using Proximal Policy Optimization (PPO). Unlike prior approaches that used Deep Q-Networks (DQN) with Adaptive Neuro-Fuzzy Inference Systems (ANFIS), our PPO-based framework leverages a stable on-policy actor-critic setup. Evaluated on the CartPole-v1 environment across multiple seeds, PPO-trained fuzzy agents consistently achieved the maximum return of 500 with zero variance after 20000 updates, outperforming ANFIS-DQN baselines in both stability and convergence speed. This highlights PPO's potential for training explainable neuro-fuzzy agents in reinforcement learning tasks.
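
The stability the abstract attributes to PPO comes largely from its clipped surrogate objective, which limits how far a single update can move the policy. The sketch below shows that standard objective in NumPy. The function name `ppo_clip_loss` and the clip range of 0.2 are assumptions for illustration; the abstract does not report the hyperparameters used in the paper.

```python
import numpy as np


def ppo_clip_loss(new_logp, old_logp, advantages, clip_eps=0.2):
    """Standard PPO clipped surrogate loss (negated so that lower is better).

    new_logp, old_logp: log-probabilities of the taken actions under the
    current and data-collecting policies. clip_eps=0.2 is an assumed value.
    """
    new_logp = np.asarray(new_logp, dtype=float)
    old_logp = np.asarray(old_logp, dtype=float)
    advantages = np.asarray(advantages, dtype=float)

    # Probability ratio between the current and the old policy.
    ratio = np.exp(new_logp - old_logp)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Taking the elementwise minimum gives a pessimistic bound on improvement.
    return -np.mean(np.minimum(unclipped, clipped))


# When the policy has not changed, the ratio is 1 and the loss is -mean(advantages).
loss_same = ppo_clip_loss([-0.7, -0.7], [-0.7, -0.7], [1.0, -1.0])
```

In an actor-critic setup the advantages would come from a learned value function (the critic), and this loss would be minimized with respect to the actor's parameters, for example the membership centers and consequent weights of a fuzzy policy.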

Taxonomy

Topics: Cloud Computing and Resource Management · Auction Theory and Applications