On-Policy Optimization of ANFIS Policies Using Proximal Policy Optimization
Kaaustaaub Shankar, Wilhelm Louw, Kelly Cohen

TL;DR
This paper introduces a reinforcement learning approach that uses Proximal Policy Optimization (PPO) to train neuro-fuzzy controllers. On the CartPole-v1 environment it converges quickly and stably, outperforming previous DQN-based methods.
Contribution
The paper presents the first PPO-based framework for training Adaptive Neuro-Fuzzy Inference System (ANFIS) policies and demonstrates better stability and faster convergence than prior DQN-based neuro-fuzzy methods.
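For readers unfamiliar with PPO, the clipped surrogate objective at its core can be sketched as follows. This is a minimal NumPy illustration of the standard PPO clipping rule, not the authors' implementation; the function name `ppo_clip_loss` and the default `clip_eps=0.2` are assumptions chosen for the example.

```python
import numpy as np

def ppo_clip_loss(new_logp, old_logp, advantages, clip_eps=0.2):
    """Negative clipped surrogate objective, suitable for minimization.

    new_logp, old_logp: log-probabilities of the taken actions under the
    current policy and the data-collecting policy (hypothetical inputs).
    advantages: advantage estimates for those actions.
    """
    new_logp = np.asarray(new_logp, dtype=float)
    old_logp = np.asarray(old_logp, dtype=float)
    advantages = np.asarray(advantages, dtype=float)

    # Probability ratio between the current and the old policy.
    ratio = np.exp(new_logp - old_logp)

    # Unclipped and clipped surrogate terms.
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages

    # Take the pessimistic (smaller) term per sample, then average.
    return -np.mean(np.minimum(unclipped, clipped))
```

Taking the minimum of the two terms removes the incentive to move the policy far from the one that collected the data, which is commonly cited as the source of PPO's training stability.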
Findings
PPO-trained fuzzy agents consistently reached the maximum return of 500.
Returns showed zero variance after 20,000 updates.
The agents outperformed ANFIS-DQN baselines in both stability and convergence speed.
Abstract
We present a reinforcement learning method for training neuro-fuzzy controllers using Proximal Policy Optimization (PPO). Unlike prior approaches that used Deep Q-Networks (DQN) with Adaptive Neuro-Fuzzy Inference Systems (ANFIS), our PPO-based framework leverages a stable on-policy actor-critic setup. Evaluated on the CartPole-v1 environment across multiple seeds, PPO-trained fuzzy agents consistently achieved the maximum return of 500 with zero variance after 20,000 updates, outperforming ANFIS-DQN baselines in both stability and convergence speed. This highlights PPO's potential for training explainable neuro-fuzzy agents in reinforcement learning tasks.
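To make the idea of an ANFIS policy concrete, the sketch below shows one way a fuzzy actor could map a CartPole-style observation to action probabilities. The abstract does not specify the architecture, so everything here is an assumption made for illustration: Gaussian membership functions, a product t-norm for rule firing strengths, first-order Takagi-Sugeno consequents, and a softmax over the resulting logits. The function name `anfis_policy_probs` and its parameter layout are hypothetical.

```python
import numpy as np

def anfis_policy_probs(obs, centers, sigmas, consequents):
    """Action probabilities from a small ANFIS-style actor (illustrative).

    obs:         (d,) observation vector
    centers:     (R, d) Gaussian membership centers, one row per rule
    sigmas:      (R, d) Gaussian membership widths (positive)
    consequents: (R, d + 1, A) linear consequent weights plus bias per rule
    """
    obs = np.asarray(obs, dtype=float)

    # Layer 1: Gaussian membership degree of each input for each rule.
    mu = np.exp(-0.5 * ((obs - centers) / sigmas) ** 2)      # (R, d)

    # Layer 2: rule firing strength via the product t-norm.
    w = mu.prod(axis=1)                                       # (R,)

    # Layer 3: normalize firing strengths (epsilon avoids division by zero).
    w_norm = w / (w.sum() + 1e-12)

    # Layer 4: first-order consequents, linear in the observation.
    x1 = np.append(obs, 1.0)                                  # (d + 1,)
    rule_out = np.einsum("i,ria->ra", x1, consequents)        # (R, A)

    # Layer 5: weighted sum of rule outputs gives action logits.
    logits = w_norm @ rule_out                                # (A,)

    # Numerically stable softmax over the logits.
    logits = logits - logits.max()
    p = np.exp(logits)
    return p / p.sum()
```

In an actor-critic setup such as the one the abstract describes, the probabilities returned here would supply the log-probabilities used in the PPO policy loss, while the membership parameters and consequents would be the trainable quantities.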
Taxonomy
Topics: Cloud Computing and Resource Management · Auction Theory and Applications
