Value Summation: A Novel Scoring Function for MPC-based Model-based Reinforcement Learning
Mehran Raisi, Amirhossein Noohian, Luc McCutcheon, Saber Fallah

TL;DR
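This paper introduces a value-based scoring function for the planning module of MPC-based model-based reinforcement learning. Candidate trajectories are scored by the discounted sum of values rather than rewards, and the resulting optimal trajectories guide policy updates. The method outperforms current state-of-the-art methods in learning efficiency and average reward return.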
Contribution
The paper presents a novel value summation scoring function that addresses reward bias in MPC-based MBRL, enhancing learning efficiency and policy performance.
Findings
Outperforms state-of-the-art algorithms in MuJoCo environments
Achieves higher average reward returns
Demonstrates improved learning efficiency in robot locomotion tasks
Abstract
This paper proposes a novel scoring function for the planning module of MPC-based reinforcement learning methods to address the inherent bias of using the reward function to score trajectories. The proposed method enhances the learning efficiency of existing MPC-based MBRL methods using the discounted sum of values. The method utilizes optimal trajectories to guide policy learning and updates its state-action value function based on real-world and augmented onboard data. The learning efficiency of the proposed method is evaluated in selected MuJoCo Gym environments as well as in learning locomotion skills for a simulated model of the Cassie robot. The results demonstrate that the proposed method outperforms the current state-of-the-art algorithms in terms of learning efficiency and average reward return.
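To make the core idea concrete, the sketch below contrasts with the usual reward-scored planner by ranking imagined trajectories with the discounted sum of values, sum_t gamma^t * Q(s_t, a_t), instead of sum_t gamma^t * r(s_t, a_t). It is a minimal sketch under stated assumptions, not the authors' implementation. It assumes a random-shooting MPC planner, a learned dynamics model, and a learned state-action value function, each represented as a plain Python callable. It also assumes the score uses Q(s_t, a_t) at every step, since the abstract mentions a state-action value function. The names `rollout`, `value_sum_score`, `best_sequence`, and `mpc_action` are hypothetical. The paper's actual optimizer (for example CEM or MPPI), any terminal-value term, and its critic-training details are not specified here.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple

Vec = Tuple[float, ...]
Dynamics = Callable[[Vec, Vec], Vec]      # assumed learned model: (s, a) -> s'
QFunction = Callable[[Vec, Vec], float]   # assumed learned critic: (s, a) -> Q(s, a)


def rollout(model: Dynamics, s0: Vec, actions: Sequence[Vec]) -> List[Vec]:
    """Imagine s_0..s_{H-1} under the model so that states[t] pairs with actions[t]."""
    states = [s0]
    for a in actions[:-1]:
        states.append(model(states[-1], a))
    return states


def value_sum_score(states: Sequence[Vec], actions: Sequence[Vec],
                    q_fn: QFunction, gamma: float) -> float:
    """Value-summation score (sketch): sum_t gamma^t * Q(s_t, a_t).

    A conventional planner would sum gamma^t * r(s_t, a_t) here instead.
    """
    return sum(gamma ** t * q_fn(s, a)
               for t, (s, a) in enumerate(zip(states, actions)))


def best_sequence(model: Dynamics, q_fn: QFunction, s0: Vec,
                  candidates: Sequence[Sequence[Vec]],
                  gamma: float) -> Tuple[Sequence[Vec], float]:
    """Return the candidate action sequence with the highest score, and that score."""
    best: Sequence[Vec] = []
    best_score = float("-inf")
    for actions in candidates:
        score = value_sum_score(rollout(model, s0, actions), actions, q_fn, gamma)
        if score > best_score:
            best, best_score = actions, score
    return best, best_score


def mpc_action(model: Dynamics, q_fn: QFunction, s0: Vec, horizon: int = 5,
               n_candidates: int = 200, action_dim: int = 1, gamma: float = 0.99,
               rng: Optional[random.Random] = None) -> Vec:
    """Random-shooting MPC: sample sequences uniformly in [-1, 1]^action_dim,
    score each one, and return only the first action of the best sequence."""
    rng = rng if rng is not None else random.Random()
    candidates = [[tuple(rng.uniform(-1.0, 1.0) for _ in range(action_dim))
                   for _ in range(horizon)]
                  for _ in range(n_candidates)]
    best, _ = best_sequence(model, q_fn, s0, candidates, gamma)
    return best[0]


# Toy usage with hand-written stand-ins (hypothetical, not learned networks):
def toy_model(s: Vec, a: Vec) -> Vec:
    return (s[0] + a[0],)                 # 1-D point mass: s' = s + a


def toy_q(s: Vec, a: Vec) -> float:
    return -(s[0] + a[0]) ** 2            # prefers actions that move s toward 0


first_action = mpc_action(toy_model, toy_q, s0=(2.0,), rng=random.Random(0))
```

Executing only the first action and replanning at the next real state is the standard receding-horizon pattern. Swapping `value_sum_score` for a discounted reward sum recovers the conventional reward-scored planner whose bias the paper targets.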
Taxonomy
Topics: Reinforcement Learning in Robotics · Evolutionary Algorithms and Applications
