Addressing Action Oscillations through Learning Policy Inertia
Chen Chen, Hongyao Tang, Jianye Hao, Wulong Liu, Zhaopeng Meng

TL;DR
This paper introduces Policy Inertia Controller (PIC) to reduce action oscillations in deep reinforcement learning, improving safety and user experience without sacrificing performance across various tasks.
Contribution
The paper proposes a novel plug-in framework, PIC, and a training algorithm, Nested Policy Iteration, to balance policy optimality and smoothness in DRL.
Findings
Significant reduction in action oscillations in autonomous driving and Atari games.
Almost no performance loss compared to baseline algorithms.
Demonstrates effectiveness across multiple challenging tasks.
Abstract
Deep reinforcement learning (DRL) algorithms have been demonstrated to be effective in a wide range of challenging decision-making and control tasks. However, these methods typically suffer from severe action oscillations, particularly in discrete action settings: agents select different actions in consecutive steps even though the states differ only slightly. This issue is often neglected because a policy is usually evaluated by its cumulative rewards alone. Action oscillation strongly affects the user experience and can even pose a serious security threat, especially in real-world domains where safety is the main concern, such as autonomous driving. To this end, we introduce the Policy Inertia Controller (PIC), which serves as a generic plug-in framework for off-the-shelf DRL algorithms, to enable an adaptive trade-off between the optimality and smoothness of the learned…
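To make the notion of action oscillation concrete, the following is a minimal sketch of one way to quantify it for a discrete-action trajectory: the fraction of consecutive steps at which the chosen action changes. The function name `oscillation_rate` and this particular definition are assumptions made for illustration; they are not taken from the paper, which may measure oscillation differently.

```python
from typing import Sequence


def oscillation_rate(actions: Sequence[int]) -> float:
    """Fraction of consecutive step pairs whose discrete actions differ.

    Hypothetical illustrative metric, not the paper's definition.
    Returns 0.0 for trajectories with fewer than two steps.
    """
    if len(actions) < 2:
        return 0.0
    # Count positions t where the action at t+1 differs from the action at t.
    switches = sum(a != b for a, b in zip(actions, actions[1:]))
    return switches / (len(actions) - 1)
```

Under this definition, a trajectory that alternates between two actions at every step scores 1.0 and a trajectory that never switches scores 0.0, so a smoother policy should yield a lower value at comparable cumulative reward.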
Taxonomy
Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning
