Learning Smooth Time-Varying Linear Policies with an Action Jacobian Penalty
Zhaoming Xie, Kevin Karol, Jessica Hodgins

TL;DR
This paper introduces the Linear Policy Net (LPN) architecture and pairs it with an action Jacobian penalty to learn smooth, realistic control policies in reinforcement learning, suppressing high-frequency control signals while lowering the penalty's computational cost.
Contribution
The paper proposes the Linear Policy Net architecture and the use of an action Jacobian penalty to improve policy smoothness and training efficiency without extensive tuning.
Findings
LPN reduces the computational overhead of the action Jacobian penalty.
The learned policies are smooth and realistic in simulation.
The approach is effective on dynamic and complex motion tasks.
Abstract
Reinforcement learning provides a framework for learning control policies that can reproduce diverse motions for simulated characters. However, such policies often exploit unnatural high-frequency signals that are unachievable by humans or physical robots, making them poor representations of real-world behaviors. Existing work addresses this issue by adding a reward term that penalizes large changes in actions over time. This term often requires substantial tuning effort. We propose to use the action Jacobian penalty, which penalizes changes in action with respect to changes in simulated state directly through automatic differentiation. This effectively eliminates unrealistic high-frequency control signals without task-specific tuning. While effective, the action Jacobian penalty introduces significant computational overhead when used with traditional fully connected neural network…
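To make the idea of an action Jacobian penalty concrete, here is a minimal JAX sketch of how such a term might be computed for a small fully connected policy. It is an illustration under stated assumptions, not the authors' implementation: the helper names (`init_mlp`, `policy`, `action_jacobian_penalty`), the network sizes, and the choice of the mean squared Frobenius norm as the penalty are all hypothetical and do not come from the paper.

```python
import jax
import jax.numpy as jnp


def init_mlp(key, sizes):
    """Random weights for a small fully connected policy (illustrative only)."""
    params = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        key, sub = jax.random.split(key)
        w = jax.random.normal(sub, (n_in, n_out)) / jnp.sqrt(n_in)
        params.append((w, jnp.zeros(n_out)))
    return params


def policy(params, state):
    """Map a single state vector to a single action vector."""
    x = state
    for w, b in params[:-1]:
        x = jnp.tanh(x @ w + b)
    w, b = params[-1]
    return x @ w + b


def action_jacobian_penalty(params, states):
    """Mean squared Frobenius norm of d(action)/d(state) over a batch.

    Assumption: the paper may define or weight the penalty differently.
    """
    jac_fn = jax.jacrev(policy, argnums=1)  # Jacobian w.r.t. the state input
    jacs = jax.vmap(jac_fn, in_axes=(None, 0))(params, states)  # (batch, act, obs)
    return jnp.mean(jnp.sum(jacs ** 2, axis=(1, 2)))


# Hypothetical sizes: 8-dimensional state, 3-dimensional action.
params = init_mlp(jax.random.PRNGKey(0), [8, 32, 32, 3])
states = jax.random.normal(jax.random.PRNGKey(1), (16, 8))

penalty = action_jacobian_penalty(params, states)
# Gradient of the penalty w.r.t. the policy parameters; this differentiates
# through the Jacobian computation itself.
grads = jax.grad(action_jacobian_penalty)(params, states)
```

In training, a term like this would presumably be added to the reinforcement learning loss with some weight. Note that taking its gradient requires differentiating through a Jacobian, which is consistent with the overhead the abstract reports for fully connected networks. For a purely linear map from state to action, the Jacobian is just the weight matrix, so the same quantity can be read off without a second differentiation pass.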
Taxonomy
Topics: Human Motion and Animation · Robot Manipulation and Learning · Reinforcement Learning in Robotics
