Noise-conditioned Energy-based Annealed Rewards (NEAR): A Generative Framework for Imitation Learning from Observation

Anish Abhijit Diwan; Julen Urain; Jens Kober; Jan Peters

arXiv:2501.14856·cs.RO·February 13, 2025

TL;DR

NEAR is a novel imitation learning framework that uses energy-based models and denoising score matching to learn complex robot motions from state-only data, avoiding adversarial training challenges.

Contribution

The paper introduces NEAR, a new energy-based generative framework for imitation learning from observation that leverages denoising score matching and a gradual reward switching strategy.
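The gradual switching idea can be illustrated with a minimal sketch. The rule below is hypothetical: the class name `AnnealingSchedule`, the `advance_threshold` parameter, and the threshold test are assumptions made for illustration, not the paper's actual switching criterion. It only shows the general shape of moving from the noisiest learnt energy function toward the least noisy one as the policy improves.

```python
from dataclasses import dataclass


@dataclass
class AnnealingSchedule:
    """Hypothetical switching rule; the threshold test is illustrative only."""

    num_levels: int           # learnt energy functions, ordered most noisy -> least noisy
    advance_threshold: float  # assumed mean reward required before switching
    level: int = 0            # index of the energy function currently used as reward

    def update(self, mean_reward: float) -> int:
        """Advance one level when the policy's mean reward clears the threshold."""
        can_advance = self.level < self.num_levels - 1
        if can_advance and mean_reward >= self.advance_threshold:
            self.level += 1
        return self.level


# Usage: call update() once per policy evaluation with the recent mean reward.
schedule = AnnealingSchedule(num_levels=3, advance_threshold=0.5)
history = [schedule.update(r) for r in (0.1, 0.6, 0.7, 0.9)]
```

A real schedule would likely need to account for the different reward scales of each energy function; this sketch ignores that for brevity.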

Findings

01

NEAR achieves performance comparable to adversarial methods such as Adversarial Motion Priors (AMP).

02

It effectively learns complex humanoid motions such as locomotion and martial arts.

03

The framework sidesteps adversarial training issues and provides stable reward representations.

Abstract

This paper introduces a new imitation learning framework based on energy-based generative models capable of learning complex, physics-dependent robot motion policies from state-only expert motion trajectories. Our algorithm, called Noise-conditioned Energy-based Annealed Rewards (NEAR), constructs several perturbed versions of the expert's motion data distribution and learns smooth and well-defined representations of the data distribution's energy function using denoising score matching. We propose to use these learnt energy functions as reward functions to learn imitation policies via reinforcement learning. We also present a strategy to gradually switch between the learnt energy functions, ensuring that the learnt rewards are always well-defined in the manifold of policy-generated samples. We evaluate our algorithm on complex humanoid tasks such as locomotion and martial arts and…
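To make the denoising score matching step concrete, here is a minimal NumPy sketch under strong simplifying assumptions. Instead of a neural network, it uses a hypothetical closed-form `toy_energy` for Gaussian "expert" data, and it assumes the sign convention that the energy is higher near the data so that it can serve directly as a reward. The names `toy_energy`, `toy_score`, `dsm_loss`, the noise levels, and the sigma-squared weighting are illustrative choices, not details taken from the paper.

```python
import numpy as np


def toy_energy(x, sigma, center, data_std):
    """Toy noise-conditioned energy: highest at `center`, smoother for larger sigma."""
    var = data_std**2 + sigma**2
    return -np.sum((x - center) ** 2, axis=-1) / (2.0 * var)


def toy_score(x, sigma, center, data_std):
    """Analytic gradient of toy_energy with respect to x."""
    var = data_std**2 + sigma**2
    return -(x - center) / var


def dsm_loss(expert_states, sigmas, center, data_std, rng):
    """Denoising score matching loss summed over noise levels, weighted by sigma^2."""
    total = 0.0
    for sigma in sigmas:
        noise = rng.standard_normal(expert_states.shape)
        perturbed = expert_states + sigma * noise
        # Score of the Gaussian perturbation kernel, i.e. -noise / sigma.
        target = -(perturbed - expert_states) / sigma**2
        residual = toy_score(perturbed, sigma, center, data_std) - target
        total += 0.5 * sigma**2 * np.mean(np.sum(residual**2, axis=-1))
    return total


# Synthetic stand-in for state-only expert data (not real motion data).
rng = np.random.default_rng(0)
true_center = np.array([1.0, -2.0])
data_std = 0.5
expert_states = true_center + data_std * rng.standard_normal((4096, 2))
sigmas = [1.0, 0.5, 0.1]  # assumed noise levels, most to least noisy

# A well-placed energy should score lower DSM loss than a misplaced one.
loss_good = dsm_loss(expert_states, sigmas, true_center, data_std, np.random.default_rng(1))
loss_bad = dsm_loss(expert_states, sigmas, true_center + 3.0, data_std, np.random.default_rng(1))

# Using the energy as a reward: states near the expert data score higher.
reward_near = toy_energy(true_center, sigmas[-1], true_center, data_std)
reward_far = toy_energy(true_center + 3.0, sigmas[-1], true_center, data_std)
```

In this toy setting the loss is only evaluated, not minimised; a learnt model would replace `toy_energy` with a parameterised function trained by gradient descent on the same objective.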

Videos

Noise-conditioned Energy-based Annealed Rewards (NEAR): A Generative Framework for Imitation Learning from Observation (SlidesLive)

Taxonomy

Topics: Human Pose and Action Recognition · Music and Audio Processing · Neural Networks and Applications

Methods: Adversarial Motion Priors (AMP)