When does predictive inverse dynamics outperform behavior cloning?

Lukas Schäfer; Pallavi Choudhury; Abdelhak Lemkhenter; Chris Lovett; Somjit Nath; Luis França; Matheus Ribeiro Furtado de Mendonça; Alex Lamb; Riashat Islam; Siddhartha Sen; John Langford; Katja Hofmann; Sergio Valcarcel Macua

arXiv:2601.21718 · cs.LG · January 30, 2026


TL;DR

This paper explains why predictive inverse dynamics models (PIDM) often outperform behavior cloning (BC) in imitation learning, especially with limited data, by analyzing their bias-variance tradeoff and validating results in navigation and video game environments.

Contribution

It provides a theoretical bias-variance analysis of PIDM versus BC and establishes conditions for when PIDM achieves better sample efficiency, supported by empirical validation.
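To make the two policies being compared concrete, here is a schematic in generic notation. The symbols are assumptions for illustration, not the paper's own: f is the future-state predictor, g is the inverse dynamics model, and a^*(s) is the expert action at state s. The second line is the standard bias-variance decomposition of squared error, which holds for any learned action estimate. The paper's specific conditions on predictor bias are not reproduced here.

```latex
% Generic notation (assumed, not taken from the paper):
% f = future-state predictor, g = inverse dynamics model (IDM),
% a^*(s) = expert action at state s.
\pi_{\mathrm{BC}}(s) \approx a^*(s), \qquad
\pi_{\mathrm{PIDM}}(s) = g\bigl(s, \hat{s}'\bigr), \quad \hat{s}' = f(s)

% Standard decomposition of the expected squared error of a learned
% action estimate \hat{a}(s), with the expectation taken over training sets:
\mathbb{E}\bigl[(\hat{a}(s) - a^*(s))^2\bigr]
  = \bigl(\mathbb{E}[\hat{a}(s)] - a^*(s)\bigr)^2
  + \operatorname{Var}\bigl[\hat{a}(s)\bigr]
```

In these terms, a biased predictor f adds to the first term for PIDM. The paper's argument is that conditioning g on \hat{s}' can shrink the second term enough to compensate.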

Findings

1. PIDM requires fewer demonstrations than BC in navigation tasks.

2. In high-dimensional visual environments, PIDM outperforms BC while using over 66% fewer samples.

3. The theoretical analysis links the bias-variance tradeoff to improvements in sample efficiency.

Abstract

Behavior cloning (BC) is a practical offline imitation learning method, but it often fails when expert demonstrations are limited. Recent works have introduced a class of architectures named predictive inverse dynamics models (PIDM) that combine a future state predictor with an inverse dynamics model (IDM). While PIDM often outperforms BC, the reasons behind its benefits remain unclear. In this paper, we provide a theoretical explanation: PIDM introduces a bias-variance tradeoff. While predicting the future state introduces bias, conditioning the IDM on the prediction can significantly reduce variance. We establish conditions on the state predictor bias for PIDM to achieve lower prediction error and higher sample efficiency than BC, with the gap widening when additional data sources are available. We validate the theoretical insights empirically in 2D navigation tasks, where BC requires…
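The sketch below illustrates the qualitative claim in a toy setting: PIDM gains when a state predictor can use additional data, and loses when that predictor is strongly biased. Every modeling choice here is a hypothetical assumption and is not taken from the paper:

- a scalar state with dynamics s' = s + a;
- a noisy linear expert;
- linear least-squares models;
- an extra pool of action-free expert transitions that only the predictor can use;
- random-play transitions for training the IDM.

It is not the authors' setup or analysis. In this linear toy, PIDM's advantage comes only from the extra data, so it does not capture the paper's variance-reduction argument.

```python
# Toy PIDM vs. BC comparison; all settings are hypothetical assumptions.
import random
from statistics import fmean

SIGMA = 0.5  # hypothetical expert action noise


def expert_action(s, rng):
    """Hypothetical noisy expert: step halfway toward a goal at 0."""
    return -0.5 * s + rng.gauss(0.0, SIGMA)


def fit_line(xs, ys):
    """Ordinary least squares for y = w * x + b with a single feature."""
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    w = sxy / sxx
    return w, my - w * mx


def run_trial(rng, n_demos, n_extra_states, predictor_bias):
    # Expert demonstrations (s, a, s') under the toy dynamics s' = s + a.
    demo_s = [rng.uniform(-2, 2) for _ in range(n_demos)]
    demo_a = [expert_action(s, rng) for s in demo_s]
    demo_next = [s + a for s, a in zip(demo_s, demo_a)]

    # BC: regress the action directly on the state, using the demos only.
    bc_w, bc_b = fit_line(demo_s, demo_a)

    # State predictor: demos plus extra action-free expert pairs (s, s').
    extra_s = [rng.uniform(-2, 2) for _ in range(n_extra_states)]
    extra_next = [s + expert_action(s, rng) for s in extra_s]
    pred_w, pred_b = fit_line(demo_s + extra_s, demo_next + extra_next)

    # IDM: regress a on the feature (s' - s) from non-expert random play.
    play_s = [rng.uniform(-2, 2) for _ in range(200)]
    play_a = [rng.uniform(-1, 1) for _ in range(200)]
    play_delta = [(s + a) - s for s, a in zip(play_s, play_a)]
    idm_w, idm_b = fit_line(play_delta, play_a)

    # Squared error against the expert's mean action on a fixed state grid.
    bc_err, pidm_err = [], []
    for i in range(101):
        s = -2 + 4 * i / 100
        target = -0.5 * s
        bc_err.append((bc_w * s + bc_b - target) ** 2)
        s_hat = pred_w * s + pred_b + predictor_bias  # optionally biased prediction
        a_pidm = idm_w * (s_hat - s) + idm_b  # IDM conditioned on (s, s_hat)
        pidm_err.append((a_pidm - target) ** 2)
    return fmean(bc_err), fmean(pidm_err)


def compare(n_demos=5, n_extra_states=500, predictor_bias=0.0, trials=200, seed=0):
    """Mean (BC error, PIDM error) over independent toy datasets."""
    rng = random.Random(seed)
    results = [run_trial(rng, n_demos, n_extra_states, predictor_bias)
               for _ in range(trials)]
    return fmean(r[0] for r in results), fmean(r[1] for r in results)


bc, pidm = compare()
print(f"unbiased predictor: BC {bc:.3f}, PIDM {pidm:.3f}")
bc, pidm = compare(predictor_bias=2.0)
print(f"biased predictor:   BC {bc:.3f}, PIDM {pidm:.3f}")
```

Raising `predictor_bias` shows the tradeoff the abstract describes. Once the predictor's bias is large enough, the composed PIDM policy becomes worse than BC, even though its IDM is learned almost exactly.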


Taxonomy

Topics: Reinforcement Learning in Robotics · Robot Manipulation and Learning · Human Motion and Animation