Learning Lifted Action Models from Unsupervised Visual Traces

Kai Xi; Stephen Gould; Sylvie Thiébaux

arXiv:2604.19043 · cs.AI · May 8, 2026

TL;DR

This paper presents a deep learning framework that learns lifted action models from sequences of state images without action labels, using MILP-based correction to improve consistency and accuracy.

Contribution

The paper combines deep learning with MILP optimization to learn lifted action models from visual traces that carry no action labels.

Findings

01 MILP correction improves model convergence to consistent solutions

02 The framework successfully learns action models from image sequences in multiple domains

03 Integrating MILP helps prevent prediction collapse and self-reinforcing errors

Abstract

Efficient construction of models capturing the preconditions and effects of actions is essential for applying AI planning in real-world domains. Extensive prior work has explored learning such models from high-level descriptions of state and/or action sequences. In this paper, we tackle a more challenging setting: learning lifted action models from sequences of state images, without action observation. We propose a deep learning framework that jointly learns state prediction, action prediction, and a lifted action model. We also introduce a mixed-integer linear program (MILP) to prevent prediction collapse and self-reinforcing errors among predictions. The MILP takes the predicted states, actions, and action model over a subset of traces and solves for logically consistent states, actions, and action model that are as close as possible to the original predictions. Pseudo-labels…
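
The correction step described in the abstract can be illustrated with a small sketch. This is not the authors' formulation: it is a propositional (not lifted) toy over a single transition, it swaps the MILP solver for exhaustive search, and it assumes L1 distance as the measure of closeness to the predictions. Every name and number below (`FLUENTS`, `ACTIONS`, `correct`, the probabilities) is hypothetical and chosen only for illustration.

```python
from itertools import product

FLUENTS, ACTIONS = 2, 2  # toy sizes (assumption, not from the paper)

def rows(flat):
    """Split a flat bit tuple into one row of FLUENTS bits per action."""
    return [flat[a * FLUENTS:(a + 1) * FLUENTS] for a in range(ACTIONS)]

def is_consistent(s, s_next, act, pre, add, dele):
    """STRIPS-style check: the model is well formed and explains s -> s_next."""
    for a in range(ACTIONS):
        # No fluent may be both added and deleted by the same action.
        if any(x and y for x, y in zip(add[a], dele[a])):
            return False
    for f in range(FLUENTS):
        # Preconditions of the chosen action must hold in s.
        if pre[act][f] and not s[f]:
            return False
        # Effects fix the next value; untouched fluents persist.
        expected = 1 if add[act][f] else 0 if dele[act][f] else s[f]
        if s_next[f] != expected:
            return False
    return True

def l1(bits, probs):
    """L1 distance between a binary assignment and predicted probabilities."""
    return sum(abs(b - p) for b, p in zip(bits, probs))

def correct(p_s, p_next, p_act, p_pre, p_add, p_del):
    """Find the consistent assignment closest to the predictions.

    Exhaustive search stands in for the MILP solver here; it only
    scales to toy sizes.
    """
    n = FLUENTS * ACTIONS
    best_cost, best = float("inf"), None
    for bits in product((0, 1), repeat=2 * FLUENTS + 3 * n):
        s, s_next = bits[:FLUENTS], bits[FLUENTS:2 * FLUENTS]
        m = bits[2 * FLUENTS:]
        pre, add, dele = rows(m[:n]), rows(m[n:2 * n]), rows(m[2 * n:])
        base = (l1(s, p_s) + l1(s_next, p_next) + l1(m[:n], p_pre)
                + l1(m[n:2 * n], p_add) + l1(m[2 * n:], p_del))
        for act in range(ACTIONS):
            if not is_consistent(s, s_next, act, pre, add, dele):
                continue
            one_hot = [int(a == act) for a in range(ACTIONS)]
            cost = base + l1(one_hot, p_act)
            if cost < best_cost:
                best_cost, best = cost, (s, s_next, act, pre, add, dele)
    return best_cost, best

# Made-up predictions: fluent 1 appears to switch on, but the predicted
# model for action 0 has no matching add effect, so rounding is inconsistent.
cost, (s, s_next, act, pre, add, dele) = correct(
    p_s=[0.9, 0.1],
    p_next=[0.6, 0.8],
    p_act=[0.7, 0.3],
    p_pre=[0.9, 0.1, 0.1, 0.2],   # action-major: a0f0, a0f1, a1f0, a1f1
    p_add=[0.1, 0.4, 0.1, 0.1],
    p_del=[0.2, 0.1, 0.1, 0.1],
)
print("action:", act, "add effects:", add)
```

On these made-up numbers, the cheapest repair is to give action 0 an add effect on fluent 1, rather than to alter either predicted state or switch the action. A MILP encoding would express the same constraints linearly over binary variables and hand them to a solver, which is what makes the approach feasible beyond toy sizes.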
