ACT-JEPA: Novel Joint-Embedding Predictive Architecture for Efficient Policy Representation Learning
Aleksandar Vujinovic, Aleksandar Kovacevic

TL;DR
ACT-JEPA is a unified architecture that combines imitation learning and self-supervised learning to improve policy representations and world models, outperforming the baseline in every tested environment.
Contribution
The paper presents ACT-JEPA, a joint-embedding predictive architecture trained end-to-end to predict both action sequences and latent observation sequences, with the aim of making policy learning more efficient.
Findings
ACT-JEPA outperforms the baseline in all tested environments.
World-model understanding improves by up to 40%.
Task success rate is up to 10% higher.
Abstract
Learning efficient representations for decision-making policies is a challenge in imitation learning (IL). Current IL methods require expert demonstrations, which are expensive to collect. Additionally, they are not explicitly trained to understand the environment and consequently have underdeveloped world models. Self-supervised learning (SSL) offers an alternative, as it can learn a world model from diverse, unlabeled data. However, most SSL methods are inefficient because they operate in raw input space. In this work, we propose ACT-JEPA, a novel architecture that unifies IL and SSL to enhance policy representations. It is trained end-to-end to jointly predict 1) action sequences and 2) latent observation sequences. To learn in latent space, we use a Joint-Embedding Predictive Architecture (JEPA), which allows the model to filter out irrelevant details and learn a robust world model.…
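The joint objective described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the L1 (mean absolute error) loss, the `latent_weight` parameter, and the names `mean_abs_error` and `joint_loss` are assumptions chosen for illustration. It only shows how one scalar loss can combine an action-sequence term with a latent-observation-sequence term, where the latent targets are assumed to come from a separate target encoder and are treated as fixed values.

```python
from typing import Sequence

Vector = Sequence[float]


def mean_abs_error(pred: Sequence[Vector], target: Sequence[Vector]) -> float:
    """Mean absolute error over two sequences of equal-length vectors."""
    if len(pred) != len(target):
        raise ValueError("sequences must have the same length")
    total, count = 0.0, 0
    for p, t in zip(pred, target):
        if len(p) != len(t):
            raise ValueError("vectors must have the same dimension")
        total += sum(abs(a - b) for a, b in zip(p, t))
        count += len(p)
    if count == 0:
        raise ValueError("sequences must not be empty")
    return total / count


def joint_loss(
    pred_actions: Sequence[Vector],
    expert_actions: Sequence[Vector],
    pred_latents: Sequence[Vector],
    target_latents: Sequence[Vector],
    latent_weight: float = 1.0,  # hypothetical weighting, not from the paper
) -> float:
    """Combine an imitation term and a latent-prediction term into one loss.

    Assumption: both terms use L1 error and are summed with a scalar weight.
    """
    # Imitation term: predicted action sequence vs. expert demonstration.
    action_loss = mean_abs_error(pred_actions, expert_actions)
    # Self-supervised term: predicted latent observations vs. target latents.
    latent_loss = mean_abs_error(pred_latents, target_latents)
    return action_loss + latent_weight * latent_loss


if __name__ == "__main__":
    loss = joint_loss(
        pred_actions=[[0.0, 1.0], [1.0, 1.0]],
        expert_actions=[[0.0, 0.0], [1.0, 2.0]],
        pred_latents=[[2.0, 0.0]],
        target_latents=[[0.0, 0.0]],
        latent_weight=0.5,
    )
    print(f"joint loss: {loss}")
```

Because the latent term compares embeddings rather than raw observations, the prediction target is far smaller than a full observation, which is the efficiency argument the abstract makes for learning in latent space.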
Taxonomy
Topics: Data Quality and Management · Software System Performance and Reliability
