VJEPA: Variational Joint Embedding Predictive Architectures as Probabilistic World Models
Yongchao Huang

TL;DR
VJEPA introduces a probabilistic, variational approach to joint embedding predictive architectures, enabling uncertainty estimation, robust representation learning, and effective control in noisy, high-dimensional environments.
Contribution
It generalizes JEPA to a probabilistic framework, unifies it with PSRs and Bayesian filtering, and introduces BJEPA for modular, transferable predictive models.
Findings
VJEPA effectively filters out high-variance distractors.
It provides formal guarantees for avoiding representation collapse.
It demonstrates robust uncertainty estimation in noisy environments.
Abstract
Joint Embedding Predictive Architectures (JEPA) offer a scalable paradigm for self-supervised learning by predicting latent representations rather than reconstructing high-entropy observations. However, existing formulations rely on deterministic regression objectives, which mask probabilistic semantics and limit their applicability in stochastic control. In this work, we introduce Variational JEPA (VJEPA), a probabilistic generalization that learns a predictive distribution over future latent states via a variational objective. We show that VJEPA unifies representation learning with Predictive State Representations (PSRs) and Bayesian filtering, establishing that sequential modeling does not require autoregressive observation likelihoods. Theoretically, we prove that VJEPA representations can serve as sufficient information states for optimal control without…
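The contrast the abstract draws between a deterministic regression objective and a predictive distribution over latents can be illustrated with a small sketch. This is not the paper's objective: it assumes, purely for illustration, that the predictor outputs a diagonal Gaussian (a mean and a log-variance per latent dimension) and scores the target latent by its negative log-likelihood. The function names `gaussian_nll` and `squared_error` are hypothetical.

```python
import math

def gaussian_nll(z_target, mu, log_var):
    """Negative log-likelihood of one latent vector under a diagonal Gaussian.

    Assumption for illustration: the predictor emits a mean `mu` and a
    log-variance `log_var` per latent dimension. This is a generic
    probabilistic regression loss, not the paper's exact objective.
    """
    total = 0.0
    for z, m, lv in zip(z_target, mu, log_var):
        # Per-dimension term: 0.5 * (log(2*pi) + log sigma^2 + (z - m)^2 / sigma^2)
        total += 0.5 * (math.log(2.0 * math.pi) + lv + (z - m) ** 2 * math.exp(-lv))
    return total

def squared_error(z_target, mu):
    """Deterministic regression loss: sum of squared errors over dimensions."""
    return sum((z - m) ** 2 for z, m in zip(z_target, mu))

# Toy latent with one predictable dimension and one noisy dimension.
z_target = [0.0, 3.0]
mu = [0.0, 0.0]

# Unit variance everywhere: equals half the squared error plus a constant.
nll_fixed = gaussian_nll(z_target, mu, [0.0, 0.0])

# A larger predicted variance on the noisy dimension lowers the loss there.
nll_adaptive = gaussian_nll(z_target, mu, [0.0, math.log(9.0)])
```

With the log-variance fixed at zero, the Gaussian loss reduces to half the squared error plus a constant, so deterministic regression appears as a special case. Letting the variance grow on a hard-to-predict dimension down-weights its error, which is one intuition for how a probabilistic predictor can discount high-variance distractors.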
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Generative Adversarial Networks and Image Synthesis
