seq-JEPA: Autoregressive Predictive Learning of Invariant-Equivariant World Models
Hafez Ghaemi, Eilif Muller, Shahab Bakhtiari

TL;DR
seq-JEPA is a world modeling framework that learns architecturally separate invariant and equivariant representations from sequences, aiming to avoid the usual performance trade-off between invariance-demanding and equivariance-related downstream tasks.
Contribution
It introduces architectural inductive biases into joint-embedding predictive architectures so that invariant and equivariant representations are learned simultaneously from sequences, without relying on dual equivariance predictors or loss terms.
Findings
Strong performance on both invariance- and equivariance-demanding tasks
Effective in sequence-based tasks like path integration and eye movement prediction
Outperforms existing SSL methods in flexibility and downstream adaptation
Abstract
Joint-embedding self-supervised learning (SSL) commonly relies on transformations such as data augmentation and masking to learn visual representations, a task achieved by enforcing invariance or equivariance with respect to these transformations applied to two views of an image. This dominant two-view paradigm in SSL often limits the flexibility of learned representations for downstream adaptation by creating performance trade-offs between high-level invariance-demanding tasks such as image classification and more fine-grained equivariance-related tasks. In this work, we propose seq-JEPA, a world modeling framework that introduces architectural inductive biases into joint-embedding predictive architectures to resolve this trade-off. Without relying on dual equivariance predictors or loss terms, seq-JEPA simultaneously learns two architecturally separate representations for…
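The abstract contrasts invariance and equivariance with respect to a transformation. As a toy illustration of those two properties only (this is not the seq-JEPA architecture or training procedure), the sketch below uses a circular shift as the transformation. The names `circular_filter` and `pooled_feature`, the filter weights, and the signal length are all hypothetical choices made for the example.

```python
import numpy as np

def circular_filter(x, kernel):
    """Circular 1-D filter: shifting the input shifts the output (equivariance)."""
    n = len(x)
    return np.array([
        sum(kernel[j] * x[(i + j) % n] for j in range(len(kernel)))
        for i in range(n)
    ])

def pooled_feature(x, kernel):
    """Mean over positions of the filtered signal: unchanged by shifts (invariance)."""
    return circular_filter(x, kernel).mean()

rng = np.random.default_rng(0)
x = rng.normal(size=16)                 # toy 1-D "image"
kernel = np.array([0.25, 0.5, 0.25])    # arbitrary filter weights (assumption)
shift = 3
x_shifted = np.roll(x, shift)           # the transformation: circular shift

# Equivariant: transforming the input transforms the feature the same way.
equivariant_ok = np.allclose(
    circular_filter(x_shifted, kernel),
    np.roll(circular_filter(x, kernel), shift),
)

# Invariant: transforming the input leaves the feature unchanged.
invariant_ok = np.isclose(
    pooled_feature(x_shifted, kernel),
    pooled_feature(x, kernel),
)

print(equivariant_ok, invariant_ok)
```

The pooled feature discards where things are and so suits tasks like classification, while the unpooled feature keeps the shift information that an equivariance-related task would need. This is the tension between the two kinds of representation that the abstract describes.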
Taxonomy
Topics: Topic Modeling
