seq-JEPA: Autoregressive Predictive Learning of Invariant-Equivariant World Models

Hafez Ghaemi; Eilif Muller; Shahab Bakhtiari

arXiv:2505.03176 · cs.CV · January 12, 2026

Open Access

TL;DR

seq-JEPA is a world modeling framework that learns architecturally separate invariant and equivariant representations from sequences, with the aim of resolving the performance trade-off between invariance-demanding tasks such as image classification and more fine-grained equivariance-related tasks.

Contribution

The paper introduces architectural inductive biases into joint-embedding predictive architectures so that invariant and equivariant representations are learned simultaneously from sequences, without relying on dual equivariance predictors or loss terms.

Findings

1. Strong performance on both invariance- and equivariance-demanding tasks
2. Effective in sequence-based tasks such as path integration and eye movement prediction
3. Outperforms existing SSL methods in flexibility and downstream adaptation

Abstract

Joint-embedding self-supervised learning (SSL) commonly relies on transformations such as data augmentation and masking to learn visual representations, a task achieved by enforcing invariance or equivariance with respect to these transformations applied to two views of an image. This dominant two-view paradigm in SSL often limits the flexibility of learned representations for downstream adaptation by creating performance trade-offs between high-level invariance-demanding tasks such as image classification and more fine-grained equivariance-related tasks. In this work, we propose seq-JEPA, a world modeling framework that introduces architectural inductive biases into joint-embedding predictive architectures to resolve this trade-off. Without relying on dual equivariance predictors or loss terms, seq-JEPA simultaneously learns two architecturally separate representations for…
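To make the two-view paradigm described in the abstract concrete, the following is a minimal sketch of a generic invariance objective and a generic equivariance objective. It is not the seq-JEPA architecture or the authors' code. The linear encoder `W`, the linear predictor `P`, the toy additive transformation, and all dimensions are assumptions chosen purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(x, W):
    """Hypothetical linear encoder standing in for a deep network."""
    return x @ W

def invariance_loss(z1, z2):
    """Pull the embeddings of two views together, ignoring the transformation."""
    return float(np.mean(np.sum((z1 - z2) ** 2, axis=-1)))

def equivariance_loss(z1, z2, action, P):
    """Predict the second view's embedding from the first view's embedding
    and the transformation parameters ("action") with a linear predictor P."""
    pred = np.concatenate([z1, action], axis=-1) @ P
    return float(np.mean(np.sum((pred - z2) ** 2, axis=-1)))

# Toy data (assumption): the second view is the first view shifted by a
# linear function of the transformation parameters.
batch, dim_in, dim_z, dim_a = 8, 16, 4, 2
x1 = rng.normal(size=(batch, dim_in))
action = rng.normal(size=(batch, dim_a))
shift = rng.normal(size=(dim_a, dim_in))
x2 = x1 + action @ shift

W = rng.normal(size=(dim_in, dim_z))
z1, z2 = encode(x1, W), encode(x2, W)

# Invariance objective: nonzero here because the encoder still "sees" the shift.
inv = invariance_loss(z1, z2)

# Equivariance objective with an oracle predictor: because the encoder and the
# transformation are both linear in this toy setup, z2 = z1 + action @ (shift @ W),
# so stacking the identity on top of shift @ W predicts z2 exactly.
P_oracle = np.vstack([np.eye(dim_z), shift @ W])
eqv = equivariance_loss(z1, z2, action, P_oracle)

print(f"invariance loss: {inv:.4f}")
print(f"equivariance loss (oracle predictor): {eqv:.2e}")
```

In this toy setup the same embedding cannot drive both losses to zero at once unless it discards the transformation entirely, which gives a rough intuition for the trade-off the abstract attributes to the two-view paradigm.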

Taxonomy

Topics: Topic Modeling