TL;DR
Social-JEPA shows that agents trained independently on distinct viewpoints of the same environment, with no parameter sharing or coordination, develop latent spaces related by an approximate linear isometry, so representations and downstream models can be translated directly between agents.
Contribution
It introduces a method in which separately trained agents arrive at approximately isometric latent representations as an emergent property of training, enabling interoperability without parameter sharing.
Findings
Latent spaces of different agents are related by an approximate linear isometry.
A classifier trained on one agent's latent space can be ported to the other agent with no additional gradient steps.
Distillation-like migration built on the alignment accelerates later learning and markedly reduces total compute.
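To make the first two findings concrete, here is a minimal sketch of how an approximate linear isometry between two latent spaces could be estimated and then used to reuse a classifier. It assumes orthogonal Procrustes as the fitting procedure and paired latents (both agents encoding the same underlying states); neither is confirmed as the paper's own procedure. The latents are synthetic stand-ins, and all names (`fit_orthogonal_map`, `z_a`, `z_b`, `w_a`) are hypothetical.

```python
import numpy as np


def fit_orthogonal_map(z_src, z_dst):
    """Orthogonal Procrustes: orthogonal Q minimizing ||z_src @ Q - z_dst||_F."""
    u, _, vt = np.linalg.svd(z_src.T @ z_dst)
    return u @ vt


rng = np.random.default_rng(0)
dim, n_total, n_anchor = 8, 500, 100

# Synthetic stand-ins (assumption): agent B's latents are a rotated,
# slightly noisy copy of agent A's latents for the same states.
z_a = rng.normal(size=(n_total, dim))
q_true, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
z_b = z_a @ q_true + 0.01 * rng.normal(size=(n_total, dim))

# Fit the map A -> B on a small set of paired anchor samples only.
q_hat = fit_orthogonal_map(z_a[:n_anchor], z_b[:n_anchor])

# Check the fit on held-out pairs: relative residual of the alignment.
held_a, held_b = z_a[n_anchor:], z_b[n_anchor:]
residual = np.linalg.norm(held_a @ q_hat - held_b) / np.linalg.norm(held_b)

# A hypothetical linear classifier living in agent A's latent space.
w_a = rng.normal(size=dim)
labels_from_a = held_a @ w_a > 0

# Port it to agent B with no gradient steps: map B's latents back into
# A's space with the transpose (inverse) of the orthogonal map.
labels_from_b = (held_b @ q_hat.T) @ w_a > 0
agreement = np.mean(labels_from_a == labels_from_b)

print(f"relative residual: {residual:.4f}, label agreement: {agreement:.3f}")
```

Because the fitted map is orthogonal, its transpose is its inverse, so the same matrix translates in both directions. On real latents the residual indicates how far the relationship departs from an exact isometry.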
Abstract
World models compress rich sensory streams into compact latent codes that anticipate future observations. We let separate agents acquire such models from distinct viewpoints of the same environment without any parameter sharing or coordination. After training, their internal representations exhibit a striking emergent property: the two latent spaces are related by an approximate linear isometry, enabling transparent translation between them. This geometric consensus survives large viewpoint shifts and scant overlap in raw pixels. Leveraging the learned alignment, a classifier trained on one agent can be ported to the other with no additional gradient steps, while distillation-like migration accelerates later learning and markedly reduces total compute. The findings reveal that predictive learning objectives impose strong regularities on representation geometry, suggesting a lightweight…
