SCAR: Self-Supervised Continuous Action Representation Learning
Hongjia Liu, Fan Feng, Minghao Fu, Xinyue Wang, Haofei Lu, Biwei Huang

TL;DR
This paper introduces SCAR, a self-supervised framework for learning transferable, unified action representations from visual data, improving cross-embodiment generalization and low-data adaptation in embodied intelligence tasks.
Contribution
SCAR is a joint inverse-forward dynamics model that uses regularization and invariance techniques to learn action representations shared across embodiments from visual transitions.
Findings
Learned action representations outperform raw actions for world modeling.
Enhanced cross-embodiment low-data adaptation.
Improved cross-task transfer performance.
Abstract
Despite the central role of action in embodied intelligence, learning transferable action representations from visual transitions remains a fundamental challenge, particularly when world models must generalize across embodiments under limited data. We argue that action is not merely an auxiliary conditioning signal, but a distinct representational factor that decouples the controllable change from embodiment-specific actuation. In this work, we propose SCAR, a joint inverse-forward dynamics framework for learning unified action representations across embodiments from visual transitions. Built on a pretrained generative backbone, SCAR uses an inverse dynamics model (IDM) to infer latent actions from latent observation pairs and a forward dynamics model (FDM) to predict future dynamics conditioned on them. To make the latent space transferable rather than a generic visual bottleneck, we…
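The inverse-forward structure described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the linear maps, the dimensions, the residual form of the forward model, and the mean-squared-error objective are all assumptions made for illustration, and the regularization and invariance terms are omitted because the abstract is truncated before describing them.

```python
import numpy as np

rng = np.random.default_rng(0)

LATENT_DIM = 8   # size of a latent observation (assumed for illustration)
ACTION_DIM = 2   # size of the latent action bottleneck (assumed for illustration)

# Hypothetical linear stand-ins for the IDM and FDM networks.
W_idm = rng.normal(scale=0.1, size=(2 * LATENT_DIM, ACTION_DIM))
W_fdm = rng.normal(scale=0.1, size=(LATENT_DIM + ACTION_DIM, LATENT_DIM))

def idm(z_t, z_next):
    """Infer a latent action from a pair of latent observations."""
    return np.concatenate([z_t, z_next], axis=-1) @ W_idm

def fdm(z_t, a):
    """Predict the next latent observation from z_t and a latent action.

    The residual form (z_t + delta) is an assumption, not taken from the paper.
    """
    return z_t + np.concatenate([z_t, a], axis=-1) @ W_fdm

def joint_loss(z_t, z_next):
    """MSE of the FDM prediction when conditioned on the IDM's inferred action."""
    a = idm(z_t, z_next)
    z_pred = fdm(z_t, a)
    return float(np.mean((z_pred - z_next) ** 2))

# A batch of 4 synthetic latent transitions.
z_t = rng.normal(size=(4, LATENT_DIM))
z_next = z_t + 0.1 * rng.normal(size=(4, LATENT_DIM))

latent_action = idm(z_t, z_next)
loss = joint_loss(z_t, z_next)
print(latent_action.shape, loss)
```

Because the latent action has far fewer dimensions than the observation pair it is inferred from, the forward model can only succeed if the action captures what changed between the two frames. Training both maps jointly on this loss is the basic idea the sketch is meant to convey.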
