The Temporal Trap: Entanglement in Pre-Trained Visual Representations for Visuomotor Policy Learning
Nikolaos Tsagkas, Andreas Sochopoulos, Duolikun Danier, Chris Xiaoxuan Lu, Oisin Mac Aodha

TL;DR
This paper investigates temporal entanglement in pre-trained visual representations (PVRs) used for visuomotor policy learning, and proposes a disentanglement baseline to improve temporal understanding and policy success.
Contribution
The paper identifies temporal entanglement as a key issue, quantifies its impact, and introduces a simple disentanglement method to improve how temporal cues are represented in visuomotor tasks.
Findings
Policy success rate correlates strongly with how well the latent space captures task-progression cues.
Traditional temporal enrichment methods are insufficient.
Disentanglement improves temporal cue representation.
Abstract
The integration of pre-trained visual representations (PVRs) has significantly advanced visuomotor policy learning. However, effectively leveraging these models remains a challenge. We identify temporal entanglement as a critical, inherent issue when using these time-invariant models in sequential decision-making tasks. This entanglement arises because PVRs, optimised for static image understanding, struggle to represent the temporal dependencies crucial for visuomotor control. In this work, we quantify the impact of temporal entanglement, demonstrating a strong correlation between a policy's success rate and the ability of its latent space to capture task-progression cues. Based on these insights, we propose a simple, yet effective disentanglement baseline designed to mitigate temporal entanglement. Our empirical results show that traditional methods aimed at enriching features with…
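One way to make the abstract's notion of "capturing task-progression cues" concrete is a linear probe: fit a regressor from frozen features to normalised task progress and report the held-out fit. The sketch below is an illustration under stated assumptions, not the authors' metric. The function names, the ridge probe, the R^2 score, and the synthetic features are all hypothetical choices made for this example.

```python
import numpy as np


def progress_probe_r2(features, progress, ridge=1e-3, train_frac=0.8, seed=0):
    """Held-out R^2 of a ridge-regularised linear probe that predicts
    normalised task progress (0 = start, 1 = end) from frozen features.
    Hypothetical helper; the paper may measure this differently."""
    rng = np.random.default_rng(seed)
    n = features.shape[0]
    idx = rng.permutation(n)
    n_train = int(train_frac * n)
    tr, te = idx[:n_train], idx[n_train:]

    # Append a bias column so the probe can learn an offset.
    X = np.hstack([features, np.ones((n, 1))])

    # Closed-form ridge solution: (X^T X + lambda I) w = X^T y.
    A = X[tr].T @ X[tr] + ridge * np.eye(X.shape[1])
    w = np.linalg.solve(A, X[tr].T @ progress[tr])

    pred = X[te] @ w
    ss_res = np.sum((progress[te] - pred) ** 2)
    ss_tot = np.sum((progress[te] - progress[te].mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def make_synthetic(n=500, dim=16, temporal_signal=1.0, seed=0):
    """Synthetic stand-in for encoder outputs (assumption, not real PVR data):
    Gaussian noise plus a progress-dependent component of adjustable strength."""
    rng = np.random.default_rng(seed)
    progress = rng.uniform(0.0, 1.0, size=n)
    direction = rng.normal(size=dim)
    noise = rng.normal(size=(n, dim))
    features = temporal_signal * np.outer(progress, direction) + noise
    return features, progress


# Features that carry a progress signal versus features that carry none.
feats_informative, prog = make_synthetic(temporal_signal=5.0)
feats_uninformative, _ = make_synthetic(temporal_signal=0.0)

r2_informative = progress_probe_r2(feats_informative, prog)
r2_uninformative = progress_probe_r2(feats_uninformative, prog)
print(f"probe R^2 with temporal signal:    {r2_informative:.3f}")
print(f"probe R^2 without temporal signal: {r2_uninformative:.3f}")
```

Under this setup, one could compute such a probe score per encoder and compare it against measured policy success rates. Any real analysis would need actual rollouts and features rather than the synthetic data used here.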
Taxonomy
Topics: Robot Manipulation and Learning
