TACO: Temporal Latent Action-Driven Contrastive Loss for Visual Reinforcement Learning
Ruijie Zheng, Xiyao Wang, Yanchao Sun, Shuang Ma, Jieyu Zhao, Huazhe Xu, Hal Daumé III, and Furong Huang

TL;DR
TACO introduces a temporal contrastive learning method that jointly learns state and action representations, significantly improving sample efficiency and performance in visual reinforcement learning tasks.
Contribution
The paper proposes TACO, a novel contrastive learning approach that captures control-relevant state and action representations, enhancing reinforcement learning in continuous control environments.
Findings
40% performance boost after one million steps in online RL
Sets new state-of-the-art in offline visual RL
Effective across diverse datasets and tasks
Abstract
Despite recent progress in reinforcement learning (RL) from raw pixel data, sample inefficiency continues to present a substantial obstacle. Prior works have attempted to address this challenge by creating self-supervised auxiliary tasks, aiming to enrich the agent's learned representations with control-relevant information for future state prediction. However, these objectives are often insufficient to learn representations that can represent the optimal policy or value function, and they often consider tasks with small, abstract discrete action spaces and thus overlook the importance of action representation learning in continuous control. In this paper, we introduce TACO: Temporal Action-driven Contrastive Learning, a simple yet powerful temporal contrastive learning approach that facilitates the concurrent acquisition of latent state and action representations for agents. TACO…
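To make the idea of a temporal contrastive objective over states and actions more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. It assumes an InfoNCE-style loss in which a query built from the current state embedding and a short sequence of action embeddings is matched against the embedding of the resulting future state, with the other future states in the batch acting as negatives. The names `info_nce` and `make_query`, the linear projection `W`, and the dot-product similarity are all hypothetical choices made for illustration.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors given as lists."""
    return sum(x * y for x, y in zip(a, b))


def make_query(state_emb, action_embs, W):
    """Hypothetical query: linear projection of [state ; action_1 ; ... ; action_K].

    state_emb   -- latent state at time t (list of floats)
    action_embs -- latent actions for steps t .. t+K-1 (list of lists)
    W           -- assumed projection matrix, one row per output dimension
    """
    x = list(state_emb) + [a for u in action_embs for a in u]
    return [dot(row, x) for row in W]


def info_nce(queries, keys):
    """Mean InfoNCE loss over a batch.

    queries[i] and keys[i] form the positive pair; keys[j] with j != i
    (other future states in the batch) serve as negatives.
    """
    total = 0.0
    for i, q in enumerate(queries):
        logits = [dot(q, k) for k in keys]
        m = max(logits)  # subtract the max for numerical stability
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]  # negative log-softmax of the positive
    return total / len(queries)


# Toy batch: each query points toward its own future-state key.
queries = [[5.0, 0.0], [0.0, 5.0]]
keys = [[1.0, 0.0], [0.0, 1.0]]
aligned_loss = info_nce(queries, keys)
shuffled_loss = info_nce(queries, keys[::-1])  # positives deliberately mismatched
```

In this toy batch, `aligned_loss` is much smaller than `shuffled_loss`, which is the behaviour a contrastive objective rewards: a state-action query should score its own future state higher than the future states of other trajectories. In a real agent the embeddings would come from learned encoders trained by gradient descent rather than being fixed lists.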
Taxonomy
Topics: Reinforcement Learning in Robotics
Methods: Contrastive Learning
