Crossmodal Attentive Skill Learner
Shayegan Omidshafiei, Dong-Ki Kim, Jason Pazis, Jonathan P. How

TL;DR
This paper introduces CASL, a hierarchical reinforcement learning framework that applies crossmodal attention over multiple sensory inputs to improve single-task performance and accelerate transfer to new tasks, evaluated on the Atari 2600 game Amidar.
Contribution
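The paper presents CASL, which integrates crossmodal attention with the Asynchronous Advantage Option-Critic (A2OC) architecture for hierarchical reinforcement learning across multiple sensory modalities, including audio and visual inputs. It also modifies the Arcade Learning Environment to support audio queries and open-sources a hybrid CPU-GPU implementation.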
Findings
Improved performance in single tasks.
Accelerated transfer to new tasks.
Effective filtering of irrelevant sensor data.
Abstract
This paper presents the Crossmodal Attentive Skill Learner (CASL), integrated with the recently-introduced Asynchronous Advantage Option-Critic (A2OC) architecture [Harb et al., 2017] to enable hierarchical reinforcement learning across multiple sensory inputs. We provide concrete examples where the approach not only improves performance in a single task, but accelerates transfer to new tasks. We demonstrate the attention mechanism anticipates and identifies useful latent features, while filtering irrelevant sensor modalities during execution. We modify the Arcade Learning Environment [Bellemare et al., 2013] to support audio queries, and conduct evaluations of crossmodal learning in the Atari 2600 game Amidar. Finally, building on the recent work of Babaeizadeh et al. [2017], we open-source a fast hybrid CPU-GPU implementation of CASL.
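To make the idea of attention weighting sensor modalities more concrete, here is a minimal sketch of a generic softmax attention over per-modality feature vectors, conditioned on a recurrent hidden state. It is an illustration under stated assumptions, not the paper's exact formulation: the function name `crossmodal_attention`, the parameter names (`W_feat`, `W_hid`, `w_score`), and the choice of a weighted sum for fusion are all hypothetical.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()


def crossmodal_attention(features, hidden, W_feat, W_hid, w_score):
    """Fuse per-modality features with softmax attention weights.

    features: list of (d,) arrays, one per modality (e.g. video, audio)
    hidden:   (h,) array, a recurrent state used as context (assumed)
    W_feat:   list of (k, d) projection matrices, one per modality
    W_hid:    (k, h) projection of the hidden state
    w_score:  (k,) vector mapping each projection to a scalar score
    """
    scores = []
    for x, W in zip(features, W_feat):
        z = np.tanh(W @ x + W_hid @ hidden)  # joint feature/context projection
        scores.append(w_score @ z)           # scalar relevance score
    weights = softmax(np.array(scores))      # one weight per modality, sums to 1
    fused = sum(w * x for w, x in zip(weights, features))
    return fused, weights


# Hypothetical usage with random parameters and two modalities.
rng = np.random.default_rng(0)
d, h, k = 8, 16, 4
video_feat, audio_feat = rng.normal(size=d), rng.normal(size=d)
hidden_state = rng.normal(size=h)
W_feat = [rng.normal(size=(k, d)) for _ in range(2)]
W_hid = rng.normal(size=(k, h))
w_score = rng.normal(size=k)

fused, weights = crossmodal_attention(
    [video_feat, audio_feat], hidden_state, W_feat, W_hid, w_score
)
```

In a sketch like this, a modality whose weight stays near zero contributes almost nothing to the fused feature, which is one simple way a learned attention layer could filter out an irrelevant sensor.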
Taxonomy
Topics: Reinforcement Learning in Robotics · Multimodal Machine Learning Applications · Human Pose and Action Recognition
