Modeling Affect-based Intrinsic Rewards for Exploration and Learning

Dean Zadok; Daniel McDuff; Ashish Kapoor

arXiv:1912.00403 · cs.CV · April 6, 2021


TL;DR

This paper introduces a novel intrinsic reward based on positive affect, specifically spontaneous smile behavior, to improve exploration and learning efficiency in reinforcement learning. Data collected with the resulting policy leads to faster learning in downstream computer vision tasks.

Contribution

A task-independent intrinsic reward function derived from positive affect is proposed, improving exploration and learning speed in reinforcement learning environments.

Findings

1. Increased episode duration and exploration area
2. Reduced collisions during training
3. Faster learning in downstream tasks

Abstract

Positive affect has been linked to increased interest, curiosity and satisfaction in human learning. In reinforcement learning, extrinsic rewards are often sparse and difficult to define; intrinsically motivated learning can help address these challenges. We argue that positive affect is an important intrinsic reward that effectively drives exploration useful for gathering experiences. We present a novel approach leveraging a task-independent reward function trained on spontaneous smile behavior that reflects the intrinsic reward of positive affect. To evaluate our approach, we trained several downstream computer vision tasks on data collected with our policy and with several baseline methods. We show that the policy based on our affective rewards successfully increases episode duration and the area explored, and reduces collisions. The impact is the increased speed of…
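To make the idea of an affect-based intrinsic reward concrete, here is a minimal sketch, not the authors' implementation. It assumes a learned affect model that maps each observation to a predicted smile probability in [0, 1], and it assumes that signal is added to any extrinsic reward with a weight `beta`. The names `combined_rewards`, `toy_affect` and `beta` are hypothetical, and `toy_affect` is a placeholder, not a real smile detector.

```python
from typing import Callable, List, Sequence

# Hypothetical stand-in for a learned affect model: maps one observation
# (e.g. a flattened camera frame) to a predicted smile probability in [0, 1].
AffectModel = Callable[[Sequence[float]], float]


def combined_rewards(
    observations: Sequence[Sequence[float]],
    extrinsic: Sequence[float],
    affect_model: AffectModel,
    beta: float = 0.5,
) -> List[float]:
    """Per-step reward r_t = r_ext_t + beta * affect_model(o_t).

    The additive form and the weight `beta` are illustrative assumptions.
    """
    if len(observations) != len(extrinsic):
        raise ValueError("observations and extrinsic rewards must align")
    return [r + beta * affect_model(o) for o, r in zip(observations, extrinsic)]


def discounted_return(rewards: Sequence[float], gamma: float = 0.99) -> float:
    """Discounted return G = sum_t gamma^t * r_t, accumulated backwards."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def toy_affect(obs: Sequence[float]) -> float:
    """Placeholder affect model: mean intensity clipped to [0, 1]."""
    return max(0.0, min(1.0, sum(obs) / len(obs)))


if __name__ == "__main__":
    obs = [[0.2, 0.4], [1.0, 1.0], [0.0, 0.0]]
    sparse = [0.0, 0.0, 0.0]  # no extrinsic signal at all in this episode
    shaped = combined_rewards(obs, sparse, toy_affect, beta=0.5)
    print(shaped)
    print(discounted_return(shaped, gamma=0.5))
```

With an entirely sparse extrinsic signal (all zeros), the shaped rewards are still non-zero, so the return carries a learning signal; setting `beta=0.0` recovers the extrinsic-only baseline.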

