Learning Purposeful Behaviour in the Absence of Rewards

Marlos C. Machado; Michael Bowling

arXiv:1605.07700 · cs.LG · May 26, 2016 · 24 citations

TL;DR

This paper introduces an algorithm that enables agents to learn purposeful and exploratory behaviors in environments lacking reward signals by identifying intrinsic goals and constructing temporally extended actions.

Contribution

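The paper presents a method for learning purposeful behaviour without rewards. It identifies purposes that are just out of reach of the agent's current behaviour and treats them as intrinsic goals. It then builds options (temporally extended actions) that achieve those goals, which in turn facilitates exploration.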

Findings

1. Enables purposeful exploration without reward signals.
2. Constructs temporally extended actions (options) based on intrinsic purposes.
3. Improves exploration in sparse-reward environments.
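In the standard options framework of reinforcement learning, an option is a triple. It has an initiation set (where the option may start), an intra-option policy (how it acts), and a termination condition (when it stops). The minimal sketch below illustrates this on a hypothetical 1-D chain. The chain, the `Option` class, `run_option`, and the deterministic (true/false) termination are assumptions for illustration, not code from the paper.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Tuple


@dataclass(frozen=True)
class Option:
    """An option: where it may start, how it acts, and when it stops."""
    initiation_set: FrozenSet[int]      # states in which the option may be invoked
    policy: Callable[[int], int]        # intra-option policy: state -> primitive action
    terminates: Callable[[int], bool]   # assumed deterministic termination condition


def step(state: int, action: int, n_states: int) -> int:
    """Hypothetical 1-D chain: action -1 moves left, +1 moves right; the ends are walls."""
    return min(max(state + action, 0), n_states - 1)


def run_option(option: Option, state: int, n_states: int, max_steps: int = 1000) -> Tuple[int, int]:
    """Follow the option's policy until its termination condition holds.
    Returns (final_state, number_of_primitive_steps)."""
    if state not in option.initiation_set:
        raise ValueError(f"option cannot be initiated in state {state}")
    steps = 0
    while steps < max_steps:
        state = step(state, option.policy(state), n_states)
        steps += 1
        if option.terminates(state):  # termination is checked in the newly reached state
            break
    return state, steps


N_STATES = 10
# Hypothetical option "go to the right end of the chain", available everywhere but the right end.
go_right = Option(
    initiation_set=frozenset(range(N_STATES - 1)),
    policy=lambda s: +1,
    terminates=lambda s: s == N_STATES - 1,
)
final_state, steps = run_option(go_right, state=2, n_states=N_STATES)
print(final_state, steps)  # 9 7
```

From state 2, the agent makes a single decision (invoke `go_right`) that unfolds into seven primitive steps. That is what makes an option temporally extended.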

Abstract

Artificial intelligence is commonly defined as the ability to achieve goals in the world. In the reinforcement learning framework, goals are encoded as reward functions that guide agent behaviour, and the sum of observed rewards provides a notion of progress. However, some domains have no such reward signal, or have a reward signal so sparse as to appear absent. Without reward feedback, agent behaviour is typically random, often dithering aimlessly and lacking intentionality. In this paper we present an algorithm capable of learning purposeful behaviour in the absence of rewards. The algorithm proceeds by constructing temporally extended actions (options), through the identification of purposes that are "just out of reach" of the agent's current behaviour. These purposes establish intrinsic goals for the agent to learn, ultimately resulting in a suite of behaviours that encourage the…
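The abstract does not spell out how a "purpose" is represented or how an option is learned from it. The sketch below makes one concrete choice, purely as an assumption for illustration. A purpose is a direction e in a feature space φ, with intrinsic reward e·(φ(s′) − φ(s)). The option's policy comes from value iteration on that reward alone, and the option terminates wherever no action has positive value. The gridworld, the feature map, `learn_option`, and the termination rule are all hypothetical. This is not presented as the paper's exact procedure.

```python
import itertools

# Hypothetical 5x5 gridworld with no extrinsic reward at all.
WIDTH, HEIGHT = 5, 5
ACTIONS = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}
STATES = list(itertools.product(range(WIDTH), range(HEIGHT)))


def transition(state, action):
    """Deterministic move; bumping into a wall leaves the agent in place."""
    x, y = state
    dx, dy = ACTIONS[action]
    nx, ny = x + dx, y + dy
    return (nx, ny) if 0 <= nx < WIDTH and 0 <= ny < HEIGHT else state


def features(state):
    """Assumed feature map phi(s): here simply the (x, y) coordinates."""
    return state


def intrinsic_reward(purpose, s, s_next):
    """Assumed intrinsic reward: change in phi along the purpose direction,
    e . (phi(s') - phi(s))."""
    return sum(e * (b - a) for e, a, b in zip(purpose, features(s), features(s_next)))


def learn_option(purpose, gamma=0.9, sweeps=100):
    """Value iteration on the intrinsic reward only. Returns the greedy option
    policy and the set of states where no action has positive value, which this
    sketch uses as the option's termination set."""
    v = {s: 0.0 for s in STATES}

    def q(s, a):
        s_next = transition(s, a)
        return intrinsic_reward(purpose, s, s_next) + gamma * v[s_next]

    for _ in range(sweeps):
        v = {s: max(q(s, a) for a in ACTIONS) for s in STATES}

    terminal = {s for s in STATES if max(q(s, a) for a in ACTIONS) <= 0.0}
    policy = {s: max(ACTIONS, key=lambda a: q(s, a)) for s in STATES if s not in terminal}
    return policy, terminal


# A purpose of "increase x" should yield an option that walks to the right wall.
policy, terminal = learn_option(purpose=(1, 0))
state, steps = (0, 2), 0
while state not in terminal:
    state = transition(state, policy[state])
    steps += 1
print(state, steps)  # (4, 2) 4
```

The option is learned without any extrinsic reward. Once the agent reaches the right wall, no action can increase x further, so the option hands control back.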

Taxonomy

Topics: Reinforcement Learning in Robotics · Data Stream Mining Techniques · Advanced Bandit Algorithms Research