Optimizing Agent Behavior over Long Time Scales by Transporting Value

Chia-Chun Hung; Timothy Lillicrap; Josh Abramson; Yan Wu; Mehdi Mirza; Federico Carnevale; Arun Ahuja; Greg Wayne

arXiv:1810.06721 · cs.AI · December 24, 2018

TL;DR

This paper introduces a new reinforcement learning paradigm that uses memory recall to assign credit over long time spans, enabling agents to solve problems with delayed rewards and providing insights into long-term decision-making.

Contribution

The authors propose a novel approach for long-term credit assignment in AI by leveraging memory recall, addressing limitations of existing methods for long-delay tasks.

Findings

01. Enables agents to solve long-delay credit assignment problems.

02. Broadens AI research scope to long-term temporal dependencies.

03. Provides a mechanistic framework inspired by human memory and decision-making.

Abstract

Humans spend a remarkable fraction of waking life engaged in acts of "mental time travel". We dwell on our actions in the past and experience satisfaction or regret. More than merely autobiographical storytelling, we use these event recollections to change how we will act in similar scenarios in the future. This process endows us with a computationally important ability to link actions and consequences across long spans of time, which figures prominently in addressing the problem of long-term temporal credit assignment; in artificial intelligence (AI) this is the question of how to evaluate the utility of the actions within a long-duration behavioral sequence leading to success or failure in a task. Existing approaches to shorter-term credit assignment in AI cannot solve tasks with long delays between actions and consequences. Here, we introduce a new paradigm for reinforcement learning…
