Pavlovian Signalling with General Value Functions in Agent-Agent Temporal Decision Making

Andrew Butcher; Michael Bradley Johanson; Elnaz Davoodi; Dylan J. A. Brenneis; Leslie Acker; Adam S. R. Parker; Adam White; Joseph Modayil; Patrick M. Pilarski

arXiv:2201.03709 · cs.AI · January 12, 2022 · 1 citation

TL;DR

This paper explores Pavlovian signalling as a mechanism for adaptive communication between learning agents, demonstrating its effectiveness in a novel partially observable decision-making domain, the Frost Hollow, and analyzing how temporal processes affect coordination and timing.

Contribution

It introduces Pavlovian signalling as a bridge between fixed signals and adaptive communication, showing how to build it from prediction learning with minimal constraints.

Findings

1. Pavlovian signalling accelerates learning in agent-agent interactions.
2. Temporal representations influence coordination but not the speed of learning.
3. Temporal aliasing affects human-agent and agent-agent interactions differently.

Abstract

In this paper, we contribute a multi-faceted study into Pavlovian signalling -- a process by which learned, temporally extended predictions made by one agent inform decision-making by another agent. Signalling is intimately connected to time and timing. In service of generating and receiving signals, humans and other animals are known to represent time, determine time since past events, predict the time until a future stimulus, and both recognize and generate patterns that unfold in time. We investigate how different temporal processes impact coordination and signalling between learning agents by introducing a partially observable decision-making domain we call the Frost Hollow. In this domain, a prediction learning agent and a reinforcement learning agent are coupled into a two-part decision-making system that works to acquire sparse reward while avoiding time-conditional hazards. We…
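To make the mechanism in the abstract concrete, here is a minimal sketch, not the authors' implementation: one agent learns a general value function (a discounted prediction of a future "hazard" cumulant) with tabular TD(0), and a fixed threshold turns that learned prediction into a binary token another agent could act on. Everything here is an assumption for illustration: the helper names `learn_gvf` and `pavlovian_signal`, the periodic hazard, the perfect clock representation (state = phase within the period), and the values of `gamma`, `alpha`, and the threshold. None of these are taken from the Frost Hollow domain.

```python
from typing import List


def learn_gvf(states: List[int], cumulants: List[float], n_states: int,
              gamma: float = 0.9, alpha: float = 0.1) -> List[float]:
    """Tabular TD(0) estimate of the discounted sum of future cumulants."""
    v = [0.0] * n_states
    for t in range(len(states) - 1):
        s, s_next = states[t], states[t + 1]
        # TD error: cumulant received on the transition plus discounted bootstrap
        delta = cumulants[t + 1] + gamma * v[s_next] - v[s]
        v[s] += alpha * delta
    return v


def pavlovian_signal(prediction: float, threshold: float) -> int:
    """Fixed (non-learned) mapping from a learned prediction to a binary token."""
    return 1 if prediction >= threshold else 0


# Hypothetical periodic hazard: the cumulant is 1 whenever the phase reaches 9.
PERIOD, T = 10, 20_000
states = [t % PERIOD for t in range(T)]  # assumed perfect clock representation
cumulants = [1.0 if s == PERIOD - 1 else 0.0 for s in states]

v = learn_gvf(states, cumulants, PERIOD)
tokens = [pavlovian_signal(v[s], threshold=1.2) for s in range(PERIOD)]
print([round(x, 3) for x in v])
print(tokens)
```

The prediction peaks at phase 8, the step just before the hazard, and with a threshold of 1.2 the token is emitted at phases 6, 7, and 8. The design choice to keep the threshold fixed while the prediction is learned mirrors the idea of Pavlovian signalling as a bridge between fixed signals and adaptive communication; with an aliased time representation (for example, coarser phase bins), the same code would produce less precisely timed tokens.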

Taxonomy

Topics: Neural Networks and Reservoir Computing