Learning the Arrow of Time
Nasim Rahaman, Steffen Wolf, Anirudh Goyal, Roman Remme, Yoshua Bengio

TL;DR
This paper addresses learning an arrow of time in Markov (Decision) Processes and shows that the learned arrow encodes environmental information that can be used to measure reachability, detect side-effects, and obtain an intrinsic reward signal, with empirical results on discrete and continuous environments.
Contribution
The paper introduces a method for learning an arrow of time in Markov (Decision) Processes, relates it to the known notion of an arrow of time given by the Jordan-Kinderlehrer-Otto result, and illustrates its practical uses.
Findings
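For a class of stochastic processes, the learned arrow of time agrees reasonably well with the arrow of time given by the Jordan-Kinderlehrer-Otto result.
The learned arrow of time can be used to detect side-effects and measure reachability.
It also yields an intrinsic reward signal for reinforcement learning.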
Abstract
We humans seem to have an innate understanding of the asymmetric progression of time, which we use to efficiently and safely perceive and manipulate our environment. Drawing inspiration from that, we address the problem of learning an arrow of time in a Markov (Decision) Process. We illustrate how a learned arrow of time can capture meaningful information about the environment, which in turn can be used to measure reachability, detect side-effects and to obtain an intrinsic reward signal. We show empirical results on a selection of discrete and continuous environments, and demonstrate for a class of stochastic processes that the learned arrow of time agrees reasonably well with a known notion of an arrow of time given by the celebrated Jordan-Kinderlehrer-Otto result.
Taxonomy
Topics: Neural dynamics and brain function · Complex Systems and Decision Making · Gene Regulatory Network Analysis
