TL;DR
This paper introduces a novel operator-theoretic framework for offline reinforcement learning that captures the directed temporal geometry of controlled Markov processes through hitting time observations, enabling robust multi-stage planning.
Contribution
It develops a new representation learning method that models hitting times as linear functionals in a Hilbert space, with theoretical guarantees and a practical algorithm called IEL for long-horizon navigation.
Findings
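IEL improves on the state of the art in offline maze navigation tasks.
The framework provides finite-sample guarantees and bounds the global hitting-time error by the one-step transition error, amplified by the environment's transient spectral radius.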
The learned geometry reflects actual decision-time progress, enhancing multi-stage planning.
Abstract
We present a new operator-theoretic representation learning framework for offline reinforcement learning that recovers the directed temporal geometry of a controlled Markov process from hitting time observations. While prior art often produces symmetric distances or fails to satisfy the triangle inequality, our framework learns a Hilbert-space displacement geometry where expected hitting times are realized as linear functionals of latent displacements. We prove that this representation exists under latent linear closure and is uniquely identifiable up to a bounded linear isomorphism. For finite-dimensional implementations, we show that global hitting-time error is bounded by one-step transition error amplified by the environment's transient spectral radius. Furthermore, we provide finite-sample guarantees accounting for approximation, statistical complexity, and trajectory-label…
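To make "directed temporal geometry" concrete, the sketch below computes exact expected hitting times for a small Markov chain. It is only an illustration of the quantity the abstract refers to, not the IEL algorithm or any code from the paper. The 3-state transition matrix `P`, the function name `expected_hitting_times`, and the assumption that a fixed policy has already reduced the controlled process to a plain Markov chain are all hypothetical choices made for this example. It uses the standard fact that, for a target state, the hitting times from the remaining states solve (I - Q) h = 1, where Q is the transition matrix restricted to non-target states.

```python
import numpy as np

def expected_hitting_times(P, target):
    """Expected number of steps to first reach `target` from every state.

    Solves (I - Q) h = 1, where Q is P restricted to the non-target states.
    Assumes `target` is reachable from every state, so I - Q is invertible.
    """
    n = P.shape[0]
    others = [s for s in range(n) if s != target]
    Q = P[np.ix_(others, others)]  # transitions among non-target states
    h_others = np.linalg.solve(np.eye(len(others)) - Q, np.ones(len(others)))
    h = np.zeros(n)  # hitting time from the target to itself is 0
    h[others] = h_others
    return h

# Hypothetical chain (assumption for illustration): a biased cycle 0 -> 1 -> 2 -> 0,
# where each state stays put with probability 0.1.
P = np.array([
    [0.1, 0.9, 0.0],
    [0.0, 0.1, 0.9],
    [0.9, 0.0, 0.1],
])

# H[s, t] = expected number of steps to reach t starting from s.
H = np.column_stack([expected_hitting_times(P, t) for t in range(3)])

print(np.round(H, 3))
print("0 -> 1:", round(H[0, 1], 3), " 1 -> 0:", round(H[1, 0], 3))
```

In this chain, going from state 0 to state 1 takes one hop along the cycle, while going from 1 back to 0 takes two, so `H[0, 1]` and `H[1, 0]` differ. A symmetric distance cannot represent that gap. Expected hitting times do satisfy the triangle inequality `H[s, u] <= H[s, t] + H[t, u]`, which is the kind of directed structure the abstract says the learned representation should preserve.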
