Sliding Window Codes: Near-Optimality and Q-Learning for Zero-Delay Coding
Liam Cregg, Fady Alajaji, Serdar Yuksel

TL;DR
This paper introduces a reinforcement learning approach for zero-delay coding of Markov sources, using a finite window approximation of the belief MDP to achieve near-optimal performance with provable convergence.
Contribution
It proposes a finite window approximation of the belief MDP and a Q-learning algorithm that converges to near-optimal policies for zero-delay coding.
Findings
Finite window policies approach optimal performance as the window length increases.
The proposed Q-learning algorithm converges to near-optimal policies.
A comparison shows advantages of the sliding window scheme over nearest neighbor quantization.
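To make the findings above concrete, here is a minimal sketch of tabular Q-learning over a finite sliding-window state. It is a toy illustration, not the authors' algorithm. Everything in it is an assumption made for the example: the 4-symbol source and its transition rule, the three quantizer maps, the binary symmetric channel, the bin-mean decoder, the discounted cost criterion, and the choice of window contents (the last `window` pairs of channel output and quantizer index, standing in for the belief state).

```python
import random
from collections import defaultdict, deque


def run_q_learning(steps=20000, window=2, beta=0.9, eps=0.1, noise=0.05, seed=0):
    """Toy Q-learning where the state is a sliding window of (output, action) pairs."""
    rng = random.Random(seed)
    source = [0, 1, 2, 3]
    # Hypothetical quantizer maps: entry x is the channel bit sent for source symbol x.
    quantizers = [(0, 0, 1, 1), (0, 1, 1, 0), (0, 1, 0, 1)]

    def reconstruct(q, y):
        # Simplified decoder (assumption): mean of the source symbols mapped to bit y.
        cell = [x for x in source if q[x] == y]
        return sum(cell) / len(cell)

    def step_source(x):
        # Assumed Markov source: stay with prob. 0.8, else move to a neighbor (clipped).
        if rng.random() < 0.8:
            return x
        return min(3, max(0, x + rng.choice((-1, 1))))

    Q = defaultdict(lambda: [0.0] * len(quantizers))
    visits = defaultdict(int)
    hist = deque([(0, 0)] * window, maxlen=window)
    state = tuple(hist)
    x = rng.choice(source)

    for _ in range(steps):
        # Epsilon-greedy over quantizer indices; costs are minimized, so take the argmin.
        if rng.random() < eps:
            a = rng.randrange(len(quantizers))
        else:
            a = min(range(len(quantizers)), key=lambda i: Q[state][i])

        bit = quantizers[a][x]
        y = bit ^ int(rng.random() < noise)  # binary symmetric channel
        cost = (x - reconstruct(quantizers[a], y)) ** 2

        # Feedback: the encoder sees y, so the window can be updated on both sides.
        hist.append((y, a))
        nxt = tuple(hist)

        visits[(state, a)] += 1
        alpha = 1.0 / visits[(state, a)]  # decaying step size per (state, action)
        Q[state][a] += alpha * (cost + beta * min(Q[nxt]) - Q[state][a])

        state = nxt
        x = step_source(x)

    return Q


if __name__ == "__main__":
    table = run_q_learning()
    print(len(table), "window states visited")
```

With two channel outputs and three quantizers, a window of length 2 yields at most 36 states, which is what keeps the table finite. Increasing `window` enlarges the table exponentially, which illustrates the trade-off between approximation quality and complexity that the window length controls.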
Abstract
We study the problem of zero-delay coding for the transmission of a Markov source over a noisy channel with feedback and present a reinforcement learning solution which is guaranteed to achieve near-optimality. To this end, we formulate the problem as a Markov decision process (MDP) where the state is a probability-measure valued predictor/belief and the actions are quantizer maps. This MDP formulation has been used to show the optimality of certain classes of encoder policies in prior work, but their computation is prohibitively complex due to the uncountable nature of the constructed state space and the lack of minorization or strong ergodicity results. These challenges invite rigorous reinforcement learning methods, which entail several open questions: can we approximate this MDP with a finite-state one with some performance guarantee? Can we ensure convergence of a reinforcement…
Taxonomy
Topics: Wireless Communication Security Techniques · Reinforcement Learning in Robotics
