Reinforcement Learning with Segment Feedback
Yihan Du, Anna Winnicki, Gal Dalal, Shie Mannor, R. Srikant

TL;DR
This paper introduces RL with segment feedback, a model in which the agent receives reward feedback once per segment of an episode, and analyzes through regret bounds and experiments how the feedback type and the number of segments affect learning efficiency.
Contribution
The paper proposes the RL with segment feedback model and provides algorithms and regret bounds for the binary and sum feedback settings, characterizing how the number of segments affects learning performance.
Findings
Under binary feedback, increasing the number of segments decreases regret at an exponential rate.
Under sum feedback, increasing the number of segments does not reduce regret significantly.
Both findings are supported by theoretical bounds and by experiments.
Abstract
Standard reinforcement learning (RL) assumes that an agent can observe a reward for each state-action pair. However, in practical applications, it is often difficult and costly to collect a reward for each state-action pair. While there have been several works considering RL with trajectory feedback, it is unclear if trajectory feedback is inefficient for learning when trajectories are long. In this work, we consider a model named RL with segment feedback, which offers a general paradigm filling the gap between per-state-action feedback and trajectory feedback. In this model, we consider an episodic Markov decision process (MDP), where each episode is divided into segments, and the agent observes reward feedback only at the end of each segment. Under this model, we study two popular feedback settings: binary feedback and sum feedback, where the agent observes a binary outcome and a…
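To make the feedback model concrete, here is a minimal Python sketch of how segment feedback could be derived from the per-step rewards of one episode. It is illustrative only and not the authors' code: the function names, the equal-length split, the noiseless sum, and the sigmoid link used for the binary outcome are all assumptions made for this example.

```python
import math
import random


def split_into_segments(rewards, m):
    """Split one episode's per-step rewards into m equal-length segments.

    Assumption for this sketch: m divides the episode length H.
    """
    horizon = len(rewards)
    if m <= 0 or horizon % m != 0:
        raise ValueError("m must be positive and divide the episode length")
    seg_len = horizon // m
    return [rewards[i:i + seg_len] for i in range(0, horizon, seg_len)]


def sum_feedback(rewards, m):
    """Sum feedback: one number per segment, the sum of its rewards.

    Assumption: no observation noise is added in this sketch.
    """
    return [sum(segment) for segment in split_into_segments(rewards, m)]


def binary_feedback(rewards, m, rng):
    """Binary feedback: one 0/1 outcome per segment.

    Assumption: the outcome is Bernoulli with probability
    sigmoid(segment reward sum); the link function is hypothetical here.
    """
    outcomes = []
    for segment_sum in sum_feedback(rewards, m):
        p = 1.0 / (1.0 + math.exp(-segment_sum))
        outcomes.append(1 if rng.random() < p else 0)
    return outcomes


if __name__ == "__main__":
    # Hypothetical per-step rewards for an episode of length H = 6.
    episode_rewards = [0.5, -0.25, 1.0, 0.0, 0.25, 0.5]
    rng = random.Random(0)
    for m in (1, 3, 6):
        print(m, sum_feedback(episode_rewards, m),
              binary_feedback(episode_rewards, m, rng))
```

In this sketch, m = 1 corresponds to trajectory feedback (a single signal per episode) and m = H corresponds to per-state-action feedback, which are the two extremes the model interpolates between.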
Taxonomy
Topics: Reinforcement Learning in Robotics · Elevator Systems and Control
