Streetwise Agents: Empowering Offline RL Policies to Outsmart Exogenous Stochastic Disturbances in RTC
Aditya Soni, Mayukh Das, Anjaly Parayil, Supriyo Ghosh, Shivam Shandilya, Ching-An Cheng, Vishak Gopal, Sami Khairy, Gabriel Mittag, Yasaman Hosseinkashi, Chetan Bansal

TL;DR
This paper introduces Streetwise, a method to enhance offline RL policies by dynamically adjusting actions based on real-time detection of exogenous disturbances, improving robustness in real-world scenarios like RTC.
Contribution
We propose a novel post-deployment policy shaping approach that accounts for unseen exogenous factors, addressing a key challenge in offline RL deployment.
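As a rough illustration of what post-deployment action shaping can look like in general, the sketch below wraps a fixed policy with a simple disturbance detector. It is not the Streetwise algorithm, whose details are not given in this summary. The class name `DisturbanceGate`, the z-score test on one-step prediction error, and the blend toward a conservative fallback action are all assumptions made for this example.

```python
from collections import deque
from dataclasses import dataclass, field
from statistics import mean, pstdev


@dataclass
class DisturbanceGate:
    """Hypothetical wrapper around an already-trained offline policy.

    Flags a step as disturbed when the one-step prediction error is an
    outlier relative to a sliding window of recent errors, then blends
    the policy action toward a conservative fallback action.
    """
    window: int = 20          # number of recent errors kept (assumed value)
    z_threshold: float = 3.0  # outlier threshold in standard deviations (assumed)
    blend: float = 0.5        # weight given to the fallback when disturbed (assumed)
    errors: deque = field(default_factory=deque)

    def is_disturbed(self, predicted: float, observed: float) -> bool:
        err = abs(observed - predicted)
        history = list(self.errors)  # errors seen before this step
        self.errors.append(err)
        if len(self.errors) > self.window:
            self.errors.popleft()
        if len(history) < 5:
            return False  # too little history to judge
        mu, sigma = mean(history), pstdev(history)
        return err > mu + self.z_threshold * max(sigma, 1e-8)

    def shape(self, policy_action: float, fallback_action: float,
              disturbed: bool) -> float:
        if not disturbed:
            return policy_action
        return (1 - self.blend) * policy_action + self.blend * fallback_action
```

In an RTC setting, `predicted` and `observed` might be a modeled and a measured network statistic, and `fallback_action` a cautious bandwidth estimate. Those mappings are likewise illustrative rather than taken from the paper.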
Findings
Achieved approximately 18% improvement in final returns over baselines.
Demonstrated robustness in bandwidth estimation for RTC.
Validated effectiveness on standard offline RL benchmarks.
Abstract
Exploring and training online on real production systems is difficult, which limits the scope of real-time, online, data/feedback-driven decision making. The most feasible alternative is offline reinforcement learning from limited trajectory samples. After deployment, however, such policies fail because of exogenous factors that temporarily or permanently alter the transition distribution of the decision process induced by the offline samples. The result is critical policy failures and generalization errors in sensitive domains such as Real-Time Communication (RTC). We address the problem of identifying robust actions in the presence of domain shifts caused by unseen exogenous stochastic factors in the wild. Because it is impossible to learn generalized offline policies within the support of offline data that are robust to these unseen exogenous disturbances, we propose a…
Taxonomy
Topics: Reinforcement Learning in Robotics
Methods: ADaptive gradient method with the OPTimal convergence rate
