Streetwise Agents: Empowering Offline RL Policies to Outsmart Exogenous Stochastic Disturbances in RTC

Aditya Soni; Mayukh Das; Anjaly Parayil; Supriyo Ghosh; Shivam Shandilya; Ching-An Cheng; Vishak Gopal; Sami Khairy; Gabriel Mittag; Yasaman Hosseinkashi; Chetan Bansal

arXiv:2411.06815·cs.LG·November 12, 2024

Open Access

TL;DR

This paper introduces Streetwise, a method to enhance offline RL policies by dynamically adjusting actions based on real-time detection of exogenous disturbances, improving robustness in real-world scenarios like RTC.

Contribution

We propose a novel post-deployment policy shaping approach that accounts for unseen exogenous factors, addressing a key challenge in offline RL deployment.
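The idea of shaping a frozen policy after deployment can be illustrated with a minimal sketch. This is not the paper's algorithm, which the summary here does not spell out. It assumes a hypothetical learned dynamics model, a hypothetical conservative fallback action, and a hand-picked positive error threshold: when the observed transition deviates from the model's prediction by more than the threshold, the proposed action is blended toward the fallback.

```python
import math
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

State = Sequence[float]


def prediction_error(predicted: State, observed: State) -> float:
    """Euclidean distance between the predicted and observed next state."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)))


@dataclass
class ShapedPolicy:
    """Wraps a frozen offline policy; all fields are illustrative assumptions."""

    base_policy: Callable[[State], float]             # frozen offline RL policy
    dynamics_model: Callable[[State, float], State]   # hypothetical learned model
    fallback_action: float                            # hypothetical safe action
    threshold: float                                  # must be > 0

    def act(
        self,
        state: State,
        last_state: Optional[State] = None,
        last_action: Optional[float] = None,
    ) -> float:
        proposed = self.base_policy(state)
        if last_state is None or last_action is None:
            return proposed  # nothing to compare against on the first step

        # Disturbance score: how badly the model mispredicted the last transition.
        predicted = self.dynamics_model(last_state, last_action)
        score = prediction_error(predicted, state)
        if score <= self.threshold:
            return proposed  # transition looks in-distribution

        # Blend toward the fallback; the weight grows with the excess error.
        weight = min(1.0, (score - self.threshold) / self.threshold)
        return (1.0 - weight) * proposed + weight * self.fallback_action
```

The blend keeps the adjustment proportional to the detected deviation, so small mismatches only nudge the action while large ones hand control to the fallback. In a bandwidth-estimation setting the fallback might be a conservative rate, though that choice is also an assumption.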

Findings

1. Achieved approximately 18% improvement in final returns over baselines.

2. Demonstrated robustness in bandwidth estimation for RTC.

3. Validated effectiveness on standard offline RL benchmarks.

Abstract

The difficulty of exploring and training online on real production systems limits the scope of real-time online data/feedback-driven decision making. The most feasible approach is to adopt offline reinforcement learning from limited trajectory samples. However, after deployment, such policies fail due to exogenous factors that temporarily or permanently disturb/alter the transition distribution of the assumed decision process structure induced by offline samples. This results in critical policy failures and generalization errors in sensitive domains like Real-Time Communication (RTC). We solve this crucial problem of identifying robust actions in the presence of domain shifts due to unseen exogenous stochastic factors in the wild. As it is impossible to learn generalized offline policies within the support of offline data that are robust to these unseen exogenous disturbances, we propose a…

Taxonomy

Topics: Reinforcement Learning in Robotics

Methods: ADaptive gradient method with the OPTimal convergence rate