User-Oriented Robust Reinforcement Learning
Haoyi You, Beichen Yu, Haiming Jin, Zhaoxing Yang, Jiahui Sun

TL;DR
This paper introduces a user-oriented robust reinforcement learning framework that incorporates user preferences into policy optimization, balancing robustness against personalization, and reports state-of-the-art performance under its proposed UOR metric on MuJoCo tasks.
Contribution
It proposes a novel User-Oriented Robustness (UOR) metric for RL, develops algorithms for scenarios that differ in how much is known about the environment distribution, and proves that they converge to near-optimal policies.
Findings
UOR-RL achieves state-of-the-art results under the UOR metric.
UOR-RL performs comparably to the baselines under the average-case and worst-case metrics.
Theoretical convergence guarantees are provided for the proposed algorithms.
Abstract
Recently, improving the robustness of policies across different environments has attracted increasing attention in the reinforcement learning (RL) community. Existing robust RL methods mostly aim to achieve max-min robustness by optimizing the policy's performance in the worst-case environment. However, in practice, a user who deploys an RL policy may have different preferences over its performance across environments. Clearly, max-min robustness is often too conservative to satisfy such preferences. Therefore, in this paper, we integrate user preference into policy learning in robust RL and propose a novel User-Oriented Robust RL (UOR-RL) framework. Specifically, we define a new User-Oriented Robustness (UOR) metric for RL, which allocates different weights to the environments according to user preference and generalizes the max-min robustness metric. To optimize…
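To make the idea of a preference-weighted metric that generalizes max-min robustness concrete, here is a minimal Python sketch. It is not the paper's definition of the UOR metric, which the truncated abstract does not spell out. It assumes, purely for illustration, that the user's weights are assigned by performance rank (worst environment first). The function name `rank_weighted_return` and the sample returns are hypothetical.

```python
def rank_weighted_return(returns, rank_weights):
    """Illustrative rank-weighted score (an assumption, not the paper's UOR metric).

    returns      -- per-environment returns of one policy
    rank_weights -- non-negative weights; the first applies to the worst
                    environment, the last to the best
    """
    ordered = sorted(float(r) for r in returns)  # ascending: worst first
    weights = [float(w) for w in rank_weights]
    if len(weights) != len(ordered):
        raise ValueError("need exactly one weight per environment")
    total = sum(weights)
    if total <= 0 or any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative with a positive sum")
    # Normalized weighted average of the sorted returns.
    return sum(w * r for w, r in zip(weights, ordered)) / total


# Hypothetical returns of a single policy in three environments.
returns = [120.0, 80.0, 200.0]

worst_case = rank_weighted_return(returns, [1, 0, 0])  # all weight on the worst
average = rank_weighted_return(returns, [1, 1, 1])     # uniform weights
user_pref = rank_weighted_return(returns, [3, 2, 1])   # emphasis on poor environments
```

Under this assumed form, putting all weight on the lowest-ranked environment recovers the max-min objective, uniform weights recover the average-case objective, and intermediate weightings fall between the two.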
Taxonomy
Topics: Reinforcement Learning in Robotics · Mobile Crowdsensing and Crowdsourcing · Smart Parking Systems Research
