Causal prompting model-based offline reinforcement learning
Xuehui Yu, Yi Guan, Rujia Shen, Xin Li, Chen Tang, Jingchi Jiang

TL;DR
This paper introduces CPRL (Causal Prompting Reinforcement Learning), a model-based offline reinforcement learning framework for the noisy, diverse datasets generated by online systems. It uses causal prompts and skill reuse to generalize across users and tasks, and is demonstrated on real-world medical data.
Contribution
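The paper proposes the CPRL framework, which combines Hip-BCPD (Hidden-Parameter Block Causal Prompting Dynamic), a model of environmental dynamics, with a skill-reuse strategy for training a single policy on multiple tasks. The aim is more robust offline RL that generalizes better on data from online systems.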
Findings
CPRL outperforms existing algorithms in noisy, out-of-distribution environments
Hip-BCPD effectively models environmental dynamics
Skill reuse enables multi-task learning
Abstract
Model-based offline Reinforcement Learning (RL) allows agents to fully utilise pre-collected datasets without requiring additional or unethical explorations. However, applying model-based offline RL to online systems presents challenges, primarily due to the highly suboptimal (noise-filled) and diverse nature of datasets generated by online systems. To tackle these issues, we introduce the Causal Prompting Reinforcement Learning (CPRL) framework, designed for highly suboptimal and resource-constrained online scenarios. The initial phase of CPRL involves the introduction of the Hidden-Parameter Block Causal Prompting Dynamic (Hip-BCPD) to model environmental dynamics. This approach utilises invariant causal prompts and aligns hidden parameters to generalise to new and diverse online users. In the subsequent phase, a single policy is trained to address multiple tasks through the…
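To make the idea of a hidden-parameter dynamics model concrete, here is a minimal sketch. It is not Hip-BCPD and not the authors' implementation: it assumes a toy linear system in which all users share the matrices `A` and `B` and a direction vector `c`, and each user differs only by a scalar hidden parameter `theta`. The function names, the linear form, and the least-squares fit are all illustrative assumptions. The sketch only shows how a shared model might be adapted to a new user by estimating that user's hidden parameter from a few logged transitions.

```python
import numpy as np

def simulate(A, B, c, theta, n_steps, rng):
    """Roll out the toy dynamics s' = A s + B a + theta * c with random actions."""
    s = rng.normal(size=A.shape[0])
    states, actions, next_states = [], [], []
    for _ in range(n_steps):
        a = rng.normal(size=B.shape[1])
        s_next = A @ s + B @ a + theta * c
        states.append(s)
        actions.append(a)
        next_states.append(s_next)
        s = s_next
    return np.array(states), np.array(actions), np.array(next_states)

def infer_hidden_parameter(A, B, c, states, actions, next_states):
    """Least-squares estimate of one user's scalar theta, with A, B, c held fixed."""
    # Part of each transition that the shared model cannot explain, shape (n, d).
    residual = next_states - states @ A.T - actions @ B.T
    # Each residual row equals theta * c, so project onto c and average.
    return float((residual @ c).sum() / (len(states) * (c @ c)))

# Hypothetical shared dynamics (assumed for illustration only).
rng = np.random.default_rng(0)
A = 0.5 * np.eye(2)
B = np.array([[1.0], [0.5]])
c = np.array([1.0, -1.0])

# A "new user" with an unknown hidden parameter, observed for 20 steps.
S, U, S_next = simulate(A, B, c, theta=0.7, n_steps=20, rng=rng)
theta_hat = infer_hidden_parameter(A, B, c, S, U, S_next)
```

In this noise-free toy setting the estimate recovers the true hidden parameter up to floating-point error. With noisy logged data the same projection would give only an approximate estimate.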
