Off-Policy Selection for Initiating Human-Centric Experimental Design
Ge Gao, Xi Yang, Qitong Gao, Song Ju, Miroslav Pajic, Min Chi

TL;DR
This paper introduces FPS, a novel off-policy selection method that segments participants into sub-groups to personalize policy deployment in human-centric systems, improving outcomes in education and healthcare.
Contribution
The work presents FPS, an approach that addresses participant heterogeneity in off-policy selection for human-centric tasks, enabling personalized policy deployment for a new participant without any prior offline data collected from that participant.
Findings
FPS improves learning outcomes in intelligent tutoring systems.
FPS enhances healthcare intervention effectiveness for sepsis.
Participant segmentation leads to better policy alignment and results.
Abstract
In human-centric tasks such as healthcare and education, the heterogeneity among patients and students necessitates personalized treatments and instructional interventions. While reinforcement learning (RL) has been applied to these tasks, off-policy selection (OPS) is pivotal for closing the loop, since it evaluates and selects policies offline without online interactions. Yet current OPS methods often overlook the heterogeneity among participants. Our work centers on a pivotal challenge in human-centric systems (HCSs): how to select a policy to deploy when a new participant joins the cohort, without access to any prior offline data collected from that participant? We introduce First-Glance Off-Policy Selection (FPS), a novel approach that systematically addresses participant heterogeneity through sub-group segmentation and OPS criteria tailored to each sub-group. By…
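The abstract is truncated here and does not say which segmentation algorithm or selection criterion FPS uses, so the following is only a minimal sketch of the general idea (segment participants, then select a policy per sub-group), not the authors' method. It assumes k-means on initial states as a stand-in for segmentation and "highest mean estimated return" as a stand-in for the per-group OPS criterion. All function names and the toy numbers are hypothetical.

```python
import random
from statistics import mean


def nearest(point, centroids):
    """Index of the centroid closest to `point` (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda k: sum((p - c) ** 2 for p, c in zip(point, centroids[k])),
    )


def kmeans(points, k, iters=20, seed=0):
    """Plain k-means; an assumed stand-in for sub-group segmentation."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        labels = [nearest(p, centroids) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [mean(dim) for dim in zip(*members)]
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels


def select_policy_per_group(labels, value_estimates, k):
    """value_estimates[i][m]: offline estimate of policy m's return for participant i.

    Returns {group index: index of the policy with the highest mean estimate}.
    """
    best = {}
    for j in range(k):
        rows = [v for v, lab in zip(value_estimates, labels) if lab == j]
        if not rows:
            continue
        avg = [mean(r[m] for r in rows) for m in range(len(rows[0]))]
        best[j] = max(range(len(avg)), key=avg.__getitem__)
    return best


def policy_for_new_participant(initial_state, centroids, best):
    """Assign a new participant to the nearest sub-group using only the initial state."""
    return best[nearest(initial_state, centroids)]


# Toy, made-up data: six offline participants described by 2-D initial states,
# and hypothetical per-participant value estimates for two candidate policies.
initial_states = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
value_estimates = [[1.0, 0.2], [0.9, 0.3], [0.8, 0.1], [0.2, 0.9], [0.1, 1.0], [0.3, 0.8]]

centroids, labels = kmeans(initial_states, k=2)
best = select_policy_per_group(labels, value_estimates, k=2)
chosen = policy_for_new_participant((0.1, 0.1), centroids, best)
```

In this sketch a new participant is matched to a sub-group from the initial state alone, which mirrors the "no prior offline data for this participant" setting described above. How the value estimates are produced (for example, by an off-policy evaluation estimator) is left outside the sketch.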
Peer Reviews
Decision: NeurIPS 2024 poster
- **Novel Approach**: The introduction of the First-Glance Off-Policy Selection (FPS) framework is a significant innovation. By systematically addressing participant heterogeneity through sub-group segmentation, FPS offers a fresh perspective on OPS in human-centric systems (HCSs).
- **New Problem Formulation**: The paper tackles the unique challenge of selecting policies for new participants without prior offline data. This problem formulation is distinct from existing OPS/OPE frameworks, which…
The paper is generally well-written. I will combine the weaknesses and questions into one section.

1. **Assumption of Independent Initial State Distributions**: The FPS framework assumes that the initial state distributions for each participant are independent and can be uniformly sampled from the offline dataset. This assumption may not hold true in real-world scenarios where participants’ initial states can be influenced by various contextual factors and past interactions. The independence assump…
Taxonomy
Topics: Ergonomics and Human Factors · Human-Automation Interaction and Safety · Systems Engineering Methodologies and Applications
