Inference-Time Policy Steering through Human Interactions

Yanwei Wang; Lirui Wang; Yilun Du; Balakumar Sundaralingam; Xuning Yang; Yu-Wei Chao; Claudia Perez-D'Arpino; Dieter Fox; Julie Shah

arXiv:2411.16627·cs.RO·March 27, 2025

TL;DR

This paper introduces an Inference-Time Policy Steering framework that uses human interactions to guide generative policies during inference, improving alignment with human intent without retraining.

Contribution

The paper proposes an inference-time steering method that uses human input to bias the generative sampling process instead of fine-tuning the policy on interaction data, aiming to align policy output with human intent without inducing out-of-distribution errors.

Findings

01. Sampling with a diffusion policy achieves the best trade-off between alignment and distribution shift.

02. ITPS improves the alignment of policy output with human intent during inference.

03. ITPS is evaluated on three simulated and real-world benchmarks.

Abstract

Generative policies trained with human demonstrations can autonomously accomplish multimodal, long-horizon tasks. However, during inference, humans are often removed from the policy execution loop, limiting the ability to guide a pre-trained policy towards a specific sub-goal or trajectory shape among multiple predictions. Naive human intervention may inadvertently exacerbate distribution shift, leading to constraint violations or execution failures. To better align policy output with human intent without inducing out-of-distribution errors, we propose an Inference-Time Policy Steering (ITPS) framework that leverages human interactions to bias the generative sampling process, rather than fine-tuning the policy on interaction data. We evaluate ITPS across three simulated and real-world benchmarks, testing three forms of human interaction and associated alignment distance metrics. Among…
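The idea of biasing the sampling process, rather than retraining, can be illustrated with a small toy. The sketch below is not the paper's algorithm or code: it assumes a two-mode "policy" whose score is known analytically (a Gaussian mixture standing in for a learned diffusion policy), a hypothetical point-style human input, squared Euclidean distance as the alignment metric, and annealed Langevin updates as the sampler. All names (`mixture_score`, `sample`, `guide_weight`) and all constants are illustrative assumptions.

```python
import numpy as np


def mixture_score(x, modes, sigma):
    """Score (gradient of log-density) of an equal-weight isotropic Gaussian
    mixture with component means `modes` and standard deviation `sigma`.
    Toy stand-in for a learned denoiser; not the paper's model."""
    diffs = modes[None, :, :] - x[:, None, :]                # (N, K, D)
    logw = -np.sum(diffs ** 2, axis=-1) / (2.0 * sigma ** 2)  # (N, K)
    logw -= logw.max(axis=1, keepdims=True)                   # numerical stability
    w = np.exp(logw)
    w /= w.sum(axis=1, keepdims=True)                         # component responsibilities
    return np.einsum("nk,nkd->nd", w, diffs) / sigma ** 2


def sample(modes, human_point=None, guide_weight=1.0, n_samples=512,
           n_levels=10, steps_per_level=20, seed=0):
    """Annealed Langevin sampling. If `human_point` is given, the negative
    gradient of 0.5 * ||x - human_point||^2 is added to the score at every
    step, which biases samples toward the human input (assumed metric)."""
    rng = np.random.default_rng(seed)
    sigmas = np.geomspace(2.0, 0.3, n_levels)  # noise schedule, high to low
    x = rng.normal(scale=sigmas[0], size=(n_samples, modes.shape[1]))
    for sigma in sigmas:
        eps = 0.1 * sigma ** 2                 # step size scaled to the noise level
        for _ in range(steps_per_level):
            grad = mixture_score(x, modes, sigma)
            if human_point is not None:
                grad = grad + guide_weight * (human_point - x)
            x = x + eps * grad + np.sqrt(2.0 * eps) * rng.normal(size=x.shape)
    return x


modes = np.array([[-2.0, 0.0], [2.0, 0.0]])   # two equally valid "sub-goals"
human_point = np.array([3.0, 0.5])            # hypothetical human input
free = sample(modes)                           # unsteered: both modes are sampled
steered = sample(modes, human_point=human_point)
print("fraction at right mode, unsteered:", np.mean(free[:, 0] > 0))
print("fraction at right mode, steered:  ", np.mean(steered[:, 0] > 0))
```

In this toy, the guidance term acts while the noise level is still high, so it can select which mode the samples settle into; as the noise level falls, the mixture score dominates and pulls samples back onto that mode instead of onto the raw human input. That is the intuition behind steering the sampler instead of overwriting the policy output, shown here only under the assumptions stated above.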

Taxonomy

Topics: Complex Systems and Decision Making · Game Theory and Applications

Methods: Diffusion · ALIGN