Guided Policy Search for Parameterized Skills using Adverbs
Benjamin A. Spiegel, George Konidaris

TL;DR
This paper introduces a method that uses human-provided adverb feedback to adjust skill parameters in robotic policies, intended for settings where dense environmental reward is unavailable but human language feedback is.
Contribution
The paper proposes learned adverb-skill groundings that connect adverb phrases to adjustments of skill parameters, so that a human's adverb feedback can directly update a skill policy.
Findings
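Improved sample efficiency over modern policy search methods in two experiments.
Adverb feedback from a human can directly update a skill policy, in a manner similar to traditional local policy search.
Usable as a drop-in replacement for policy search methods when dense reward from the environment is not available but human language feedback is.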
Abstract
We present a method for using adverb phrases to adjust skill parameters via learned adverb-skill groundings. These groundings allow an agent to use adverb feedback provided by a human to directly update a skill policy, in a manner similar to traditional local policy search methods. We show that our method can be used as a drop-in replacement for these policy search methods when dense reward from the environment is not available but human language feedback is. We demonstrate improved sample efficiency over modern policy search methods in two experiments.
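The update loop described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes, purely for illustration, that each adverb grounding is a fixed direction in a two-dimensional skill parameter space (the paper learns its groundings, and they may take a different form), and it replaces the human with a simulated feedback function. The names `GROUNDINGS`, `simulated_feedback`, `adverb_update`, and `run_adverb_search` are hypothetical.

```python
import numpy as np

# ASSUMPTION: each adverb is grounded as a fixed direction in parameter space.
# The paper learns these groundings; they are hard-coded here for illustration.
# Hypothetical skill parameters: [speed, height].
GROUNDINGS = {
    "faster": np.array([1.0, 0.0]),
    "slower": np.array([-1.0, 0.0]),
    "higher": np.array([0.0, 1.0]),
    "lower": np.array([0.0, -1.0]),
}


def simulated_feedback(theta, target):
    """Stand-in for a human: pick the adverb whose direction best matches the needed change."""
    error = target - theta
    return max(GROUNDINGS, key=lambda adverb: GROUNDINGS[adverb] @ error)


def adverb_update(theta, adverb, step_size=0.1):
    """Move the skill parameters one step along the adverb's grounded direction."""
    return theta + step_size * GROUNDINGS[adverb]


def run_adverb_search(theta0, target, iterations=50, step_size=0.1):
    """Local search driven only by adverb feedback; no environment reward is used."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(iterations):
        # Stop once the parameters are within one step of the (hidden) target.
        if np.linalg.norm(target - theta) < step_size:
            break
        adverb = simulated_feedback(theta, target)
        theta = adverb_update(theta, adverb, step_size)
    return theta


target = np.array([1.0, 0.5])
theta_final = run_adverb_search([0.0, 0.0], target)
print(theta_final)
```

In this toy setting each round of feedback supplies a direction of improvement directly, whereas a reward-based local search would typically need several rollouts to estimate one. The sketch only illustrates that intuition and makes no claim about the paper's measured results.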
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
