Guided Policy Search for Parameterized Skills using Adverbs

Benjamin A. Spiegel; George Konidaris

arXiv:2110.15799 · cs.AI · November 1, 2021

TL;DR

This paper introduces a method that uses human-provided adverb feedback, interpreted through learned adverb-skill groundings, to adjust the parameters of a skill policy when reward from the environment is sparse or unavailable.

Contribution

The paper proposes learned adverb-skill groundings that map adverb phrases to adjustments of skill parameters, so that human language feedback can update a skill policy directly, in a manner similar to local policy search.

Findings

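01 Improved sample efficiency over modern policy search methods in two experiments.

02 Adverb feedback from a human directly updates the skill policy through learned adverb-skill groundings.

03 Usable as a drop-in replacement for local policy search when dense reward from the environment is not available but human language feedback is.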

Abstract

We present a method for using adverb phrases to adjust skill parameters via learned adverb-skill groundings. These groundings allow an agent to use adverb feedback provided by a human to directly update a skill policy, in a manner similar to traditional local policy search methods. We show that our method can be used as a drop-in replacement for these policy search methods when dense reward from the environment is not available but human language feedback is. We demonstrate improved sample efficiency over modern policy search methods in two experiments.
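
The idea of an adverb acting as a direct update on skill parameters can be illustrated with a small sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the two-dimensional parameter vector (speed, height), the hard-coded `GROUNDINGS` table (the paper learns its groundings), the fixed-step update in `apply_adverb_feedback`, and the `simulated_human` stand-in that names the adverb for the largest remaining error.

```python
from typing import Callable, Dict, List, Optional, Tuple

Vector = List[float]


def make_constant_grounding(direction: Vector) -> Callable[[Vector], Vector]:
    """Hypothetical grounding: always returns the same unit direction."""
    norm = sum(d * d for d in direction) ** 0.5
    unit = [d / norm for d in direction]
    return lambda theta: unit


# Assumed, hard-coded groundings over parameters [speed, height].
# In the paper these groundings are learned; here they are fixed by hand.
GROUNDINGS: Dict[str, Callable[[Vector], Vector]] = {
    "faster": make_constant_grounding([1.0, 0.0]),
    "slower": make_constant_grounding([-1.0, 0.0]),
    "higher": make_constant_grounding([0.0, 1.0]),
    "lower": make_constant_grounding([0.0, -1.0]),
}


def apply_adverb_feedback(theta: Vector, adverb: str, step_size: float = 0.5) -> Vector:
    """Move the skill parameters one fixed step along the adverb's direction."""
    direction = GROUNDINGS[adverb](theta)
    return [t + step_size * d for t, d in zip(theta, direction)]


def simulated_human(theta: Vector, target: Vector, tol: float = 0.25) -> Optional[str]:
    """Stand-in for a human: name the adverb for the largest remaining error."""
    errors = [g - t for t, g in zip(theta, target)]
    idx = max(range(len(errors)), key=lambda i: abs(errors[i]))
    if abs(errors[idx]) <= tol:
        return None  # the skill already looks right; no feedback given
    negative, positive = [("slower", "faster"), ("lower", "higher")][idx]
    return positive if errors[idx] > 0 else negative


def tune_skill(theta: Vector, target: Vector, max_rounds: int = 50) -> Tuple[Vector, int]:
    """Repeat: execute the skill, collect one adverb, update the parameters."""
    rounds = 0
    for _ in range(max_rounds):
        adverb = simulated_human(theta, target)
        if adverb is None:
            break
        theta = apply_adverb_feedback(theta, adverb)
        rounds += 1
    return theta, rounds


if __name__ == "__main__":
    final_theta, n_rounds = tune_skill([0.0, 0.0], [2.0, 1.0])
    print(final_theta, n_rounds)
```

Each round of feedback here plays the role that a reward-driven update would play in a local policy search loop, which is the sense in which the abstract describes the method as a drop-in replacement. The sketch makes no claim about how the paper represents groundings or chooses step sizes.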

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications