CueLearner: Bootstrapping and local policy adaptation from relative feedback

Giulio Schiavi; Andrei Cramariuc; Lionel Ott; Roland Siegwart

arXiv:2507.04730 · cs.RO · July 8, 2025

TL;DR

CueLearner uses relative human feedback to make reinforcement learning more sample-efficient and more adaptable, and shows this on sparse-reward tasks and in a real-world navigation scenario.

Contribution

Introduces a novel method combining relative feedback with off-policy RL, enhancing sample efficiency and policy adaptation in sparse-reward environments.

Findings

01 Improves sample efficiency in sparse-reward tasks.

02 Enables policy adaptation to environmental or user preference changes.

03 Successfully applied to real-world navigation policy learning.

Abstract

Human guidance has emerged as a powerful tool for enhancing reinforcement learning (RL). However, conventional forms of guidance such as demonstrations or binary scalar feedback can be challenging to collect or have low information content, motivating the exploration of other forms of human input. Among these, relative feedback (i.e., feedback on how to improve an action, such as "more to the left") offers a good balance between usability and information richness. Previous research has shown that relative feedback can be used to enhance policy search methods. However, these efforts have been limited to specific policy classes and use feedback inefficiently. In this work, we introduce a novel method to learn from relative feedback and combine it with off-policy reinforcement learning. Through evaluations on two sparse-reward tasks, we demonstrate our method can be used to improve the…
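The notion of relative feedback in the abstract (a cue such as "more to the left" that says how to improve an action) can be illustrated with a small sketch. This is not the paper's method: the cue vocabulary, the 2-D action layout, the step size, and the function name `apply_cue` are all assumptions made for illustration. It only shows the basic idea of turning a directional cue into a small, bounded correction of an action.

```python
import numpy as np

# Hypothetical cue vocabulary mapped to unit directions in a 2-D action
# space (index 0 = lateral, index 1 = forward speed). Illustrative only;
# the paper does not specify this mapping.
CUE_DIRECTIONS = {
    "more left": np.array([-1.0, 0.0]),
    "more right": np.array([1.0, 0.0]),
    "faster": np.array([0.0, 1.0]),
    "slower": np.array([0.0, -1.0]),
}


def apply_cue(action, cue, step=0.1, low=-1.0, high=1.0):
    """Shift `action` by `step` along the cue direction and clip to bounds.

    Assumption: actions are normalized to [low, high] per dimension.
    """
    direction = CUE_DIRECTIONS[cue]
    shifted = np.asarray(action, dtype=float) + step * direction
    return np.clip(shifted, low, high)


# Usage: the policy proposed [0.0, 0.5]; the user says "more left".
corrected = apply_cue([0.0, 0.5], "more left")
```

Under these assumptions, a corrected action like `corrected` could be used as an exploration target or stored next to the original transition for an off-policy learner. How CueLearner actually consumes the feedback is described in the paper itself.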
