Assessing the Zero-Shot Capabilities of LLMs for Action Evaluation in RL
Eduardo Pignatelli, Johan Ferret, Tim Rocktäschel, Edward Grefenstette, Davide Paglieri, Samuel Coward, Laura Toni

TL;DR
This paper explores using Large Language Models to automate credit assignment in Reinforcement Learning, aiming to improve learning from sparse, delayed rewards without manual domain knowledge or fine-tuning.
Contribution
It introduces CALM, a novel method leveraging LLMs to decompose tasks and assess subgoal achievement, enhancing RL with minimal human intervention.
Findings
LLMs can effectively assign credit in zero-shot RL settings.
CALM improves learning efficiency with sparse, delayed rewards.
Preliminary results show promise in using LLMs for reward shaping.
Abstract
The temporal credit assignment problem is a central challenge in Reinforcement Learning (RL), concerned with attributing the appropriate influence to each action in a trajectory for its ability to achieve a goal. However, when feedback is delayed and sparse, the learning signal is poor, and action evaluation becomes harder. Canonical solutions, such as reward shaping and options, require extensive domain knowledge and manual intervention, limiting their scalability and applicability. In this work, we lay the foundations for Credit Assignment with Language Models (CALM), a novel approach that leverages Large Language Models (LLMs) to automate credit assignment via reward shaping and options discovery. CALM uses LLMs to decompose a task into elementary subgoals and assess the achievement of these subgoals in state-action transitions. Every time an option terminates, a subgoal is…
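The idea of judging subgoal achievement per transition and feeding the result into reward shaping can be illustrated with a minimal sketch. Everything named here is an assumption for illustration, not the authors' implementation: `keyword_judge` is a hypothetical stand-in for an LLM call, the subgoal phrases are invented, and adding a fixed `bonus` whenever a subgoal is flagged is just one simple shaping scheme.

```python
from typing import Callable, List, Tuple

# Hypothetical interface: a judge maps a textual description of a transition
# to (achieved, subgoal_name). In practice this would wrap an LLM call.
Judge = Callable[[str], Tuple[bool, str]]


def keyword_judge(transition: str) -> Tuple[bool, str]:
    """Stand-in for an LLM: flags a subgoal when a known phrase appears."""
    # Invented subgoals for a key-and-door style task (assumption).
    subgoals = {"pick up key": "get_key", "open door": "open_door"}
    text = transition.lower()
    for phrase, name in subgoals.items():
        if phrase in text:
            return True, name
    return False, ""


def shape_rewards(
    transitions: List[str],
    env_rewards: List[float],
    judge: Judge,
    bonus: float = 0.1,
) -> List[float]:
    """Add a fixed auxiliary bonus to each transition the judge flags."""
    shaped = []
    for text, reward in zip(transitions, env_rewards):
        achieved, _ = judge(text)
        shaped.append(reward + (bonus if achieved else 0.0))
    return shaped


# A sparse, delayed reward: only the final step is rewarded by the environment.
transitions = [
    "The agent moves north.",
    "The agent executes 'pick up key'.",
    "The agent executes 'open door'.",
    "The agent reaches the goal.",
]
env_rewards = [0.0, 0.0, 0.0, 1.0]
shaped = shape_rewards(transitions, env_rewards, keyword_judge)
print(shaped)
```

With the sparse signal alone, the first three steps are indistinguishable to the learner; the shaped signal marks the two intermediate steps that the judge considers progress.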
Taxonomy
Topics: Safety Systems Engineering in Autonomy · Information and Cyber Security · Software Reliability and Analysis Research
