Can LLMs Translate Human Instructions into a Reinforcement Learning Agent's Internal Emergent Symbolic Representation?
Ziqi Ma, Sao Mai Nguyen, Philippe Xu

TL;DR
This paper explores whether large language models can convert human instructions into the internal symbolic representations used by reinforcement learning agents, revealing partial success and current limitations.
Contribution
It introduces a structured evaluation framework to assess LLMs' ability to translate natural language into internal symbolic representations in RL environments.
Findings
LLMs can partially translate natural language into environment dynamics symbols
Performance varies with partition granularity and task complexity
Current LLMs show limitations in aligning language with internal agent representations
Abstract
Emergent symbolic representations are critical for enabling developmental learning agents to plan and generalize across tasks. In this work, we investigate whether large language models (LLMs) can translate human natural language instructions into the internal symbolic representations that emerge during hierarchical reinforcement learning. We apply a structured evaluation framework to measure the translation performance of widely used LLMs (GPT, Claude, Deepseek, and Grok) across different internal symbolic partitions generated by a hierarchical reinforcement learning algorithm in the Ant Maze and Ant Fall environments. Our findings reveal that although LLMs demonstrate some ability to translate natural language into a symbolic representation of the environment dynamics, their performance is highly sensitive to partition granularity and task complexity. The results expose…
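The sketch below illustrates one possible scoring scheme. It counts how often the symbol an LLM picks for an instruction matches the region that actually contains that instruction's goal. For illustration only, it assumes a symbolic partition is a uniform n x n grid over the maze's (x, y) plane. The partitions in the paper are produced by a hierarchical RL algorithm and need not look like this. GridPartition, cell_of and translation_accuracy are hypothetical names, not the authors' code, and the parameter n stands in for partition granularity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GridPartition:
    """Hypothetical stand-in for an emergent partition: a uniform n x n grid
    over a rectangular maze region (the paper's partitions are learned, not uniform)."""
    x_min: float
    x_max: float
    y_min: float
    y_max: float
    n: int  # cells per axis; larger n means finer granularity

    def cell_of(self, x: float, y: float) -> int:
        """Return the row-major index of the cell containing (x, y)."""
        # Clamp so points on the upper boundary fall into the last cell.
        cx = min(int((x - self.x_min) / (self.x_max - self.x_min) * self.n), self.n - 1)
        cy = min(int((y - self.y_min) / (self.y_max - self.y_min) * self.n), self.n - 1)
        return cy * self.n + cx


def translation_accuracy(partition: GridPartition, samples) -> float:
    """samples: list of (predicted_cell, (goal_x, goal_y)) pairs, where predicted_cell
    is the symbol an LLM chose for an instruction and the goal is the ground-truth position."""
    if not samples:
        return 0.0
    hits = sum(1 for pred, (gx, gy) in samples if pred == partition.cell_of(gx, gy))
    return hits / len(samples)


# Illustrative usage with made-up predictions (not results from the paper).
coarse = GridPartition(0.0, 8.0, 0.0, 8.0, n=2)
fine = GridPartition(0.0, 8.0, 0.0, 8.0, n=4)
print(translation_accuracy(coarse, [(3, (6.0, 7.0)), (0, (1.0, 1.0))]))  # → 1.0
print(translation_accuracy(fine, [(15, (6.0, 7.0)), (4, (1.0, 1.0))]))   # → 0.5
```

With more cells per axis, the same instruction has to be matched to a smaller region. This is one simple way finer granularity can make the translation task harder.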
