Ground-Compose-Reinforce: Grounding Language in Agentic Behaviours using Limited Data

Andrew C. Li; Toryn Q. Klassen; Andrew Wang; Parand A. Alamdari; Sheila A. McIlraith

arXiv:2507.10741 · cs.LG · October 28, 2025

TL;DR

Ground-Compose-Reinforce is a neurosymbolic framework that enables training RL agents from high-level task specifications using limited data, leveraging compositional Reward Machines to ground language in agentic behaviors.

Contribution

The paper introduces an end-to-end approach that uses Reward Machines for language grounding, allowing complex behaviors to be learned from limited data without manual reward design.

Findings

01. Successfully trained agents with only 350 trajectories

02. Achieved complex behaviors not seen in pretraining

03. Outperformed non-compositional methods

Abstract

Grounding language in perception and action is a key challenge when building situated agents that can interact with humans, or other agents, via language. In the past, addressing this challenge has required manually designing the language grounding or curating massive datasets that associate language with the environment. We propose Ground-Compose-Reinforce, an end-to-end, neurosymbolic framework for training RL agents directly from high-level task specifications--without manually designed reward functions or other domain-specific oracles, and without massive datasets. These task specifications take the form of Reward Machines, automata-based representations that capture high-level task structure and are in some cases autoformalizable from natural language. Critically, we show that Reward Machines can be grounded using limited data by exploiting compositionality. Experiments in a custom…
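To make the notion of a Reward Machine concrete, the following is a minimal sketch of the general idea: a finite-state machine whose transitions are triggered by propositional symbols and emit rewards. It is not the paper's implementation. The task ("get the key, then open the door"), the state names, the class `RewardMachine`, and the helper `run` are all hypothetical, and the proposition labels are supplied by hand here, whereas the abstract describes grounding them from data.

```python
from dataclasses import dataclass, field


@dataclass
class RewardMachine:
    """Illustrative reward machine (hypothetical, not the paper's code)."""
    initial: str
    terminal: set
    # Maps (state, proposition) -> (next_state, reward).
    transitions: dict = field(default_factory=dict)

    def step(self, state, true_props):
        """Follow the first matching transition; otherwise stay put with zero reward."""
        for prop in sorted(true_props):  # sorted for deterministic tie-breaking
            if (state, prop) in self.transitions:
                return self.transitions[(state, prop)]
        return state, 0.0


def run(rm, label_trace):
    """Feed a sequence of proposition sets through the machine and sum the rewards."""
    state, total = rm.initial, 0.0
    for props in label_trace:
        if state in rm.terminal:
            break
        state, reward = rm.step(state, props)
        total += reward
    return state, total


# Hypothetical task: "get the key, then open the door".
rm = RewardMachine(
    initial="u0",
    terminal={"u_done"},
    transitions={
        ("u0", "key"): ("u1", 0.0),       # key picked up: progress, no reward yet
        ("u1", "door"): ("u_done", 1.0),  # door reached after the key: task complete
    },
)

# Hand-written labels per time step; reaching the door before the key has no effect.
trace = [set(), {"door"}, {"key"}, set(), {"door"}]
final_state, total_reward = run(rm, trace)
print(final_state, total_reward)
```

Because the machine state records which subgoals have been completed, the same proposition ("door") yields a reward only when it occurs in the right order, which is the kind of high-level task structure the abstract refers to.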

Videos

Ground-Compose-Reinforce: Grounding Language in Agentic Behaviours using Limited Data · SlidesLive

Taxonomy

Topics: Reinforcement Learning in Robotics