CAREL: Instruction-guided reinforcement learning with cross-modal auxiliary objectives

Armin Saghafian; Amirmohammad Izadi; Negin Hashemi Dijujin; Mahdieh Soleymani Baghshah

arXiv:2411.19787 · cs.LG · September 9, 2025

TL;DR

CAREL introduces a novel framework for instruction-guided reinforcement learning that leverages cross-modal auxiliary objectives and instruction tracking to improve generalization and sample efficiency in multi-modal environments.

Contribution

The paper presents CAREL, a new approach combining auxiliary loss functions and instruction tracking to enhance instruction grounding and generalization in reinforcement learning.

Findings

01 Superior sample efficiency demonstrated in experiments
02 Enhanced systematic generalization across tasks
03 Effective use of auxiliary objectives inspired by video-text retrieval

Abstract

Grounding the instruction in the environment is a key step in solving language-guided goal-reaching reinforcement learning problems. In automated reinforcement learning, a key concern is to enhance the model's ability to generalize across various tasks and environments. In goal-reaching scenarios, the agent must comprehend the different parts of the instructions within the environmental context in order to complete the overall task successfully. In this work, we propose CAREL (Cross-modal Auxiliary REinforcement Learning) as a new framework to solve this problem using auxiliary loss functions inspired by video-text retrieval literature and a novel method called instruction tracking, which automatically keeps track of progress in an environment. The results of our experiments suggest superior sample efficiency and systematic generalization for this framework in multi-modal reinforcement…
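
The abstract does not spell out the auxiliary loss, so the following is only a minimal sketch of the general idea: a symmetric contrastive (InfoNCE-style) objective of the kind common in video-text retrieval, applied to paired episode and instruction embeddings and added to an RL loss. It is not the authors' implementation. The function names, the temperature, and the `aux_weight` coefficient are all assumptions made for illustration, and the embeddings are assumed to be non-zero vectors.

```python
import math

def cosine_similarity(u, v):
    # Cosine similarity between two non-zero vectors (assumed non-zero).
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def log_softmax(row):
    # Numerically stable log-softmax over a list of logits.
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_z for x in row]

def symmetric_contrastive_loss(episode_embs, instruction_embs, temperature=0.1):
    # Hypothetical auxiliary loss: episode i and instruction i form a
    # positive pair; every other pairing in the batch is a negative.
    n = len(episode_embs)
    sim = [[cosine_similarity(e, t) / temperature for t in instruction_embs]
           for e in episode_embs]
    # Episode-to-instruction direction: softmax over each row.
    e2i = -sum(log_softmax(sim[i])[i] for i in range(n)) / n
    # Instruction-to-episode direction: softmax over each column.
    i2e = -sum(log_softmax([sim[j][i] for j in range(n)])[i]
               for i in range(n)) / n
    return 0.5 * (e2i + i2e)

def total_loss(rl_loss, aux_loss, aux_weight=0.5):
    # Assumed combination: RL objective plus a weighted auxiliary term.
    return rl_loss + aux_weight * aux_loss

if __name__ == "__main__":
    episodes = [[1.0, 0.0], [0.0, 1.0]]      # toy episode embeddings
    instructions = [[1.0, 0.0], [0.0, 1.0]]  # toy instruction embeddings
    aux = symmetric_contrastive_loss(episodes, instructions)
    print(total_loss(rl_loss=1.0, aux_loss=aux))
```

In this sketch, correctly matched episode and instruction pairs give a loss near zero, while mismatched pairs are penalized, which is the sense in which such a term could encourage grounding of the instruction in the observed trajectory.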

Code & Models

Repositories

ArminS03/CAREL (PyTorch, official)

Taxonomy

Topics: Robot Manipulation and Learning · Reinforcement Learning in Robotics · Modular Robots and Swarm Intelligence

Methods: Balanced Selection