Multi-action Tangled Program Graphs for Multi-task Reinforcement Learning with Continuous Control

Quentin Vacher (IETR); Nicolas Beuve (IETR); Mickaël Dardaillon (IETR); Karol Desnos (IETR)

arXiv:2604.25369 · cs.AI · April 29, 2026

TL;DR

This paper introduces Multi-Action TPG, a genetic programming approach for multi-task continuous reinforcement learning, validated on a new MuJoCo Half Cheetah benchmark with multiple obstacles, demonstrating superior performance and interpretability.

Contribution

The paper proposes Multi-Action TPG (MATPG), which extends Tangled Program Graphs (TPG) to continuous multi-task RL, and introduces a new benchmark for evaluating multi-task learning in continuous control environments.

Findings

1. MATPG achieves superior results with lexicase selection.
2. The new benchmark effectively tests multi-task RL capabilities.
3. The evolved graph's decision flow is fully interpretable.
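Lexicase selection, named in the first finding, is a genetic programming parent-selection scheme that judges candidates case by case rather than by one aggregate score. The following is a minimal sketch of standard lexicase selection, assuming (hypothetically) that each case is one task's score for an agent and that higher is better; this summary does not say how the paper defines its cases, and `lexicase_select` and the scores are illustrative only.

```python
import random
from typing import Dict, List, Sequence


def lexicase_select(scores: Dict[str, Sequence[float]], rng: random.Random) -> str:
    """Pick one parent with standard lexicase selection.

    scores maps a candidate id to its per-case scores (higher is better).
    Assumption: each case is one task, and all candidates share the same cases.
    """
    candidates: List[str] = list(scores)
    n_cases = len(next(iter(scores.values())))
    cases = list(range(n_cases))
    rng.shuffle(cases)  # fresh random case ordering for every selection event
    for case in cases:
        best = max(scores[c][case] for c in candidates)
        # Keep only the candidates that are elite on this case.
        candidates = [c for c in candidates if scores[c][case] == best]
        if len(candidates) == 1:
            break
    return rng.choice(candidates)  # random tie-break among the survivors


# Hypothetical per-task rewards: one generalist and two task specialists.
scores = {
    "generalist": [5.0, 5.0, 5.0],
    "specialist_a": [9.0, 1.0, 1.0],
    "specialist_b": [1.0, 1.0, 9.0],
}
rng = random.Random(0)
picks = [lexicase_select(scores, rng) for _ in range(1000)]
print({name: picks.count(name) for name in scores})  # counts per candidate
```

In this toy example the generalist has the best mean score, yet each specialist is selected whenever its strongest task comes first in the shuffled ordering, so all three are chosen about equally often. Preserving such task specialists is the usual motivation for lexicase selection in multi-task settings.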

Abstract

Over the past few decades, machine learning has been widely used to learn complex tasks. Reinforcement Learning (RL), inspired by human behavior, is a prime example, as it involves developing specific behaviors for specific tasks. To further challenge algorithms, Multi-Task RL (MTRL) environments have been introduced, requiring a single model to learn multiple behaviors. The Tangled Program Graph (TPG) algorithm is a Genetic Programming (GP) algorithm designed for discrete MTRL environments. Recently, the MAPLE algorithm has been proposed as another GP algorithm, achieving strong results in single-task continuous RL environments. A variant of TPG named Multi-Action TPG (MATPG) was proposed alongside MAPLE; it aggregates MAPLE agents and creates a control flow to activate them. Initially tested only on single-task RL environments, MATPG achieved results similar to MAPLE's. In…
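The abstract describes MATPG as aggregating MAPLE agents behind a control flow that decides which one acts. The sketch below illustrates that idea with the classic TPG execution rule: starting from a root team, each learner computes a bid from the current state, the highest-bidding learner's edge is followed (skipping teams already visited in this traversal, to avoid cycles), and traversal stops when an edge reaches an action. It is a minimal sketch under stated assumptions, not the authors' implementation: the leaves are hypothetical `ContinuousAgent` stand-ins for MAPLE agents that return a continuous action vector, bids are plain Python callables rather than evolved programs, and all class and function names are illustrative.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence, Set, Tuple, Union

State = Sequence[float]
Action = List[float]


@dataclass
class ContinuousAgent:
    """Hypothetical stand-in for a MAPLE agent: maps a state to a continuous action vector."""
    name: str
    policy: Callable[[State], Action]


@dataclass
class Learner:
    """TPG-style learner: a bidding function plus one outgoing edge."""
    bid: Callable[[State], float]  # assumption: a plain callable instead of an evolved program
    target: Union["Team", ContinuousAgent]


@dataclass
class Team:
    name: str
    learners: List[Learner] = field(default_factory=list)


def execute(root: Team, state: State) -> Tuple[str, Action]:
    """Follow the highest-bidding edges from root until a continuous agent is reached."""
    team = root
    visited: Set[int] = set()
    while True:
        visited.add(id(team))
        # Ignore edges that lead back to a team already visited in this traversal.
        candidates = [
            learner for learner in team.learners
            if not (isinstance(learner.target, Team) and id(learner.target) in visited)
        ]
        if not candidates:
            raise ValueError(f"team {team.name!r} has no usable outgoing edge")
        best = max(candidates, key=lambda learner: learner.bid(state))
        if isinstance(best.target, ContinuousAgent):
            # Leaf reached: only this agent computes the continuous action.
            return best.target.name, best.target.policy(state)
        team = best.target


# Usage with two hypothetical behaviors; s[0] is an illustrative state feature.
run = ContinuousAgent("run", lambda s: [0.5, 0.5, 0.5])
jump = ContinuousAgent("jump", lambda s: [1.0, -1.0, 0.0])
sub = Team("sub", [Learner(lambda s: s[0], jump), Learner(lambda s: -s[0], run)])
root = Team("root", [Learner(lambda s: 1.0, sub), Learner(lambda s: 0.0, run)])
print(execute(root, [2.0]))   # ('jump', [1.0, -1.0, 0.0])
print(execute(root, [-1.0]))  # ('run', [0.5, 0.5, 0.5])
```

Returning the selected agent's name alongside its action makes visible, for each state, which behavior the graph chose and along which path.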
