AgentFly: Extensible and Scalable Reinforcement Learning for LM Agents

Renxi Wang; Rifo Ahmad Genadi; Bilal El Bouardi; Yongxin Wang; Fajri Koto; Zhengzhong Liu; Timothy Baldwin; Haonan Li

arXiv:2507.14897 · cs.AI · July 22, 2025

TL;DR

AgentFly is a scalable, extensible framework that integrates reinforcement learning with language model agents, enabling multi-turn interactions and high-throughput training for improved task performance.

Contribution

We introduce AgentFly, a novel framework that systematically combines RL algorithms with LM agents, supporting extensibility, scalability, and multi-task training.

Findings

01. Successful training of LM agents across multiple tasks

02. Framework supports multi-turn interactions with token-level masking

03. High-throughput training with asynchronous execution
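
The token-level masking named in finding 02 can be illustrated with a small sketch. It assumes the common reading of the term: in a multi-turn trajectory, only tokens generated by the model contribute to the RL loss, while prompt and tool-observation tokens are masked out. The names `Segment`, `build_loss_mask`, and `masked_mean` are hypothetical and are not taken from AgentFly; the token ids and loss values are made up for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """One contiguous span of tokens in a multi-turn trajectory (hypothetical type)."""
    token_ids: List[int]
    from_model: bool  # True if the policy generated these tokens


def build_loss_mask(segments: List[Segment]) -> List[int]:
    """Return a 0/1 mask aligned with the concatenated token sequence."""
    mask: List[int] = []
    for seg in segments:
        mask.extend([1 if seg.from_model else 0] * len(seg.token_ids))
    return mask


def masked_mean(per_token_loss: List[float], mask: List[int]) -> float:
    """Average the per-token loss over model-generated tokens only."""
    if len(per_token_loss) != len(mask):
        raise ValueError("loss and mask must have the same length")
    kept = sum(mask)
    if kept == 0:
        return 0.0
    return sum(l * m for l, m in zip(per_token_loss, mask)) / kept


# Illustrative trajectory: prompt, model turn, tool output, model turn.
trajectory = [
    Segment([101, 102, 103], from_model=False),  # user prompt
    Segment([201, 202], from_model=True),        # model emits a tool call
    Segment([301, 302, 303], from_model=False),  # tool observation
    Segment([401], from_model=True),             # model's final answer
]
mask = build_loss_mask(trajectory)

# Made-up per-token losses; the 9.0 values on tool tokens are ignored by the mask.
loss = masked_mean([0.5] * 3 + [1.0, 3.0] + [9.0] * 3 + [2.0], mask)
```

Under this assumption, the large loss values on the tool-observation tokens have no effect on the result, which is the point of the mask: the policy is not rewarded or penalized for text it did not produce.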

Abstract

Language model (LM) agents have gained significant attention for their ability to autonomously complete tasks through interactions with environments, tools, and APIs. LM agents are primarily built with prompt engineering or supervised finetuning. At the same time, reinforcement learning (RL) has been explored to enhance LMs' capabilities, such as reasoning and factuality. However, the combination of LM agents and reinforcement learning (Agent-RL) remains underexplored and lacks systematic study. To this end, we built AgentFly, a scalable and extensible Agent-RL framework designed to empower LM agents with a variety of RL algorithms. Our framework supports multi-turn interactions by adapting traditional RL methods with token-level masking. It features a decorator-based interface for defining tools and reward functions, enabling seamless extension and ease of use. To support…
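
To make the idea of a decorator-based interface concrete, here is a minimal sketch of how tools and reward functions could be registered with decorators. This is not AgentFly's actual API: the decorators `tool` and `reward`, the registries `TOOLS` and `REWARDS`, and the example functions are all hypothetical, chosen only to show the general pattern.

```python
from typing import Callable, Dict

# Hypothetical registries; a real framework would likely store richer metadata.
TOOLS: Dict[str, Callable[..., str]] = {}
REWARDS: Dict[str, Callable[..., float]] = {}


def tool(name: str):
    """Register the decorated function as a callable tool under `name`."""
    def register(fn: Callable[..., str]) -> Callable[..., str]:
        TOOLS[name] = fn
        return fn
    return register


def reward(name: str):
    """Register the decorated function as a reward function under `name`."""
    def register(fn: Callable[..., float]) -> Callable[..., float]:
        REWARDS[name] = fn
        return fn
    return register


@tool(name="calculator")
def calculator(expression: str) -> str:
    """Evaluate 'a op b' for integer operands and the operators +, -, *."""
    left, op, right = expression.split()
    a, b = int(left), int(right)
    results = {"+": a + b, "-": a - b, "*": a * b}
    return str(results[op])


@reward(name="exact_match")
def exact_match(prediction: str, answer: str) -> float:
    """Return 1.0 when the prediction equals the reference answer, else 0.0."""
    return 1.0 if prediction.strip() == answer.strip() else 0.0


# An agent loop could look tools up by name and score the final output.
observation = TOOLS["calculator"]("6 * 7")
score = REWARDS["exact_match"](observation, "42")
```

The appeal of this pattern is that adding a new tool or reward requires only writing a function and decorating it; the training loop can discover it through the registry without further changes.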
