OAgents: An Empirical Study of Building Effective Agents

He Zhu; Tianrui Qin; King Zhu; Heyuan Huang; Yeyi Guan; Jinxiang Xia; Yi Yao; Hanhao Li; Ningning Wang; Pai Liu; Tianhao Peng; Xin Gui; Xiaowan Li; Yuhui Liu; Yuchen Eleanor Jiang; Jun Wang; Changwang Zhang; Xiangru Tang; Ge Zhang; Jian Yang; Minghao Liu; Xitong Gao; Jiaheng Liu; Wangchunshu Zhou

arXiv:2506.15741 · cs.AI · June 24, 2025

TL;DR

This paper conducts a systematic empirical study of agent design choices, introduces a robust evaluation protocol, and presents OAgents, a modular agent framework that achieves state-of-the-art performance among open-source agent frameworks.

Contribution

It provides a standardized evaluation protocol, analyzes the impact of agent design choices, and introduces OAgents, an open-source, high-performing agent framework.

Findings

1. Certain agent components are crucial for effectiveness.
2. Standardized evaluation improves reproducibility.
3. OAgents achieves state-of-the-art performance.

Abstract

Recently, Agentic AI has become an increasingly popular research field. However, we argue that current agent research practices lack standardization and scientific rigor, making it hard to conduct fair comparisons among methods. As a result, it is still unclear how different design choices in agent frameworks affect effectiveness, and measuring progress in the field remains challenging. In this work, we conduct a systematic empirical study on the GAIA and BrowseComp benchmarks to examine, in a fair and rigorous manner, the impact of popular design choices in key agent components. We find that the lack of a standard evaluation protocol makes previous works, even open-sourced ones, non-reproducible, with significant variance between random runs. Therefore, we introduce a more robust evaluation protocol to stabilize comparisons. Our study reveals which components and designs are crucial for effective…

Taxonomy

Topics: Multi-Agent Systems and Negotiation · Mobile Crowdsensing and Crowdsourcing · Reinforcement Learning in Robotics