REX: Rapid Exploration and eXploitation for AI Agents

Rithesh Murthy; Shelby Heinecke; Juan Carlos Niebles; Zhiwei Liu; Le Xue; Weiran Yao; Yihao Feng; Zeyuan Chen; Akash Gokul; Devansh Arpit; Ran Xu; Phil Mui; Huan Wang; Caiming Xiong; Silvio Savarese

arXiv:2307.08962·cs.AI·January 30, 2024·2 cites

Open Access · 3 Reviews

TL;DR

REX introduces a reward-based framework for AI agents that improves exploration and exploitation efficiency, surpassing some existing methods in performance and significantly reducing execution time without requiring model fine-tuning.

Contribution

The paper presents REX, a novel approach that integrates reward mechanisms and UCB-like scores into AI agents, enabling more robust, efficient, and fine-tuning-free exploration and exploitation.

Findings

01

REX achieves comparable or better performance than Chain-of-Thoughts and RAP.

02

REX significantly reduces execution time of AI agents.

03

REX effectively leverages offline logs and integrates with foundation models.

Abstract

In this paper, we propose an enhanced approach for Rapid Exploration and eXploitation for AI Agents called REX. Existing AutoGPT-style techniques have inherent limitations, such as a heavy reliance on precise descriptions for decision-making, and the lack of a systematic approach to leverage try-and-fail procedures akin to traditional Reinforcement Learning (RL). REX introduces an additional layer of rewards and integrates concepts similar to Upper Confidence Bound (UCB) scores, leading to more robust and efficient AI agent performance. This approach has the advantage of enabling the utilization of offline behaviors from logs and allowing seamless integration with existing foundation models while it does not require any model fine-tuning. Through comparative analysis with existing methods such as Chain-of-Thoughts (CoT) and Reasoning viA Planning (RAP), REX-based methods demonstrate…
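The abstract refers to "concepts similar to Upper Confidence Bound (UCB) scores". As a rough illustration of that general idea, the sketch below uses the textbook UCB1 rule, which is an assumption here and not necessarily the scoring REX uses. The function names, the exploration constant `c`, and the logged action statistics are all hypothetical.

```python
import math


def ucb1_score(mean_reward, action_count, total_count, c=math.sqrt(2)):
    """Textbook UCB1 score: average reward plus an exploration bonus.

    This is the standard bandit formula, used here as an assumption;
    the paper's own score may differ in its details.
    """
    if action_count == 0:
        return math.inf  # untried actions are explored first
    return mean_reward + c * math.sqrt(math.log(total_count) / action_count)


def select_action(stats, c=math.sqrt(2)):
    """Return the action with the highest UCB1 score.

    `stats` maps an action name to (sum_of_rewards, times_tried).
    """
    total = sum(count for _, count in stats.values())

    def score(action):
        reward_sum, count = stats[action]
        mean = reward_sum / count if count else 0.0
        return ucb1_score(mean, count, total, c)

    return max(stats, key=score)


# Hypothetical statistics, e.g. accumulated by replaying offline logs.
logged = {
    "stack A on B": (3.0, 4),
    "unstack C": (1.0, 1),
    "pick up D": (0.0, 0),
}
print(select_action(logged))  # prints "pick up D", the untried action
```

Setting `c=0` removes the exploration bonus, so selection falls back to pure exploitation of the best average reward seen so far; larger values of `c` favor rarely tried actions.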

Peer Reviews

Decision·Submitted to ICLR 2024

Reviewer 01 · Rating 3: reject, not good enough · Confidence 4

Strengths

- Important and timely topic
- Nice connections between RL/bandit literature and LLM
- I agree with the authors that MCTS is a promising approach to combine with LLM; hence, the authors try to make progress in a good direction.

Weaknesses

- Regarding experiments, I have several concerns. First, I consider that the authors should compare the performance of their LLM-based approaches with SOTA methods that are not based on LLMs. Second, the authors state that "RAP is not compatible with OpenAI APIs; therefore, we have exclusively focused on evaluating the time complexity of RAP, without delving into its accuracy", but I do not think that this is a good reason why the authors do not have to empirically evaluate RAP. If the author

Reviewer 02 · Rating 5: marginally below the acceptance threshold · Confidence 4

Strengths

Overall the paper is well motivated and well written. Proposed mechanisms such as integration of rewards into the prompt are novel.

Weaknesses

The approach seems to be limited to problems with discrete actions, and assumes the scores can be mapped to HIGH/LOW which may not always be possible.

Reviewer 03 · Rating 3: reject, not good enough · Confidence 4

Strengths

# originality
Combining search techniques with LLMs is an emerging area. MCTS is only recently being studied in this context, making the work original in that respect.

# quality
The proposed algorithms are evaluated on two different domains with varying size and reasoning requirements. Both accuracy and time complexity are compared.

# clarity
The figures and algorithm listings help clarify the core algorithms.

# significance
Enabling LLMs to incorporate feedback into reasoning is a crucial ca

Weaknesses

# originality
The MCTS formulation is similar to RAP in some ways, but differs in many key details, particularly around how the world modeling is done, where RAP explicitly models the world but REX uses the implicit model of the LLM. Augmenting LLMs with search techniques is not itself novel, but this is not a substantial lack of originality per se either.

# quality
Evaluation results only weakly support the REX models. The performance improvements are primarily for (some) small Blocksworld dom

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Scientific Computing and Data Management