TL;DR
This paper introduces a testbed based on the game 'Unciv' to evaluate large language models' ability to act as human-like agents in complex, decision-rich environments involving social interactions and strategic planning.
Contribution
The paper develops an open-source platform for studying the human-like behavior of LLM-based agents in a strategy game that poses linguistic and social challenges.
Findings
Provides a new testbed for LLM agent evaluation in complex games
Highlights challenges in reasoning and social interaction for LLMs
Enables future research on human-like autonomous agents
Abstract
With the rapid advancement of Large Language Models (LLMs), LLM-based autonomous agents have shown the potential to function as digital employees, such as digital analysts, teachers, and programmers. In this paper, we develop an application-level testbed based on the open-source strategy game "Unciv", which has millions of active players, to enable researchers to build a "data flywheel" for studying human-like agents in the "digital players" task. This "Civilization"-like game features expansive decision-making spaces along with rich linguistic interactions such as diplomatic negotiations and acts of deception, posing significant challenges for LLM-based agents in terms of numerical reasoning and long-term planning. Another challenge for "digital players" is to generate human-like responses for social interaction, collaboration, and negotiation with human players. The open-source…
