TL;DR
This paper introduces a set of new metrics and a comparison method for evaluating AI agents in General Video Game Playing, focusing on understanding decision-making processes beyond traditional performance measures.
Contribution
It proposes a general approach for shallow introspection of agent decision-making, applicable across different algorithms, and demonstrates its usefulness in analyzing MCTS-based agents.
Findings
The metrics show how each term of an MCTS agent's tree policy influences its decisions.
Comparing against baseline agents helps characterize the decision landscape.
The approach makes agent behavior more interpretable.
Abstract
The General Video Game AI competitions have been the testing ground for several game-playing techniques, such as evolutionary computation, tree search algorithms, and hyper-heuristic-based or knowledge-based algorithms. So far, the metrics used to evaluate the performance of agents have been win ratio, game score, and game length. In this paper we provide a wider set of metrics and a comparison method for evaluating and comparing agents. The metrics and the comparison method give shallow introspection into the agent's decision-making process, and they can be applied to any agent regardless of its algorithmic nature. In this work, the metrics and the comparison method are used to measure the impact of the terms that compose the tree policy of an MCTS-based agent, in comparison with several baseline agents. The results clearly show how promising such a general approach is and how it…
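To make "the impact of the terms that compose a tree policy" concrete, here is a minimal sketch. It assumes the tree policy is standard UCB1 (an exploitation term plus an exploration term). The abstract does not state which tree policy or which metrics the paper uses, so the function names (`ucb1_terms`, `term_impact`) and the "did this term change the choice" measure are hypothetical illustrations, not the paper's definitions.

```python
import math


def ucb1_terms(total_reward, visits, parent_visits, c=math.sqrt(2)):
    """Return the (exploitation, exploration) terms of UCB1 for one child.

    Assumes visits > 0 and parent_visits >= 1.
    """
    exploitation = total_reward / visits
    exploration = c * math.sqrt(math.log(parent_visits) / visits)
    return exploitation, exploration


def argmax(values):
    """Index of the largest value (first one wins on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def term_impact(children, c=math.sqrt(2)):
    """Compare the action chosen by the full policy with each term alone.

    children: list of (total_reward, visits) pairs, one per action.
    This is a hypothetical measure, not a metric taken from the paper.
    """
    parent_visits = sum(visits for _, visits in children)
    terms = [ucb1_terms(r, v, parent_visits, c) for r, v in children]
    full = argmax([exploit + explore for exploit, explore in terms])
    exploit_only = argmax([exploit for exploit, _ in terms])
    explore_only = argmax([explore for _, explore in terms])
    return {
        "full": full,
        "exploitation_only": exploit_only,
        "exploration_only": explore_only,
        # True when adding the exploration term changed the greedy choice.
        "exploration_changed_choice": full != exploit_only,
    }


if __name__ == "__main__":
    # One well-explored, high-value action and one barely-visited action.
    print(term_impact([(9.0, 10), (0.5, 1)]))
```

Logging a record like this at every decision and aggregating over a game would give one per-term view of what drives an agent's choices. The same comparison can be run against a baseline agent's chosen action, since it only needs the action index.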
