Mastering Da Vinci Code: A Comparative Study of Transformer, LLM, and PPO-based Agents
LeCheng Zhang, Yuanshi Wang, Haotian Shen, Xujie Wang

TL;DR
This study compares Transformer, LLM, and PPO-based AI agents in mastering the complex deductive game Da Vinci Code, highlighting the superiority of reinforcement learning agents in strategic performance and reasoning capabilities.
Contribution
It introduces and evaluates three distinct AI architectures for Da Vinci Code, demonstrating the effectiveness of PPO-based reinforcement learning over LLMs and Transformers.
Findings
The PPO-based agent achieved a 58.5% win rate, outperforming the LLM agents.
Reinforcement learning enhances strategic reasoning in complex games.
LLMs face limitations in logical consistency over extended gameplay.
Abstract
The Da Vinci Code, a game of logical deduction and imperfect information, presents unique challenges for artificial intelligence, demanding nuanced reasoning beyond simple pattern recognition. This paper investigates the efficacy of various AI paradigms in mastering this game. We develop and evaluate three distinct agent architectures: a Transformer-based baseline model with limited historical context, several Large Language Model (LLM) agents (including Gemini, DeepSeek, and GPT variants) guided by structured prompts, and an agent based on Proximal Policy Optimization (PPO) employing a Transformer encoder for comprehensive game history processing. Performance is benchmarked against the baseline, with the PPO-based agent demonstrating superior win rates (58.5%), significantly outperforming the LLM counterparts. Our analysis highlights the strengths of deep reinforcement…
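For readers unfamiliar with PPO, the following is a minimal sketch of the standard clipped surrogate objective that PPO maximizes. It is not the authors' implementation: the function name `ppo_clip_objective`, the default `clip_eps=0.2`, and the plain-list inputs are assumptions chosen for illustration, and this summary does not state the paper's actual loss terms or hyperparameters.

```python
import math


def ppo_clip_objective(new_logps, old_logps, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate objective (a quantity to maximize).

    new_logps / old_logps: log-probabilities of the taken actions under the
    current and the data-collecting policy. advantages: advantage estimates.
    clip_eps is an assumed illustrative value, not taken from the paper.
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_logps, old_logps, advantages):
        # Probability ratio between the current and the old policy.
        ratio = math.exp(new_lp - old_lp)
        # Restrict the ratio to the interval [1 - eps, 1 + eps].
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

The clipping removes the incentive to move the policy far from the one that collected the data in a single update, which is the usual motivation for choosing PPO over an unconstrained policy-gradient step.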
Taxonomy
Topics: Computability, Logic, AI Algorithms
