Designing Skill-Compatible AI: Methodologies and Frameworks in Chess
Karim Hamade, Reid McIlroy-Young, Siddhartha Sen, Jon Kleinberg, Ashton Anderson

TL;DR
This paper introduces methodologies and frameworks for designing AI agents in chess that are skill-compatible with lower-skill partners, enabling effective collaboration despite differences in skill levels.
Contribution
It presents three new methodologies and two chess frameworks to create and evaluate skill-compatible AI agents that outperform traditional near-optimal engines in collaborative settings.
Findings
Skill-compatible agents outperform state-of-the-art AI in collaborative chess.
Traditional chess engines are inadequate for low-skill partner interaction.
Skill-compatibility is distinct from raw performance and can be measured.
Abstract
Powerful artificial intelligence systems are often used in settings where they must interact with agents that are computationally much weaker, for example when they work alongside humans or operate in complex environments where some tasks are handled by algorithms, heuristics, or other entities of varying computational power. For AI agents to successfully interact in these settings, however, achieving superhuman performance alone is not sufficient; they also need to account for suboptimal actions or idiosyncratic style from their less-skilled counterparts. We propose a formal evaluation framework for assessing the compatibility of near-optimal AI with interaction partners who may have much lower levels of skill; we use popular collaborative chess variants as model systems to study and develop AI agents that can successfully interact with lower-skill entities. Traditional chess engines…
Peer Reviews
Decision · ICLR 2024 poster
This is clearly an important problem and is, to my knowledge, quite understudied. This appears to be a natural set of approaches that should be tried, and the thorough empirical evaluation of these approaches is useful for the community. They also appear to be the first to have formalized this problem, and they have set up about as good a methodology for evaluating their method as one could hope for short of human trials.
Given that these approaches do not train from scratch, there is some concern that the local improvements may not be representative of the improvements you would hope for if the models were retrained. Section 4.4 should be emphasized more from the beginning. It seems the biggest challenge with this line of work is the model of their partner. It's not enough just to match the "skill level" of the human; but it is also important to match the style of the human and adapt to the patterns in human
- Innovative Concept: The paper introduces a novel and timely concept of "skill-compatibility" that addresses the real-world challenge of AI and human collaboration.
- Empirical Evidence: The use of collaborative chess variants as model systems offers practical insights and empirical proof-of-concept.
- Multiple Methodologies: The paper presents three distinct methodologies, showcasing the versatility of approaches to achieving skill-compatibility.
- Comparative Analysis: By comparing newly pr
The main weakness is that the methods proposed don’t show a clear path toward broader human-AI collaboration. Since you use the weaker chess engine as a subroutine in search, it isn’t clear how you could make a version with humans since they can’t communicate at test time. Furthermore, these methods seem limited to chess. It would be more interesting to have methods that would work for a variety of cooperative or cooperative/competitive games such as Bridge or Hanabi. Minor: formatting is off
+ The wide variety of chess bots available at different skill levels and variety of play styles at each skill level makes for a very interesting and realistic benchmark. I could see this having direct influence on human-AI teaming.
+ The motivation for this paper is very strong. It is very interesting to consider the advice mismatch between superhuman AIs and mere human players. This is an open problem.
+ Introduces a new version of interpretability: “interpretable iff a weaker agent can follow–u
- Although there are many comparisons between the lower-skilled agents and humans, this does not guarantee that the results will be the same with humans. It would be interesting to have a small experiment to confirm the results in this setting.
- The description of the main results does not claim that there is a best guidance for what another researcher should try as the main takeaway
Taxonomy
Topics: Artificial Intelligence in Games · Sports Analytics and Performance · Reinforcement Learning in Robotics
