Comparing PCG metrics with Human Evaluation in Minecraft Settlement Generation
Jean-Baptiste Hervé, Christoph Salge

TL;DR
This study evaluates how well existing and new procedural content generation metrics for Minecraft settlements align with human judgments, analyzing their effectiveness and generalization across complex artifacts.
Contribution
It adapts and develops PCG metrics for Minecraft, compares them with human evaluations, and explores their applicability to complex artifacts and other game domains.
Findings
Metrics related to element counts correlate with human scores.
Diversity and presence of crafting materials influence human evaluations.
Some metrics show a relationship with human evaluation scores for Minecraft settlements.
Abstract
There is a range of metrics that can be applied to the artifacts produced by procedural content generation (PCG), and several of them come with qualitative claims. In this paper, we adapt a range of existing PCG metrics to generated Minecraft settlements, develop a few new metrics inspired by PCG literature, and compare the resulting measurements to existing human evaluations. The aim is to analyze how those metrics capture human evaluation scores in different categories, how the metrics generalize to another game domain, and how the metrics deal with more complex artifacts. We take an exploratory look at a variety of metrics and provide an information gain analysis and several correlation analyses. We found some relationships between human scores and metrics that count specific elements, measure the diversity of blocks, and measure the presence of crafting materials for the complex blocks present.
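As a rough illustration of the kind of analysis the abstract describes, the sketch below computes one possible block-diversity metric (Shannon entropy over block types) and rank-correlates it with human scores. It is not the paper's implementation: the choice of entropy as the diversity measure, the use of Spearman correlation, the function names, and the toy settlements and scores are all assumptions made for this example.

```python
import math
from collections import Counter


def block_entropy(blocks):
    """Shannon entropy (in bits) of the block-type distribution."""
    counts = Counter(blocks)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical toy data: each settlement is a flat list of block names,
# and each human score is an invented rating for that settlement.
settlements = {
    "a": ["stone"] * 8,
    "b": ["stone"] * 4 + ["oak_planks"] * 4,
    "c": ["stone", "oak_planks", "glass", "torch"] * 2,
}
human_scores = {"a": 1.0, "b": 2.5, "c": 4.0}

names = sorted(settlements)
entropies = [block_entropy(settlements[n]) for n in names]
rho = spearman(entropies, [human_scores[n] for n in names])
print(f"Spearman rho between block entropy and human score: {rho:.3f}")
```

In this toy data the more varied settlements were also given higher invented scores, so the rank correlation comes out close to 1. Real settlement data would need many more samples and a significance test before any relationship could be claimed.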
