Synthesis and Evaluation of a Domain-specific Large Data Set for Dungeons & Dragons
Akila Peiris, Nisansa de Silva

TL;DR
This paper introduces the Forgotten Realms Wiki dataset, a large, multi-format collection of Dungeons & Dragons lore, and demonstrates its use in domain-specific natural language generation and similarity benchmarking.
Contribution
It provides the first large-scale, multi-format dataset for D&D, enabling advanced NLP tasks and domain-specific language generation in this fantasy setting.
Findings
The dataset is drawn from more than 45,200 wiki articles and is released as 11 sub-data sets in several formats.
A pairwise similarity benchmark was established using the dataset.
Domain-specific natural language generation was successfully demonstrated.
Abstract
This paper introduces the Forgotten Realms Wiki (FRW) data set and domain-specific natural language generation using FRW, along with related analyses. Forgotten Realms is the de facto default setting of the popular open-ended tabletop fantasy role-playing game, Dungeons & Dragons. The data set was extracted from the Forgotten Realms Fandom wiki, which consists of more than 45,200 articles. The FRW data set comprises 11 sub-data sets in a number of formats: raw plain text, plain text annotated by article title, directed link graphs, wiki info-boxes annotated by the wiki article title, a Poincaré embedding of the first-link graph, and multiple Word2Vec and Doc2Vec models of the corpus. This is the first data set of this size for the Dungeons & Dragons domain. We then present a pairwise similarity comparison benchmark which utilizes similarity measures. In addition, we perform D&D domain…
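To make the idea of a pairwise similarity comparison concrete, here is a minimal sketch. It assumes cosine similarity as the measure and uses made-up three-dimensional toy vectors keyed by article title; the abstract does not say which similarity measures the benchmark uses, and the helper names `cosine_similarity` and `pairwise_similarities` are hypothetical, not taken from the paper or its data set.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


def pairwise_similarities(vectors):
    """Score every unordered pair of titles, keyed by (title_a, title_b)."""
    return {
        (a, b): cosine_similarity(vectors[a], vectors[b])
        for a, b in combinations(sorted(vectors), 2)
    }


# Toy values for illustration only; these are NOT vectors from the FRW models.
toy_vectors = {
    "Elminster": [0.9, 0.1, 0.3],
    "Waterdeep": [0.2, 0.8, 0.5],
    "Mystra": [0.8, 0.2, 0.4],
}

scores = pairwise_similarities(toy_vectors)
for pair, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(pair, round(score, 3))
```

In practice the toy dictionary would be replaced by document vectors loaded from one of the released embedding models, and the resulting ranking of pairs could then be compared against a reference ranking.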
Taxonomy
Topics: Wikis in Education and Collaboration · Digital Games and Media · Topic Modeling
