A large corpus of lucid and non-lucid dream reports
Remington Mallett

TL;DR
This paper presents a large, curated, and validated corpus of 55,000 dream reports, including 10,000 lucid dreams, to facilitate research into dream phenomenology and lucidity.
Contribution
It provides the first extensive, validated dataset of labeled dream reports, enabling new research into lucid dreaming.
Findings
Language patterns in lucid reports align with known lucid dream features.
The corpus includes user-provided labels for three dream categories: lucid, non-lucid, and nightmare.
Construct validation confirms the dataset's reliability for future studies.
Abstract
All varieties of dreaming remain a mystery. Lucid dreams in particular, those characterized by awareness of the dream, are notoriously difficult to study. Their low prevalence and resistance to deliberate induction make it hard to obtain a sizeable corpus of lucid dream reports. The consequent lack of clarity around lucid dream phenomenology has left the many purported applications of lucidity under-realized. Here, a large corpus of 55k dream reports from 5k contributors is curated, described, and validated for future research. Ten years of publicly available dream reports were scraped from an online forum where users share anonymous dream journals. Importantly, users optionally categorize their dreams as lucid, non-lucid, or nightmares, providing a user-generated labeling system that includes 10k lucid, 25k non-lucid, and 2k nightmare labels. After characterizing the corpus…
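Because labeling is optional, the stated label totals do not cover every report. The sketch below illustrates, under assumptions, how such a corpus might be represented and filtered, and what the rounded numbers in the abstract imply. The `DreamReport` record, its field names, and the helper functions are hypothetical and do not describe the released corpus's actual schema. Labels are modeled as a set because the abstract does not say whether categories are mutually exclusive (a report could, for example, be both lucid and a nightmare). Since every labeled report carries at least one label, 55k total reports minus 37k label assignments gives a lower bound of roughly 18k uncategorized reports, not an exact count.

```python
from collections import Counter
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List

# The three user-selectable categories named in the abstract.
CATEGORIES = ("lucid", "non-lucid", "nightmare")


@dataclass(frozen=True)
class DreamReport:
    # Hypothetical record layout for illustration; the real corpus schema may differ.
    report_id: str
    text: str
    labels: FrozenSet[str] = frozenset()  # empty when the contributor skipped categorization


def label_counts(reports: Iterable[DreamReport]) -> Counter:
    """Count label assignments; reports with no label are tallied as 'unlabeled'."""
    counts = Counter()
    for r in reports:
        if r.labels:
            counts.update(r.labels)
        else:
            counts["unlabeled"] += 1
    return counts


def select(reports: Iterable[DreamReport], category: str) -> List[DreamReport]:
    """Return the reports whose contributor applied the given category."""
    if category not in CATEGORIES:
        raise ValueError(f"unknown category: {category!r}")
    return [r for r in reports if category in r.labels]


def min_unlabeled(total: int, label_totals: dict) -> int:
    """Lower bound on unlabeled reports: each labeled report carries at least one label."""
    return max(total - sum(label_totals.values()), 0)


# Toy example records (invented for illustration, not taken from the corpus).
reports = [
    DreamReport("a", "I realized I was dreaming and flew.", frozenset({"lucid"})),
    DreamReport("b", "I was late for an exam.", frozenset({"non-lucid"})),
    DreamReport("c", "Something chased me through a house."),
]

print(label_counts(reports))
print([r.report_id for r in select(reports, "lucid")])  # ['a']

# Approximate totals from the abstract (rounded to thousands).
stated = {"lucid": 10_000, "non-lucid": 25_000, "nightmare": 2_000}
print(min_unlabeled(55_000, stated))  # 18000
```

Using a set of labels rather than a single field keeps the sketch correct whether or not categories overlap; if the labels turn out to be exclusive, the lower bound becomes an exact count (up to rounding).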
