A large corpus of lucid and non-lucid dream reports

Remington Mallett

arXiv:2603.26992·cs.CL·March 31, 2026


TL;DR

This paper presents a curated and validated large corpus of 55,000 dream reports, including 10,000 lucid dreams, to facilitate research into dream phenomenology and lucidity.

Contribution

The paper provides the first extensive, user-labeled dataset of dream reports with construct validation, enabling new research into lucid dreaming.

Findings

01 Language patterns in lucid reports align with known lucid dream features.

02 The corpus includes diverse dream categories with user-provided labels.

03 Construct validation confirms the dataset's reliability for future studies.

Abstract

All varieties of dreaming remain a mystery. Lucid dreams in particular, or those characterized by awareness of the dream, are notoriously difficult to study. Their scarce prevalence and resistance to deliberate induction make it difficult to obtain a sizeable corpus of lucid dream reports. The consequent lack of clarity around lucid dream phenomenology has left the many purported applications of lucidity under-realized. Here, a large corpus of 55k dream reports from 5k contributors is curated, described, and validated for future research. Ten years of publicly available dream reports were scraped from an online forum where users share anonymous dream journals. Importantly, users optionally categorize their dreams as lucid, non-lucid, or a nightmare, offering a user-provided labeling system that includes 10k lucid, 25k non-lucid, and 2k nightmare labels. After characterizing the corpus…
