Alljoined-1.6M: A Million-Trial EEG-Image Dataset for Evaluating Affordable Brain-Computer Interfaces

Jonathan Xu; Ugo Bruzadin Nunes; Wangshu Jiang; Samuel Ryther; Jordan Pringle; Paul S. Scotti; Arnaud Delorme; Reese Kneeland

arXiv:2508.18571·q-bio.NC·August 28, 2025

TL;DR

This paper introduces a large-scale EEG dataset of over 1.6 million trials recorded with affordable consumer-grade hardware. It shows that high-level semantic decoding and EEG-to-image reconstruction are feasible on such hardware, which makes BCI research more accessible.

Contribution

The authors release a large, open-source EEG dataset recorded with low-cost hardware, enabling scalable, cost-effective research on brain-computer interfaces and semantic decoding.

Findings

01

Semantic information can be decoded from consumer-grade EEG data.

02

EEG-to-Image reconstruction is effective despite lower signal quality.

03

Decoding performance improves log-linearly with data volume.
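To make the third finding concrete: a log-linear trend means the decoding score grows roughly linearly in the logarithm of the number of training trials, so each doubling of data adds about the same fixed increment. The sketch below fits such a trend by least squares. The function name `fit_log_linear`, the choice of base-10 logarithm, and all numeric values are assumptions for illustration; the data points are synthetic and are not results from the paper.

```python
import numpy as np


def fit_log_linear(n_trials, scores):
    """Least-squares fit of: score = intercept + slope * log10(n_trials)."""
    x = np.log10(np.asarray(n_trials, dtype=float))
    y = np.asarray(scores, dtype=float)
    # np.polyfit returns coefficients from highest degree to lowest.
    slope, intercept = np.polyfit(x, y, deg=1)
    return slope, intercept


# Synthetic, illustrative values only (NOT taken from the paper).
true_slope, true_intercept = 0.05, 0.10
n_trials = np.array([1e4, 1e5, 1e6])
scores = true_intercept + true_slope * np.log10(n_trials)

slope, intercept = fit_log_linear(n_trials, scores)

# Under a log-linear trend, doubling the data adds a constant amount.
gain_per_doubling = slope * np.log10(2.0)
```

A practical consequence of this shape is diminishing returns per added trial: going from 10k to 20k trials buys the same improvement as going from 500k to 1M.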

Abstract

We present a new large-scale electroencephalography (EEG) dataset as part of the THINGS initiative, comprising over 1.6 million visual stimulus trials collected from 20 participants, and totaling more than twice the size of the most popular current benchmark dataset, THINGS-EEG2. Crucially, our data was recorded using a 32-channel consumer-grade wet electrode system costing ~$2.2k, around 27x cheaper than research-grade EEG systems typically used in cognitive neuroscience labs. Our work is one of the first open-source, large-scale EEG resources designed to closely reflect the quality of hardware that is practical to deploy in real-world, downstream applications of brain-computer interfaces (BCIs). We aim to explore the specific question of whether deep neural network-based BCI research and semantic decoding methods can be effectively conducted with such affordable systems, filling an…

Code & Models

Datasets

Alljoined/Alljoined-1.6M
dataset · 2.1k downloads
