MultiDiffNet: A Multi-Objective Diffusion Framework for Generalizable Brain Decoding
Mengchun Zhang, Kateryna Shapovalenko, Yucheng Shao, Eddie Guo, Parusha Pradhan

TL;DR
MultiDiffNet introduces a diffusion-based framework that learns a compact latent space for EEG decoding, significantly improving generalization across subjects and sessions without synthetic data augmentation, and provides a comprehensive benchmark suite.
Contribution
It proposes a novel diffusion-based approach for EEG decoding that enhances cross-subject generalization and introduces a unified benchmark and evaluation protocol.
Findings
Achieves state-of-the-art generalization in EEG decoding tasks.
Provides a new benchmark suite and evaluation protocol for EEG research.
Develops a statistical reporting framework for low-trial EEG settings.
Abstract
Neural decoding from electroencephalography (EEG) remains fundamentally limited by poor generalization to unseen subjects, driven by high inter-subject variability and the lack of large-scale datasets to model it effectively. Existing methods often rely on synthetic subject generation or simplistic data augmentation, but these strategies fail to scale or generalize reliably. We introduce MultiDiffNet, a diffusion-based framework that bypasses generative augmentation entirely by learning a compact latent space optimized for multiple objectives. We decode directly from this space and achieve state-of-the-art generalization across various neural decoding tasks using subject- and session-disjoint evaluation. We also curate and release a unified benchmark suite spanning four EEG decoding tasks of increasing complexity (SSVEP, Motor Imagery, P300, and Imagined Speech) and an…
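The subject-disjoint evaluation mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' protocol; it only shows the general idea that every trial from a held-out subject goes to the test set, so no subject contributes to both sides. The function name, the 25% test fraction, and the toy data are assumptions made for illustration.

```python
import numpy as np

def subject_disjoint_split(subject_ids, test_fraction=0.25, seed=0):
    """Return boolean (train_mask, test_mask) with no subject in both sets.

    Hypothetical helper for illustration; not taken from the paper.
    """
    subject_ids = np.asarray(subject_ids)
    rng = np.random.default_rng(seed)

    # Split at the level of subjects, not trials.
    subjects = np.unique(subject_ids)
    rng.shuffle(subjects)
    n_test = max(1, int(round(test_fraction * len(subjects))))
    test_subjects = subjects[:n_test]

    # Every trial of a held-out subject goes to the test set.
    test_mask = np.isin(subject_ids, test_subjects)
    return ~test_mask, test_mask

# Toy data (assumed): 8 subjects with 5 trials each.
subject_ids = np.repeat(np.arange(8), 5)
train_mask, test_mask = subject_disjoint_split(subject_ids)

train_subjects = set(subject_ids[train_mask].tolist())
held_out_subjects = set(subject_ids[test_mask].tolist())
```

A session-disjoint split would follow the same pattern with session labels in place of subject labels.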
Peer Reviews
Decision · Submitted to ICLR 2026
The proposed approach avoids common pitfalls like artifact introduction in GAN-based or diffusion-based synthesis. The Temporal Masked Mixup is a nice extension of standard mixup, preserving temporal structure in EEG signals. By standardizing datasets and enforcing subject/session-disjoint splits, it addresses inconsistencies in prior works. The trend-level statistical framework mitigates p-value limitations in high-variance settings, promoting evidence-based claims. Ablations covering decode
No runtime or parameter-count comparisons are given; since EEGNet is lightweight by design, how do MultiDiffNet's added DDPM and decoder components affect efficiency for real-time BCIs? Some hyperparameters, such as embedding dimensions and attention pool specifics, are underspecified in the main text, which could hinder replication. Mixup integration points are mentioned, but the results aren't fully tabulated: which point works best per task? P300 results are mixed, with MultiDiffNet underperforming baselines on unseen a
The idea of using reconstructed EEGs (from diffusion- and decoder-based reconstruction) for augmentation is reasonable, particularly for EEG, since one could expect the reconstructed signals to carry less subject-specific noise and thus improve generalization across subjects. This idea is not new; what is new is the proposal to train the diffusion model and the decoder for EEG generation/reconstruction jointly with the encoder in a multi-objective setup.
Method: The approach of jointly training the generator/reconstructor and the encoder seems like overkill. I understand the motivation to unify them in multi-objective training, but I do not see the benefits in terms of performance and generalization (there are no experiments to showcase that). At the same time, I am also concerned about the stability of training such a model. In the early phase, when the diffusion and reconstructor are not yet in good state, the reconstructed EEGs are bad, woul
They show decent results on their chosen datasets and beat a lot of SOTA results by a decent margin, especially on the unseen categories.
- The paper is very procedural: it applies already well-known techniques through multiple losses. There is no hypothesis that the authors are trying to prove or disprove; it is a benchmarking paper at best. On the theoretical side, they have combined many well-known losses, but they have not answered questions such as why this is the best way to solve the given problem. They claim that other generative synthetic augmentations are not scalable, but no computational-tim
Taxonomy
Topics: EEG and Brain-Computer Interfaces · Neural dynamics and brain function · Emotion and Mood Recognition
