MindDiffuser: Controlled Image Reconstruction from Human Brain Activity with Semantic and Structural Diffusion

Yizhuo Lu; Changde Du; Qiongyi Zhou; Dianpeng Wang; Huiguang He

arXiv:2308.04249 · cs.CV · August 9, 2023

Open Access · 1 Repo

TL;DR

MindDiffuser is a two-stage model that reconstructs images from brain activity by aligning semantic and structural features, surpassing previous methods and demonstrating neurobiological plausibility.

Contribution

The paper introduces a novel two-stage approach combining VQ-VAE, CLIP, and Stable Diffusion for controlled image reconstruction from fMRI data.

Findings

01 Outperforms state-of-the-art methods on the NSD dataset

02 Achieves cohesive semantic and structural alignment

03 Demonstrates neurobiological plausibility of the model

Abstract

Reconstructing visual stimuli from brain recordings has been a meaningful and challenging task. Especially, the achievement of precise and controllable image reconstruction bears great significance in propelling the progress and utilization of brain-computer interfaces. Despite the advancements in complex image reconstruction techniques, the challenge persists in achieving a cohesive alignment of both semantic (concepts and objects) and structure (position, orientation, and size) with the image stimuli. To address the aforementioned issue, we propose a two-stage image reconstruction model called MindDiffuser. In Stage 1, the VQ-VAE latent representations and the CLIP text embeddings decoded from fMRI are put into Stable Diffusion, which yields a preliminary image that contains semantic information. In Stage 2, we utilize the CLIP visual feature decoded from fMRI as supervisory…
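The abstract says that the Stage 1 inputs (a VQ-VAE latent and a CLIP text embedding) are "decoded from fMRI" but does not say how. As a minimal sketch, assuming a linear ridge-regression decoder (an assumption, not something the text above confirms), the decoding step could look like the following. The function names, array sizes, and synthetic data are all hypothetical and chosen only for illustration; no real fMRI data, VQ-VAE, CLIP, or Stable Diffusion model is involved.

```python
import numpy as np


def fit_ridge_decoder(fmri, features, alpha=1.0):
    """Closed-form ridge regression: W = (X^T X + alpha * I)^-1 X^T Y.

    fmri:     (n_trials, n_voxels) brain responses (toy data here).
    features: (n_trials, feature_dim) target features to be decoded.
    """
    n_voxels = fmri.shape[1]
    gram = fmri.T @ fmri + alpha * np.eye(n_voxels)
    return np.linalg.solve(gram, fmri.T @ features)


def decode(fmri, weights):
    """Map fMRI responses to predicted features with the fitted weights."""
    return fmri @ weights


# Synthetic stand-ins (hypothetical sizes, not taken from the paper).
rng = np.random.default_rng(0)
n_trials, n_voxels, text_dim, latent_dim = 200, 50, 8, 16
fmri = rng.standard_normal((n_trials, n_voxels))

# Pretend the true relationship is linear plus a little noise.
true_text_w = rng.standard_normal((n_voxels, text_dim))
true_latent_w = rng.standard_normal((n_voxels, latent_dim))
text_targets = fmri @ true_text_w + 0.01 * rng.standard_normal((n_trials, text_dim))
latent_targets = fmri @ true_latent_w + 0.01 * rng.standard_normal((n_trials, latent_dim))

# One decoder per feature type, mirroring the two Stage 1 inputs.
text_decoder = fit_ridge_decoder(fmri, text_targets, alpha=1e-3)
latent_decoder = fit_ridge_decoder(fmri, latent_targets, alpha=1e-3)

# The decoded pair is what a Stage 1 generator would receive for one trial.
stage1_inputs = {
    "text_embedding": decode(fmri[:1], text_decoder),
    "latent": decode(fmri[:1], latent_decoder),
}
```

In this toy setup the two decoders are fitted independently, so either feature type can be swapped for a different regressor without touching the other. The sketch stops at the decoded features and does not attempt the diffusion step.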

Code & Models

Repositories

reedonepeck/minddiffuser (PyTorch, official)

Taxonomy

Topics: Neural dynamics and brain function · Visual Attention and Saliency Detection · Face Recognition and Perception

Methods: Contrastive Language-Image Pre-training · Diffusion · ALIGN · VQ-VAE