SIREM: Speech-Informed MRI Reconstruction with Learned Sampling

Md Hasan; Nyvenn Castro; Daiqi Liu; Lukas Mulzer; Jana Hutter; Jonghye Woo; Moritz Zaiss; Andreas Maier; Paula A. Perez-Toro

arXiv:2605.18221 · cs.SD · May 19, 2026

TL;DR

SIREM is a speech-informed MRI reconstruction framework that uses synchronized speech as a cross-modal prior to improve the speed and quality of real-time vocal-tract imaging.

Contribution

SIREM introduces a multimodal reconstruction method that fuses an audio-driven prediction with an MRI-driven component through a spatial weighting map, paired with a learnable sampling profile intended to improve acquisition speed and reconstruction accuracy.

Findings

01 SIREM outperforms standard baselines in reconstruction quality.

02 It enables faster MRI reconstruction while maintaining plausible anatomy.

03 The method establishes a new benchmark for speech-informed rtMRI.

Abstract

Real-time magnetic resonance imaging (rtMRI) of speech production enables non-invasive visualization of dynamic vocal-tract motion and is valuable for speech science and clinical assessment. However, rtMRI is fundamentally constrained by trade-offs among spatial resolution, temporal resolution, and acquisition speed, often leading to undersampled k-space measurements and degraded reconstructions. We propose SIREM, a speech-informed MRI reconstruction framework that uses synchronized speech as a cross-modal prior. The central idea is that vocal-tract configurations during speech are correlated with the produced acoustics, making part of the image content predictable from audio. SIREM models each frame as a fusion of an audio-driven component and an MRI-driven component through a spatial weighting map. The audio branch predicts articulator-related structure from speech, while the MRI…
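The fusion described in the abstract can be illustrated with a small sketch. The abstract states only that each frame is a fusion of an audio-driven and an MRI-driven component through a spatial weighting map; the pixel-wise convex combination below, the choice of which component the weight multiplies, and the names `fuse_frame`, `audio_pred`, `mri_recon`, and `weight_map` are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def fuse_frame(audio_pred, mri_recon, weight_map):
    """Blend two image estimates pixel by pixel.

    Assumed form (not confirmed by the paper):
        frame = w * audio_pred + (1 - w) * mri_recon
    where w is a spatial weighting map with values in [0, 1].
    """
    w = np.clip(weight_map, 0.0, 1.0)  # keep weights in a valid range
    return w * audio_pred + (1.0 - w) * mri_recon


# Toy 4x4 frame: random arrays stand in for the two branch outputs.
rng = np.random.default_rng(0)
audio_pred = rng.random((4, 4))  # hypothetical audio-branch output
mri_recon = rng.random((4, 4))   # hypothetical MRI-branch output

# Hypothetical weighting map: trust audio in the top half, MRI in the bottom half.
weight_map = np.zeros((4, 4))
weight_map[:2, :] = 1.0

fused = fuse_frame(audio_pred, mri_recon, weight_map)
```

In this toy setup the top two rows of `fused` equal the audio prediction and the bottom two rows equal the MRI reconstruction. A learned map would presumably take intermediate values and vary smoothly across the image.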

Code & Models

Repository: mdhasanai/SIREM (GitHub)