Scalable Learning-Based Sampling Optimization for Compressive Dynamic MRI

Thomas Sanchez, Baran Gözcü, Ruud B. van Heeswijk, Armin Eftekhari, Efe Ilıcak, Tolga Çukur, Volkan Cevher

arXiv:1902.00386·eess.IV·March 17, 2020


TL;DR

This paper introduces a scalable, learning-based method for optimizing sampling masks in dynamic MRI, significantly reducing computational costs while maintaining high-quality image reconstruction from undersampled data.

Contribution

It presents a novel stochastic greedy algorithm for designing optimal sampling masks, addressing scalability issues in dynamic MRI compressed sensing.

Findings

01. Reduces the computational burden by nearly 200 times.

02. Maintains reconstruction performance comparable to existing methods.

03. Shows that the sampling-distribution problem admits a compactly supported solution, which yields a deterministic optimal sampling mask.


Tables

Table 1: Running time of the greedy algorithms for different decoders and training data sizes. The setting corresponds to $n_{x}\times n_{y}\times n_{\text{frames}}\times n_{\text{train}}$, and $n_{\text{procs}}$ is the number of parallel processes used by each simulation. * means that the runtime was extrapolated from a few iterations. We used $k=n_{\text{procs}}$ for SG-v1 and SG-v2, and $l=1$ for SG-v2. The speedup columns contain the measured speedup, with the theoretical speedup in parentheses.

Algorithm	Setting	G-v1 time	G-v1 $n_{\text{procs}}$	SG-v1 time	SG-v1 $n_{\text{procs}}$	SG-v1 speedup	SG-v2 time	SG-v2 $n_{\text{procs}}$	SG-v2 speedup
KTF	$152\times 152\times 17\times 3$	6 d 23 h	152	11 h 40	38	58 (68)	3 h 25	38	170 (204)
KTF	$256\times 256\times 10\times 2$	~7 d 8 h*	256	12 h 20	64	57 (68)	5 h 20	64	173 (204)
IST	$152\times 152\times 17\times 3$	3 d 11 h	152	5 h 30	38	60 (68)	1 h 37	38	184 (204)
ALOHA	$152\times 152\times 17\times 3$	~25 d 1 h*	152	1 d 14 h 25	38	62 (68)	18 h 13	38	133 (204)

Table 2: Comparison of the learning-based random variable-density Gaussian sampling optimization for different settings. $n_{\text{pars}}$ denotes the size of the grid used to optimize the parameters. For each set of parameters, the results were averaged over $20$ masks drawn at random from the distribution considered. The grid combines $12$ sampling rates (uniformly spread in $[0.025, 0.3]$), $10$ different numbers of low-frequency phase encodes (from $2$ to $18$ lines), and different widths of the Gaussian density (uniformly spread in $[0.05, 0.3]$): $10$ for the images of size $152\times 152$ and $20$ in the other case.

Algo.	Setting	$n_{\text{pars}}$	$n_{\text{procs}}$	Time
KTF	$152\times 152\times 17\times 3$	1200	38	6 h 30
KTF	$256\times 256\times 10\times 2$*	2400	64	6 h 45
IST	$152\times 152\times 17\times 3$	1200	38	3 h 20
ALOHA	$152\times 152\times 17\times 3$	1200	38	1 d 8 h

Equations

$$\mathbf{b} = \mathbf{P}_{\Omega}\mathbf{\Psi}\mathbf{x} + \mathbf{w}$$

$$\max_{f\in S^{p-1}} \eta(f), \qquad \eta(f) := \mathbb{E}_{\Omega(f,n),\,\mathbf{x}\sim\mathcal{P}_{\mathbf{x}}}\left[\eta\left(\mathbf{x},\hat{\mathbf{x}}(\Omega,\mathbf{x})\right)\right]$$

$$\max_{f\in S^{p-1}} \eta_{m}(f), \qquad \eta_{m}(f) := \frac{1}{m}\sum_{i=1}^{m}\mathbb{E}_{\Omega(f,n)}\left[\eta(\mathbf{x}_{i};\Omega)\right]$$

$$\max_{f\in S^{p-1}} \eta_{m}(f) \le \max_{f\in S^{p-1}}\ \max_{|\Omega|=n}\frac{1}{m}\sum_{i=1}^{m}\eta(\mathbf{x}_{i};\Omega) = \max_{|\Omega|=n}\frac{1}{m}\sum_{i=1}^{m}\eta(\mathbf{x}_{i};\Omega)$$

$$\text{Problem (3)} \equiv \max_{|\Omega|=n}\frac{1}{m}\sum_{i=1}^{m}\eta(\mathbf{x}_{i};\Omega)$$

$$\text{Problem (3)} \equiv \max_{f\in S^{p-1},\ |\mathrm{supp}(f)|=n}\eta_{m}(f)$$

$$\begin{aligned}
\max_{f\in S^{p-1},\ |\mathrm{supp}(f)|=n}\eta_{m}(f) &\equiv \max_{|\Gamma|=n}\ \max_{f\in S_{\Gamma}} \eta_{m}(f)\\
&= \max_{|\Gamma|=n}\ \max_{f\in S_{\Gamma}} \frac{1}{m}\sum_{i=1}^{m}\mathbb{E}_{\Omega(f,n)}\left[\eta(\mathbf{x}_{i};\Omega)\right]\\
&= \max_{|\Gamma|=n}\ \max_{f\in S_{\Gamma}} \frac{1}{m}\sum_{i=1}^{m}\eta(\mathbf{x}_{i};\Gamma)\\
&= \max_{|\Gamma|=n}\ \frac{1}{m}\sum_{i=1}^{m}\eta(\mathbf{x}_{i};\Gamma)
\end{aligned}$$

$$\text{Speedup} = \frac{t_{\text{G-v1}}\cdot n_{\text{procs, G-v1}}}{t_{\text{SG-v2}}\cdot n_{\text{procs, SG-v2}}}$$


Code & Models

Repositories

t-sanchez/stochasticGreedyMRI (official)


Full text


Abstract

Compressed sensing applied to magnetic resonance imaging (MRI) reduces scanning time by enabling images to be reconstructed from highly undersampled data. In this paper, we tackle the problem of designing a sampling mask for an arbitrary reconstruction method and a limited acquisition budget. Namely, we look for an optimal probability distribution from which a mask with a fixed cardinality is drawn. We demonstrate that this problem admits a compactly supported solution, which leads to a deterministic optimal sampling mask. We then propose a stochastic greedy algorithm that (i) provides an approximate solution to this problem, and (ii) resolves the scaling issues of [1, 2]. We validate its performance on in vivo dynamic MRI with retrospective undersampling, showing that our method preserves the performance of [1, 2] while reducing the computational burden by a factor close to 200. Our implementation is available at https://github.com/t-sanchez/stochasticGreedyMRI.

Index Terms— Magnetic resonance imaging, compressive sensing (CS), learning-based sampling.

1 Introduction

Dynamic Magnetic Resonance Imaging (dMRI) is a powerful tool in medical imaging, which allows for non-invasive monitoring of tissues over time. A main challenge to the quality of dMRI examinations is the inefficiency of data acquisition that limits temporal and spatial resolutions. In the presence of moving tissues, such as in cardiac MRI, the trade-off between spatial and temporal resolution is further complicated by the need to perform breath-holds to minimize motion artifacts [3].

In the last decade, the rise of Compressed Sensing (CS) has significantly contributed to overcoming these problems. CS allows for a successful reconstruction from undersampled measurements, provided that they are incoherent [4, 5] and that the data can be sparsely represented in some domain. In dMRI, samples are acquired in the $k$ - $t$ space (spatial frequency and time domain), and can be sparsely represented in the $x$ - $f$ domain (image and temporal Fourier transform domain). Many algorithms have exploited this framework with great success (see [6, 7, 8, 9, 10, 11, 12, 13, 14] and the references therein).

While CS theory mostly focuses on fully random measurements [15], practical implementations have generally relied on random variable-density sampling, which draws random samples from a parametric distribution (typically polynomial or Gaussian) that reasonably imitates the energy distribution in $k$-$t$ space [16, 17]. While these approaches allow masks to be designed quickly and greatly improve over the fully random sampling prescribed by CS theory, they (i) remain largely heuristic; (ii) ignore the anatomy of interest; (iii) ignore the reconstruction algorithm; (iv) require careful tuning of their various parameters; and (v) do not necessarily use a fixed number of readouts per frame.

In the present work, we show that the problem of finding an optimal sampling distribution, from which masks containing $n$ out of $p$ possible locations are drawn, admits a solution compactly supported on $n$ elements. This demonstrates that our previously proposed framework in [1, 2], which searches for an approximately optimal sampling mask, is in fact looking for a solution to the more general problem of finding an optimal measurement distribution. In addition, we propose a scalable learning-based framework for dMRI: our stochastic greedy method preserves the performance of [1, 2] while reducing the computational burden by a factor close to $200$.

Numerical evidence shows that our framework can successfully find sampling patterns for a broad range of decoders, from k-t FOCUSS [7] to ALOHA [13], outperforming state-of-the-art model-based sampling methods over nearly all sampling rates considered.

2 Theory

2.1 Signal Acquisition

In the compressed sensing (CS) problem [5], the goal is to recover a signal that is known to be sparse in some basis from only a small number of linear measurements. In the case of dynamic MRI, we consider a signal $\mathbf{x}\in\mathbb{C}^{p}=\mathbb{C}^{N^{2}T}$ (i.e. a vectorized video of size $N\times N$ with $T$ frames), and the subsampled Fourier measurements are

$$\mathbf{b} = \mathbf{P}_{\Omega}\mathbf{\Psi}\mathbf{x} + \mathbf{w}, \tag{1}$$

where $\mathbf{\Psi}\in\mathbb{C}^{p\times p}$ is the spatial Fourier transform operator applied to the vectorized signal, $\mathbf{P}_{\Omega}:\mathbb{C}^{p}\to\mathbb{C}^{n}$ is a subsampling operator that selects the rows of $\mathbf{\Psi}$ according to the indices in the set $\Omega$, with $|\Omega|=n$ and $n\ll p$, and $\mathbf{w}$ is additive measurement noise. We refer to $\Omega$ as the sampling pattern or mask. We assume the signal $\mathbf{x}$ to be sparse in the basis $\mathbf{\Phi}$, which typically is a temporal Fourier transform across frames. Given the samples $\mathbf{b}$, along with $\Omega$, a reconstruction algorithm or decoder $g$ forms an estimate $\hat{\mathbf{x}}=g(\mathbf{b},\Omega)$ of $\mathbf{x}$.
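As an illustration of the acquisition model, here is a minimal NumPy sketch. It assumes that the video is stored as an array of shape (N, N, T), that $\mathbf{\Psi}$ is an orthonormal 2D FFT applied frame by frame, and that $\Omega$ is a boolean k-space mask of the same shape; `forward` and `adjoint` are illustrative names, not functions from the authors' repository.

```python
import numpy as np

def forward(x, mask, noise_std=0.0, rng=None):
    """Compute b = P_Omega Psi x + w for a video x of shape (N, N, T)."""
    rng = np.random.default_rng() if rng is None else rng
    kspace = np.fft.fft2(x, axes=(0, 1), norm="ortho")  # Psi: spatial FFT of each frame
    b = kspace[mask]                                     # P_Omega: keep the sampled locations
    # w: circularly symmetric complex Gaussian noise (zero when noise_std = 0)
    w = noise_std * (rng.standard_normal(b.shape) + 1j * rng.standard_normal(b.shape)) / np.sqrt(2)
    return b + w

def adjoint(b, mask):
    """Zero-filled adjoint Psi^H P_Omega^T b, returned as a video of shape (N, N, T)."""
    kspace = np.zeros(mask.shape, dtype=complex)
    kspace[mask] = b
    return np.fft.ifft2(kspace, axes=(0, 1), norm="ortho")

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8, 3)) + 1j * rng.standard_normal((8, 8, 3))
mask = np.zeros((8, 8, 3), dtype=bool)
mask[::2] = True                 # keep every other phase-encode row in every frame
b = forward(x, mask)
print(b.size)                    # 96 = 4 rows x 8 columns x 3 frames
```

With a fully sampled mask and no noise, `adjoint(forward(x, mask), mask)` returns `x`, because the orthonormal FFT is unitary; a decoder $g$ would replace the zero-filled adjoint with an actual reconstruction.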

The quality of the reconstruction is then evaluated using a performance metric $\eta(\mathbf{x},\hat{\mathbf{x}})$, typically the Peak Signal-to-Noise Ratio (PSNR), the negative Mean Square Error (MSE), or the Structural Similarity Index Measure (SSIM) [18].
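Two of these metrics can be written in a few lines; the sketch below assumes complex-valued images compared through the magnitude of their difference and takes the PSNR peak as the maximum magnitude of the reference, which is one common convention and not necessarily the one used in the paper. SSIM is left out; scikit-image's `skimage.metrics.structural_similarity` is one existing implementation.

```python
import numpy as np

def neg_mse(x, x_hat):
    """Negative mean squared error (higher is better)."""
    return -np.mean(np.abs(x - x_hat) ** 2)

def psnr(x, x_hat):
    """PSNR in dB; the peak is taken as max |x| of the reference (assumed convention)."""
    mse = np.mean(np.abs(x - x_hat) ** 2)
    return 10.0 * np.log10(np.max(np.abs(x)) ** 2 / mse)

x = np.ones((4, 4))
x_hat = x + 0.1
print(round(psnr(x, x_hat), 6))   # 20.0
```

Both functions are "higher is better", so either can serve as $\eta$ in the maximization problems that follow.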

2.2 Sampling mask design

We model the mask design process as finding a probability mass function (PMF) $f\in S^{p-1}$, where $S^{p-1}:=\{f\in[0,1]^{p}:\sum_{i=1}^{p}f_{i}=1\}$ is the standard simplex in $\mathbb{R}^{p}$. $f$ assigns to each location $i$ in $k$-space a probability $f_{i}$ of being acquired. The mask is then constructed by drawing without replacement from $f$ until the cardinality constraint $|\Omega|=n$ is met. The problem of finding the optimal sampling distribution is then formulated as

$$\max_{f\in S^{p-1}} \eta(f), \qquad \eta(f) := \mathbb{E}_{\Omega(f,n),\,\mathbf{x}\sim\mathcal{P}_{\mathbf{x}}}\left[\eta\left(\mathbf{x},\hat{\mathbf{x}}(\Omega,\mathbf{x})\right)\right], \tag{2}$$

where the index set $\Omega\subset[p]$ is generated from $f$ and $[p]:=\{1,\ldots,p\}$ . This problem corresponds to finding the probability distribution $f$ that maximizes the expected performance metric with respect to the data $\mathcal{P}_{\mathbf{x}}$ and the masks drawn from this distribution. To ease the notation, we will use $\eta\left(\mathbf{x},\mathbf{\hat{x}}\left(\Omega,\mathbf{x}\right)\right)\equiv\eta\left(\mathbf{x};\Omega\right)$ .
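The mask-drawing step can be sketched with NumPy. The text does not say which sampling routine the authors use; `numpy.random.Generator.choice` with `replace=False` and `p=f` is one way to draw $n$ distinct locations, and `draw_mask` is a hypothetical helper. The usage example also previews the key observation of the next result: a PMF supported on exactly $n$ locations always produces the same mask.

```python
import numpy as np

def draw_mask(f, n, rng=None):
    """Draw a mask of n distinct k-space locations from the PMF f, without replacement."""
    rng = np.random.default_rng() if rng is None else rng
    f = np.asarray(f, dtype=float)
    if np.any(f < 0) or not np.isclose(f.sum(), 1.0):
        raise ValueError("f must lie on the simplex S^{p-1}")
    # The returned set is distributed as n sequential draws, each proportional to f
    # restricted to the locations not selected yet.
    idx = rng.choice(f.size, size=n, replace=False, p=f)
    return frozenset(idx.tolist())

rng = np.random.default_rng(0)
f = np.zeros(10)
f[[1, 4, 7]] = [0.5, 0.3, 0.2]          # PMF supported on exactly n = 3 locations
masks = {draw_mask(f, 3, rng) for _ in range(100)}
print(len(masks))                        # 1: the same mask is drawn every time
```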

In practice, we do not have access to $\mathbb{E}_{\mathcal{P}_{\mathbf{x}}}\left[\eta(\mathbf{x};\Omega)\right]$ and instead have at hand the training images $\{\mathbf{x}_{i}\}_{i=1}^{m}$ drawn independently from $\mathcal{P}_{\mathbf{x}}$. We therefore maximize the empirical performance by solving

$$\max_{f\in S^{p-1}} \eta_{m}(f), \qquad \eta_{m}(f) := \frac{1}{m}\sum_{i=1}^{m}\mathbb{E}_{\Omega(f,n)}\left[\eta(\mathbf{x}_{i};\Omega)\right]. \tag{3}$$
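For intuition, $\eta_m(f)$ in Problem (3) can be estimated by Monte Carlo: draw masks from $f$ and average the metric over the $m$ training images. The sketch below uses hypothetical toy stand-ins: $\mathbf{\Psi}$ is taken as the identity, so the "decoder" simply zero-fills the unsampled entries, and the metric is the negative MSE. None of these helpers come from the paper.

```python
import numpy as np

def empirical_objective(f, n, train, decoder, metric, n_masks=200, rng=None):
    """Monte Carlo estimate of eta_m(f) = (1/m) sum_i E_{Omega(f, n)}[eta(x_i; Omega)]."""
    rng = np.random.default_rng() if rng is None else rng
    f = np.asarray(f, dtype=float)
    total = 0.0
    for _ in range(n_masks):
        omega = rng.choice(f.size, size=n, replace=False, p=f)           # one mask drawn from f
        total += np.mean([metric(x, decoder(x, omega)) for x in train])  # average over the m images
    return total / n_masks

# Toy stand-ins (hypothetical): Psi is the identity, so sampling keeps x[omega];
# the decoder zero-fills the rest and the metric is the negative MSE.
def zero_fill(x, omega):
    x_hat = np.zeros_like(x)
    x_hat[omega] = x[omega]
    return x_hat

def neg_mse(x, x_hat):
    return -np.mean((x - x_hat) ** 2)

rng = np.random.default_rng(0)
train = [rng.standard_normal(6) for _ in range(4)]   # m = 4 toy "images"
f = np.array([0.2, 0.2, 0.2, 0.2, 0.1, 0.1])
print(empirical_objective(f, 3, train, zero_fill, neg_mse, rng=rng))
```

Each evaluation of $\eta_m(f)$ requires `n_masks * m` reconstructions, which is why directly optimizing over $f$ quickly becomes expensive with a real decoder.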


Given that Problem (3) looks for masks that are constructed by sampling $n$ times without replacement from $f$ , the following holds.

Proposition 1.

There exists a maximizer of Problem (3) that is supported on an index set of size at most $n$ .

Proof.

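Let the distribution $\widehat{f}_{n}$ be a maximizer of Problem (3); we are interested in finding its support. Write $\Pr[\Omega]$ for the probability that the mask $\Omega$ is drawn from $f$. Because $\sum_{|\Omega|=n}\Pr[\Omega]=1$, the objective is a convex combination of the scores of individual masks, so that

$$\begin{aligned}
\max_{f\in S^{p-1}} \eta_{m}(f) &= \max_{f\in S^{p-1}} \sum_{|\Omega|=n}\Pr[\Omega]\,\frac{1}{m}\sum_{i=1}^{m}\eta(\mathbf{x}_{i};\Omega)\\
&\le \max_{f\in S^{p-1}}\ \max_{|\Omega|=n}\frac{1}{m}\sum_{i=1}^{m}\eta(\mathbf{x}_{i};\Omega)\\
&= \max_{|\Omega|=n}\frac{1}{m}\sum_{i=1}^{m}\eta(\mathbf{x}_{i};\Omega).
\end{aligned}$$

Let $\widehat{\Omega}_{n}$ be an index set of size $n$ that maximizes the last line above. The inequality holds with equality when $\Pr[\widehat{\Omega}_{n}]=1$ and $\Pr[\Omega]=0$ for $\Omega\neq\widehat{\Omega}_{n}$, which happens whenever $f$ is supported on $\widehat{\Omega}_{n}$: drawing $n$ locations without replacement then always returns $\widehat{\Omega}_{n}$. Such an $f$ attains the upper bound and is therefore a maximizer. That is, there exists a maximizer of Problem (3) that is supported on an index set of size $n$. ∎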
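The argument can be checked numerically on a tiny instance. The sketch below computes $\Pr[\Omega]$ exactly for sequential sampling without replacement by enumerating all ordered draws, uses hypothetical random per-mask scores $J[\Omega]$ in place of $\frac{1}{m}\sum_i \eta(\mathbf{x}_i;\Omega)$, and verifies both the upper bound and its attainment by a PMF supported on the best mask.

```python
import itertools
import numpy as np

def mask_probabilities(f, n):
    """Exact Pr[Omega] when n locations are drawn one at a time from f without replacement."""
    probs = {}
    for seq in itertools.permutations(range(len(f)), n):   # every ordered draw
        pr, remaining = 1.0, 1.0
        for i in seq:
            if f[i] == 0.0:                                 # zero-probability locations are never drawn
                pr = 0.0
                break
            pr *= f[i] / remaining                          # renormalise over the locations left
            remaining -= f[i]
        key = frozenset(seq)
        probs[key] = probs.get(key, 0.0) + pr
    return probs

def expected_objective(f, n, J):
    """eta_m(f) = sum_{|Omega| = n} Pr[Omega] * J[Omega], with J[Omega] = (1/m) sum_i eta(x_i; Omega)."""
    return sum(pr * J[omega] for omega, pr in mask_probabilities(f, n).items())

rng = np.random.default_rng(0)
p, n = 5, 2
# Hypothetical per-mask empirical scores J[Omega] for every mask of size n.
J = {frozenset(s): rng.standard_normal() for s in itertools.combinations(range(p), n)}
best = max(J, key=J.get)

for _ in range(100):                                        # random PMFs never beat the best mask
    f = rng.dirichlet(np.ones(p))
    assert np.isclose(sum(mask_probabilities(f, n).values()), 1.0)
    assert expected_objective(f, n, J) <= J[best] + 1e-12

f_star = np.zeros(p)
f_star[list(best)] = 1.0 / n                                # a PMF supported on the best mask
print(np.isclose(expected_objective(f_star, n, J), J[best]))  # True
```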

While this observation does not indicate how to find this maximizer, it nonetheless allows us to further simplify Problem (3). More specifically, the observation that a distribution $\widehat{f}_{n}$ has a compact support of size $n$ implies the following:

Proposition 2.

$$\text{Problem (3)} \equiv \max_{|\Omega|=n}\frac{1}{m}\sum_{i=1}^{m}\eta(\mathbf{x}_{i};\Omega).$$

Proof.

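Proposition 1 tells us that Problem (3) admits a solution supported on a set of size at most $n$; since $n$ distinct locations must be drawn from it, its support has exactly $n$ elements. This implies

$$\text{Problem (3)} \equiv \max_{f\in S^{p-1},\ |\mathrm{supp}(f)|=n}\eta_{m}(f).$$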

That is, we only need to search over compactly supported distributions $f$ . Let $S_{\Gamma}$ denote the standard simplex on a support $\Gamma\subset[p]$ . It holds that

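$$\begin{aligned}
\max_{f\in S^{p-1},\ |\mathrm{supp}(f)|=n}\eta_{m}(f) &\equiv \max_{|\Gamma|=n}\ \max_{f\in S_{\Gamma}} \eta_{m}(f)\\
&= \max_{|\Gamma|=n}\ \max_{f\in S_{\Gamma}} \frac{1}{m}\sum_{i=1}^{m}\mathbb{E}_{\Omega(f,n)}\left[\eta(\mathbf{x}_{i};\Omega)\right]\\
&= \max_{|\Gamma|=n}\ \max_{f\in S_{\Gamma}} \frac{1}{m}\sum_{i=1}^{m}\eta(\mathbf{x}_{i};\Gamma)\\
&= \max_{|\Gamma|=n}\ \frac{1}{m}\sum_{i=1}^{m}\eta(\mathbf{x}_{i};\Gamma).
\end{aligned}$$

The first equality expands the definition of $\eta_{m}$. For the second, every $f\in S_{\Gamma}$ puts all of its mass on the same $n$-element set $\Gamma$, so drawing $n$ locations without replacement can only return the single mask $\Omega=\Gamma$. The third equality follows because the objective no longer depends on $f$. ∎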
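Proposition 2 turns Problem (3) into a purely combinatorial search over supports. On a toy instance it can be solved by enumeration, as in the sketch below, where a hypothetical additive score stands in for $\frac{1}{m}\sum_i \eta(\mathbf{x}_i;\Omega)$; the second part counts the candidates for the cardinal setting of the experiments to show why exhaustive search is out of reach. The choice of $258$ lines (about a $10\%$ rate, ignoring any per-frame constraint) is only an illustrative assumption.

```python
import itertools
import math

def solve_exhaustively(p, n, score):
    """Solve max_{|Omega| = n} score(Omega) by enumerating all C(p, n) candidate masks."""
    return max(itertools.combinations(range(p), n), key=score)

# Hypothetical score standing in for (1/m) sum_i eta(x_i; Omega): a sum of per-location weights.
weights = [5.0, 1.0, 4.0, 0.5, 3.0, 0.1]
best = solve_exhaustively(len(weights), 3, lambda omega: sum(weights[i] for i in omega))
print(best)  # (0, 2, 4)

# Candidate masks when choosing 258 of the 152 x 17 = 2584 lines of the cardiac setting.
n_candidates = math.comb(152 * 17, 258)
print(len(str(n_candidates)))  # a number with several hundred digits
```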

The framework of Problem (3) captures most data-driven variable-density approaches in the literature [19, 20, 21, 22, 23, 24, 25], and Proposition 2 shows that Problem (7), which we tackled in [1, 2] and develop here, also aims at solving the same problem as these probabilistic approaches. Note that while the present theory considers sampling individual points in Fourier space, it readily applies to the Cartesian case, where full lines are added to the mask at once.

3 Stochastic greedy mask design

In line with the approach we previously proposed in [1], we seek an approximate solution to Problem (5) with a greedy algorithm, since Problem (5) is inherently combinatorial. The previous greedy method of [1, 2] suffers from three main drawbacks: (i) it scales quadratically with the total number of lines, (ii) it scales linearly with the size of the dataset, and (iii) it does not construct masks with a fixed number of readouts per frame. While [2] partially deals with (i), our proposed stochastic greedy approach addresses all three issues while preserving the benefits of [1]. In particular, it preserves the nestedness and ordering of the acquisition: critical locations are acquired first, and the resulting masks are nested (i.e. the mask at $30\%$ sampling rate includes all sampling locations of the mask at $20\%$).

Let us introduce the set $\mathcal{S}$ of all lines that can be acquired, which is a set of subsets of $\{1,\dotsc,p\}$. A feasible Cartesian mask takes the form $\Omega=\bigcup_{j=1}^{\ell}S_{j},\quad S_{j}\in\mathcal{S}$, i.e. it consists of a union of lines. Both the greedy method of [1] (G-v1) and our stochastic greedy method (SG-v2) are detailed in Algorithm 1; SG-v2 addresses the three main limitations of G-v1 as follows. Issue (i) is solved by picking uniformly at random, at each iteration, a batch $\mathcal{S}_{iter}$ of $k$ candidate lines from a given frame $\mathcal{S}_{t}$, instead of considering the full set of possible lines $\mathcal{S}$ (line 3 in Alg. 1); (ii) is addressed by considering a batch $\mathcal{L}$ of $l$ training images instead of the whole training set of size $m$ at each iteration (line 4 in Alg. 1); (iii) is solved by iterating sequentially through the frames $\mathcal{S}_{t}$ from which lines are added (lines 1, 3 and 10 in Alg. 1). These improvements are inspired by refinements of the standard greedy algorithm in the field of submodular optimization [26], and reduce the computational complexity from $\Theta\left(mr(NT)^{2}\right)$ to $\Theta\left(lrkNT\right)$, where $r$ denotes the cost of a single reconstruction, effectively speeding up the computation by a factor $\Theta\left(\frac{m}{l}\frac{NT}{k}\right)$. Our results show that this is achieved without sacrificing reconstruction quality.
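Algorithm 1 itself is not reproduced in this text, so the following is a hypothetical Python reconstruction from the description above, not the authors' code (their implementation is in the repository cited in the abstract). It assumes that each candidate line has a hashable id, that `score(mask, batch)` runs the decoder with `mask` on the images in `batch` and returns the mean metric, and that frames are visited cyclically so that every frame receives the same number of readouts. Setting `l` equal to the training-set size recovers an SG-v1-style variant.

```python
import numpy as np

def stochastic_greedy(lines_per_frame, train, score, n_iters, k, l, rng=None):
    """Hypothetical SG-v2-style sketch reconstructed from the prose.

    lines_per_frame[t]: candidate lines S_t of frame t (hashable ids).
    score(mask, batch): mean performance of the decoder with `mask` on the images in `batch`.
    Returns the lines in the order they were added, so every prefix is a nested mask.
    """
    rng = np.random.default_rng() if rng is None else rng
    mask, order = set(), []
    n_frames = len(lines_per_frame)
    for it in range(n_iters):
        t = it % n_frames                                    # visit frames in turn: equal readouts per frame
        candidates = [s for s in lines_per_frame[t] if s not in mask]
        if not candidates:
            continue
        picked = rng.choice(len(candidates), size=min(k, len(candidates)), replace=False)  # k random lines
        batch_ids = rng.choice(len(train), size=min(l, len(train)), replace=False)          # l random images
        batch = [train[j] for j in batch_ids]
        # k * l reconstructions per iteration instead of all lines times all m images
        best = max((candidates[j] for j in picked), key=lambda s: score(mask | {s}, batch))
        mask.add(best)
        order.append(best)
    return order

# Toy usage: lines are (frame, phase-encode) pairs, "images" are energy profiles over
# 16 phase encodes, and the score is the energy captured by the mask.
ky = np.arange(16)
train = [np.exp(-((ky - 8) / 3.0) ** 2) * (1 + 0.1 * j) for j in range(3)]
lines = [[(t, y) for y in range(16)] for t in range(4)]
energy = lambda mask, batch: np.mean([sum(img[y] for _, y in mask) for img in batch])
order = stochastic_greedy(lines, train, energy, n_iters=8, k=16, l=1, rng=np.random.default_rng(0))
print(order[:4])  # [(0, 8), (1, 8), (2, 8), (3, 8)]: the central line of each frame comes first
```

Returning the insertion order rather than a single set is what makes the nested-mask property explicit: truncating `order` at any length gives the mask for the corresponding sampling rate.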

4 Numerical Experiments

4.1 Implementation details

Reconstruction algorithms: We consider three reconstruction algorithms, namely $k$-$t$ FOCUSS (KTF) [7], iterative soft thresholding (IST), and ALOHA [13]. Their parameters were selected to maintain good empirical performance across all sampling rates considered.

Mask selection baselines:

•

Coherence-VD [16]: We consider a random variable-density sampling mask with Gaussian density and optimize its parameters to minimize coherence.

•

LB-VD [1, 2]: Instead of minimizing the coherence as in Coherence-VD, we perform a grid search on the parameters using the training set to optimize reconstruction according to the same performance metric as our method.
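The LB-VD grid search can be sketched as follows. The grid sizes follow the Table 2 caption ($12$ rates, $10$ numbers of central lines, $10$ Gaussian widths, $20$ masks per parameter set), but the exact grid points and the `evaluate` callable are assumptions: `evaluate` is hypothetical and stands for "draw one VD mask with these parameters, reconstruct the training set, return the mean metric".

```python
import itertools
import numpy as np

# Grid following the Table 2 caption (152 x 152 case); the exact grid points are assumptions.
RATES = np.linspace(0.025, 0.3, 12)                   # 12 sampling rates
N_LOW = np.round(np.linspace(2, 18, 10)).astype(int)  # 10 numbers of central phase encodes
WIDTHS = np.linspace(0.05, 0.3, 10)                   # 10 Gaussian widths

def lb_vd_search(evaluate, n_masks=20, rng=None):
    """For each sampling rate, keep the (n_low, width) pair with the best mean score over n_masks masks.

    evaluate(rate, n_low, width, seed) is a hypothetical callable that draws one VD mask with
    these parameters, reconstructs the training set and returns the mean performance metric.
    """
    rng = np.random.default_rng() if rng is None else rng
    best = {}
    for rate in RATES:
        scores = {}
        for n_low, width in itertools.product(N_LOW, WIDTHS):
            seeds = rng.integers(0, 2**32, size=n_masks)
            scores[(n_low, width)] = np.mean([evaluate(rate, n_low, width, s) for s in seeds])
        best[rate] = max(scores, key=scores.get)
    return best

print(len(RATES) * len(N_LOW) * len(WIDTHS))  # 1200 parameter sets, as in Table 2

# Toy evaluator (hypothetical) that prefers 6 central lines and a width near 0.15.
toy = lambda rate, n_low, width, seed: -abs(width - 0.15) - abs(n_low - 6)
best = lb_vd_search(toy, rng=np.random.default_rng(0))
```

Coherence-VD would follow the same loop with a coherence criterion in place of `evaluate`, which is why it needs no training data.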

Data sets: Our dynamic data were acquired in seven adult volunteers with a balanced steady-state free precession (bSSFP) pulse sequence on a whole-body Siemens 3T scanner using a 34-element matrix coil array. Several short-axis cine images were obtained during a breath-hold scan. Fully sampled Cartesian data were acquired using a $256\times 256$ grid with $25$ frames, then combined and cropped to a $152\times 152\times 17$ single coil image. The details of the parameters used are provided in the supplementary material [27]. In the experiments, we used three volumes for training and four for testing.

4.2 Comparison of greedy algorithms

We first compare the performance of G-v1 with SG-v1 and SG-v2 and show the results in Figure 1. We are specifically interested in the sensitivity of our algorithm to the sampling batch size $k$ and the training batch size $l$ (for SG-v2, we use $l=1$ unless stated otherwise). A small batch size $k$ (e.g. $10$) yields a drop in performance, while $k=38$ even improves performance compared to G-v1, with respectively $60$ times fewer computations for SG-v1 and $180$ times fewer for SG-v2. Using a batch of training images (SG-v2) does not reduce performance compared to SG-v1, while largely reducing computations. Additional results (in the supplementary material [27]) show that larger batches yield results similar to those for $k=38$. That SG-v2 with $k=38$ outperforms G-v1 may seem surprising, but it originates in the lack of structure of the problem, where the noise introduced through random batches of samples improves the overall performance of the method. In the sequel, we use $k=38$ and $l=1$ for SG-v2.

4.3 Single coil results

The comparison to baselines is shown in Figures 2 and 3, where the SG-v2 method yields masks that consistently improve the results compared to all variable-density methods used.

We notice in Figure 3 that comparing the reconstruction algorithms using VD masks does not allow for a faithful performance comparison: the performance difference between the reconstruction methods is very small. In contrast, pairing each reconstruction algorithm with a sampling pattern optimized by our model-free approach makes the performance difference much more noticeable: ALOHA with its corresponding mask clearly outperforms KTF, a conclusion that could not be drawn from reconstructions with VD-based masks alone. Note that extended results, along with multi-coil experiments, are available in our supplementary material [27].

4.4 Large scale static results

This last experiment shows the scalability of our method to very large datasets. We used the fastMRI dataset [28] of knee volumes and trained the mask for reconstructing the $13$ most central slices of size $320\times 320$, which yielded a training set of $12649$ slices. For brevity, we only report results obtained with total variation (TV) minimization using NESTA [29]. For mask design, we used the SG-v2 method with $k=80$ and $l=20$ (roughly $2500$ times fewer computations than G-v1). The LB-VD method was trained using $80$ representative slices, optimizing its parameters with a computational budget similar to that of SG-v2. The results in Figure 4 show a uniform improvement of our method over the LB-VD approach.
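The quoted factors follow from the complexity ratio $\Theta\left(\frac{m}{l}\frac{NT}{k}\right)$ of Section 3. The arithmetic below ignores constants and, as an assumption, treats each static knee slice as a single frame ($T=1$) with $N=320$ candidate lines; `predicted_speedup` is a hypothetical helper.

```python
def predicted_speedup(m, l, n_lines, n_frames, k):
    """(m / l) * (N T / k): ratio of reconstructions run by G-v1 versus SG-v2 (constants ignored)."""
    return (m / l) * (n_lines * n_frames / k)

# Cardiac setting of Section 4.2: m = 3 training volumes, l = 1, N = 152, T = 17, k = 38.
print(predicted_speedup(3, 1, 152, 17, 38))               # 204.0
# fastMRI setting of Section 4.4: m = 12649 slices, l = 20, N = 320, T = 1, k = 80.
print(round(predicted_speedup(12649, 20, 320, 1, 80)))    # 2530
```

The first value matches the theoretical speedup reported in Table 1 for the $152\times 152\times 17\times 3$ setting, and the second matches the "roughly 2500" figure for fastMRI.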

5 Discussion and Conclusion

We presented a scalable sampling optimization method for dMRI, which largely addresses the scalability issues of [1, 2]. Reducing the resources used by G-v1 by a factor close to $200$ was shown to have no negative impact on the reconstruction quality achieved within our framework. Our method was shown to scale to very large datasets such as fastMRI [28], which the previous greedy method [1] could not handle.

The masks obtained bring significant image quality improvements over the baselines. The results suggest that VD-based methods limit the performance of CS applied to MRI through their underlying model. They are consistently outperformed by our model-free and data-adaptive method on different in vivo datasets, across several decoders, fields of view and resolutions. Our findings highlight that sampling design should not be considered in isolation from the data and the reconstruction algorithm, as using a mask that is not specifically optimized can considerably hinder the performance of the algorithm.

More importantly, our theoretical results show that the generic non-convex Problem (3), which seeks a probability mass function under a cardinality constraint from which a mask is subsequently sampled, is equivalent to the discrete Problem (7) of looking for the support of this PMF. This connection opens the door to rigorously leveraging techniques from combinatorial optimization for the problem of designing optimal, data-driven sampling masks for MRI.

Appendix A Detailed description of the datasets

Cardiac dataset. The data set was acquired in seven healthy adult volunteers with a balanced steady-state free precession (bSSFP) pulse sequence on a whole-body Siemens 3T scanner using a 34-element matrix coil array. Several short-axis cine images were acquired during a breath-hold scan. Fully sampled Cartesian data were acquired using a $256\times 256$ grid, with relevant imaging parameters including $320\times 320$ mm field of view (FoV), $6$ mm slice thickness, $1.37\times 1.37$ mm spatial resolution, $42.38$ ms temporal resolution, $1.63/3.26$ ms TE/TR, [math] flip angle, and $1395$ Hz/px readout bandwidth. There were $13$ phase encodes acquired per frame during one heartbeat, for a total of $25$ frames after the scan.

The Cartesian cardiac scans were then combined into single-coil data from the initial $256\times 256\times 25\times 34$ size using adaptive coil combination [walsh2000adaptive, griswold2002use], which keeps the image complex-valued. This single-coil image was then cropped to a $152\times 152\times 17$ image, because a large portion of the image periphery is static or empty, and to improve computational efficiency.

Vocal dataset. The vocal dataset used in the experiments of Appendix F comprised $4$ vocal tract scans acquired with a 2D HASTE sequence (T2-weighted single-shot turbo spin-echo) on a 3T Siemens Tim Trio using a 4-channel body matrix coil array. The study was approved by the local institutional review board, and informed consent was obtained from all subjects prior to imaging. Fully sampled Cartesian data were acquired using a $256\times 256$ grid, with $256\times 256$ mm field of view (FoV), $5$ mm slice thickness, $1\times 1$ mm spatial resolution, $98/1000$ ms TE/TR, [math] flip angle, $391$ Hz/px readout bandwidth, and $5.44$ ms echo spacing (turbo factor $256$). A total of $10$ frames were acquired, which were recombined into single-coil data using adaptive coil combination as well [walsh2000adaptive, griswold2002use].

fastMRI. The fastMRI dataset was obtained from the NYU fastMRI initiative [28]. The anonymized dataset comprises raw k-space data from more than 1,500 fully sampled knee MRIs obtained on 3 and 1.5 Tesla magnets. The dataset includes coronal proton density-weighted images with and without fat suppression.

Appendix B Extended literature review

The most widely used approach for the design of the sampling pattern $\Omega$ is random variable-density sampling, which was originally proposed by Lustig et al. [16] for static MRI and adapted to dynamic MRI by Jung et al. [17]. It offers a compromise between incoherent measurements, required by the theory of CS, and the structure that can be found in k-space, where most of the energy is concentrated in the low-frequency end of the spectrum. This classical approach draws random samples according to a parametric distribution mimicking the energy distribution of k-space, favoring low-frequency samples. The distribution considered is typically either polynomial [16, 22] [kim2012accelerated, tremoulheac2014dynamic] or Gaussian [7, 8, 11, 12, 13, 14]. In these setups, a slight offset is often added in order to prevent the distribution from having extremely small probabilities at high frequencies, and a few low-frequency k-space samples are acquired at the Nyquist rate.
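A minimal sketch of such a variable-density Cartesian sampler follows. It assumes phase encodes indexed along one axis, a density defined on the normalized distance $r\in[0,1]$ to the k-space center (Gaussian with standard deviation `width`, or polynomial $(1-r)^{d}$), a small additive offset, a block of fully sampled central lines, and a fixed number of lines per frame. The function name and all parameter names are hypothetical and not taken from any of the cited works.

```python
import numpy as np

def vd_cartesian_mask(n_lines, n_frames, lines_per_frame, n_center, width=0.15,
                      kind="gaussian", decay=2.0, offset=1e-3, rng=None):
    """Random variable-density Cartesian mask of shape (n_lines, n_frames).

    Each frame gets exactly `lines_per_frame` phase encodes: `n_center` central lines
    (acquired at the Nyquist rate) plus lines drawn without replacement from a Gaussian
    or polynomial density in the normalised distance r to the k-space centre, plus an offset.
    """
    rng = np.random.default_rng() if rng is None else rng
    ky = np.arange(n_lines) - n_lines // 2
    r = np.abs(ky) / (n_lines / 2)                         # 0 at the centre, 1 at the edge
    if kind == "gaussian":
        density = np.exp(-0.5 * (r / width) ** 2)
    elif kind == "polynomial":
        density = (1.0 - r) ** decay
    else:
        raise ValueError(f"unknown density: {kind}")
    density = density + offset                             # keep high frequencies reachable
    start = n_lines // 2 - n_center // 2
    center = np.arange(start, start + n_center)
    density[center] = 0.0                                  # central lines are added deterministically
    mask = np.zeros((n_lines, n_frames), dtype=bool)
    for t in range(n_frames):
        extra = rng.choice(n_lines, size=lines_per_frame - n_center, replace=False,
                           p=density / density.sum())
        mask[center, t] = True
        mask[extra, t] = True
    return mask

mask = vd_cartesian_mask(152, 17, lines_per_frame=15, n_center=4, rng=np.random.default_rng(0))
print(mask.sum(axis=0))  # 15 lines in every frame
```

Drawing a fresh pattern per frame gives the temporal incoherence these methods rely on, while fixing `lines_per_frame` avoids the variable readout counts criticized in the next paragraph.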

The variable-density methods commonly used in dMRI perform well, but have several weaknesses, already highlighted in [1] for static MRI. They require parameters to be tuned, such as the decay rate of the polynomial, the standard deviation of the Gaussian distribution, or the number of central phase encodes, and they arbitrarily constrain the sampling patterns to a model without any theoretical justification. Moreover, it is unclear which sampling density will be most effective for a given anatomy and reconstruction rule. Also, the idea of randomizing the acquisition is in itself questionable, as in practice one would rather design a fixed sampling pattern that is known to perform well for a specific anatomy across many subjects. Finally, some variable-density methods, such as Poisson Disc Sampling [vasanawala2011practical], do not use a fixed number of readouts per frame, which complicates their hardware implementation for dynamic MRI [ahmad2015variable]. Indeed, undersampling some frames more heavily than others might result in missing critical temporal information.

Recently, several articles have focused on improved design of spatiotemporal sampling patterns for dMRI, and we detail two particularly relevant methods here. The variable density incoherent spatiotemporal acquisition (VISTA) [ahmad2015variable] maximizes Riesz energy on a spatiotemporal grid and has the notable advantage of generating patterns with high levels of incoherence while maintaining uniform sampling density across frames. Another important technique, proposed by Li et al. [li2018dynamic], designs Cartesian sampling patterns using the golden ratio, with the aim of generating incoherent measurements and maintaining uniform sampling density across frames. (This approach is different from the golden-angle scheme commonly used in radial sampling.)

Other relevant undersampling works include, in the non-Cartesian setting, fully random radial sampling [jung2010radial, tremoulheac2014dynamic], as well as golden-angle radial sampling, where spokes separated by the golden angle are continuously acquired [winkelmann2007optimal, feng2014golden, feng2016xd]. These works exploit the inherent advantage of radial over Cartesian sampling that each spoke passes through the center of k-space and thus contains low-frequency as well as high-frequency information. More recent work also leverages variable-density approaches in the non-Cartesian setting [boyer2016generation, lazarus2018variable]. In static MRI, several methods exploiting training signals have been proposed: in [knoll2011adapted, zhang2014energy, vellagoundar2015robust], a distribution from which random samples are drawn is constructed, and in [seeger2010optimization, ravishankar2011adaptive, liu2012under, haldar2019oedipus], a single image is used at a time to determine the sampling mask. Very recently, deep-learning-based methods have enabled active mask design paired with online reconstruction and shown very promising results [jin2019self, zhang2019reducing, weiss2019learning]. However, to the best of our knowledge, none of these methods have been extended to dynamic MRI.

Appendix C Influence of the batch size $k$ on the mask design

In this appendix, we study the effect of the batch size used in SG-v1. We ran SG-v1 with different batch sizes in the same setting as in the numerical experiment of Section 4.3 and report the PSNR of the reconstructions in Figure 5. We only considered KTF for brevity. Very small batch sizes yield poor results, and the PSNR reaches that of G-v1 with as few as 38 samples (out of $152\times 17=2584$ samples overall). Unless the batch size is extremely small (less than $1$ to $2\%$ of all phase-encoding lines at each greedy iteration), the results suggest that the masks obtained with SG-v1 or SG-v2 yield satisfactory reconstruction quality, i.e. the same quality as G-v1 or even better.

Figure 6 shows the masks obtained for the batch sizes considered, and several observations can be made. First, as expected, a batch size of $1$ yields a totally random mask, and a batch size of $5$ yields a mask that is more concentrated towards low frequencies than the one with $k=1$ but still has a large variance. Then, as the batch size increases, the resulting masks seem to converge to very similar designs, which are slightly different from the ones obtained with G-v1.

Appendix D Computational costs

We report here the computational costs for the different variants of the greedy methods used in the single-coil experiment of Section 4.3, as well as the computational costs for Appendix F. Table 1 provides the running times and empirically measured speedups for the greedy variants, and Table 2 provides the computational times required to obtain the learning-based variable-density (LB-VD) parameters through an extensive grid search. The empirical speedup is computed as

$$\text{Speedup} = \frac{t_{\text{G-v1}}\cdot n_{\text{procs, G-v1}}}{t_{\text{SG-v2}}\cdot n_{\text{procs, SG-v2}}}.$$

The main point of these tables is to show that the computational improvement is very significant in terms of resources, and that our approach greatly improves the efficiency of the method of [1]. The measured ratio may differ from the predicted speedup factor of $\Theta\left(\frac{m}{l}\frac{NT}{k}\right)$ due to practical computational considerations. Table 1 shows roughly a factor of $1.2$ between the predicted and the measured speedup, mainly due to communication between the multiple processes and I/O operations.

Appendix E Multicoil experiments

For the multicoil experiment, we used the previously described cardiac dataset, but without cropping the images. We took the first $12$ frames of every subject and selected $4$ coils covering the region of interest. Each image was then normalized so that the resulting sum-of-squares image has at most unit intensity. When required, the coil sensitivities were self-calibrated following the idea proposed in [feng2013highly], which averages the k-space data acquired over time and subsequently performs adaptive coil combination [walsh2000adaptive, griswold2002use].

The advantage of self-calibration is that the greedy optimization procedure can simultaneously account for the need for accurate coil estimation and for accurate reconstruction, potentially eliminating the need for a calibration scan prior to the acquisition. A more complete discussion of the accuracy of self-calibrated coil sensitivities is presented in [feng2013highly].
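The normalization and self-calibration steps described above can be sketched as follows with NumPy. This is a minimal illustration under stated assumptions, not the implementation of the cited self-calibration method: the array layout `(frames, coils, lines, readout)`, the function names and the window size are hypothetical, and the coil sensitivities are estimated as the dominant eigenvector of a locally summed coil covariance, in the spirit of the Walsh adaptive combination.

```python
import numpy as np


def normalize_sos(coil_imgs):
    """Scale multicoil images so that their sum-of-squares image peaks at 1.

    coil_imgs: complex (T, C, H, W) = frames x coils x lines x readout.
    """
    sos = np.sqrt(np.sum(np.abs(coil_imgs) ** 2, axis=1))  # (T, H, W)
    return coil_imgs / sos.max()


def time_averaged_kspace(kspace, mask):
    """Average undersampled k-space over time to form a calibration reference.

    kspace: complex (T, C, N, n_x); mask: boolean (T, N) of acquired lines.
    Each line is divided by the number of frames in which it was acquired.
    """
    sampled = kspace * mask[:, None, :, None]
    counts = np.maximum(mask.sum(axis=0), 1)  # avoid 0/0 for never-acquired lines
    return sampled.sum(axis=0) / counts[None, :, None]  # (C, N, n_x)


def walsh_sensitivities(coil_imgs, win=5):
    """Coil sensitivities as the dominant eigenvector of the local coil covariance.

    coil_imgs: complex (C, H, W). Returns (C, H, W), unit-norm along the coil axis.
    """
    C, H, W = coil_imgs.shape
    # Per-pixel covariance R[h, w, i, j] = x_i * conj(x_j)
    R = np.einsum("ihw,jhw->hwij", coil_imgs, coil_imgs.conj())
    # Sum the covariances over a win x win neighbourhood (edge-padded)
    p = win // 2
    Rp = np.pad(R, ((p, p), (p, p), (0, 0), (0, 0)), mode="edge")
    Rs = sum(Rp[dy:dy + H, dx:dx + W] for dy in range(win) for dx in range(win))
    _, vecs = np.linalg.eigh(Rs)  # eigenvalues ascending, eigenvectors in columns
    s = vecs[..., -1]  # dominant eigenvector per pixel, shape (H, W, C)
    s = s * np.exp(-1j * np.angle(s[..., :1]))  # remove arbitrary phase (coil 0 real)
    return np.moveaxis(s, -1, 0)


# Typical use, with kspace of shape (T, C, N, n_x) and a sampling mask (T, N):
# ref = np.fft.ifft2(time_averaged_kspace(kspace, mask))
# sens = walsh_sensitivities(ref)
```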

We used $k$-$t$ SPARSE-SENSE [otazo2010combination] and ALOHA [13] for reconstruction. The former requires coil sensitivities, whereas the latter reconstructs the images directly in k-space before combining the reconstructed data. We also introduce an additional mask design baseline, golden-ratio Cartesian sampling [li2018dynamic], which we refer to as golden in the sequel.
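For readers unfamiliar with golden-ratio Cartesian sampling, a generic version can be sketched as follows: successive phase-encoding positions advance by the golden-ratio conjugate $(\sqrt{5}-1)/2$ modulo one, and each frame continues the same sequence, so consecutive frames tend to acquire different lines. This is a hypothetical, uniform-density variant given for illustration only; the scheme cited above may differ in its details, for instance in how lines are distributed across k-space.

```python
import numpy as np

GOLDEN_CONJ = (np.sqrt(5) - 1) / 2  # 1 / golden ratio, about 0.618


def golden_ratio_mask(n_lines, n_frames, lines_per_frame):
    """Cartesian mask whose successive lines follow golden-ratio increments.

    Returns a boolean (n_frames, n_lines) array with `lines_per_frame`
    distinct phase-encoding lines per frame; positions repeated within a
    frame are skipped, and the sequence carries over to the next frame.
    """
    if not 0 < lines_per_frame <= n_lines:
        raise ValueError("need 0 < lines_per_frame <= n_lines")
    mask = np.zeros((n_frames, n_lines), dtype=bool)
    j = 0  # global index in the golden-ratio sequence
    for t in range(n_frames):
        while mask[t].sum() < lines_per_frame:
            ky = int(np.floor(((j * GOLDEN_CONJ) % 1.0) * n_lines))
            mask[t, ky] = True
            j += 1
    return mask
```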

Appendix F Additional single-coil results with SG-v1

While the main paper focused on SG-v2, which uses a batch of training samples instead of the whole training set, we focus here on results with SG-v1. SG-v1 accelerated G-v1 by a factor of $60$, and we contend that, given the small dataset used here, using a batch of training data instead of the whole set should not affect the performance.

F.1 Comparison to baselines

The comparison to baselines is shown in Figures 2 and 3: the learning-based method yields masks that consistently improve the results compared to all the variable-density methods considered. Even though some variable-density techniques provide good results for some sampling rates and algorithms, our learning-based technique consistently improves over these baselines. Compared to Coherence-VD, the improvement is at least $1$ dB at every sampling rate, and reaches $6.7$ dB at a $5\%$ sampling rate for ALOHA. Compared to golden, the improvement exceeds $1.5$ dB below a $15\%$ sampling rate and is around $0.5$ dB above it, for all decoders. Figure 2 also clearly indicates that the benefits of our learning-based framework become more apparent at higher sampling rates, where the improvement over LB-VD reaches up to $1$ dB. At lower sampling rates, where there are far fewer degrees of freedom for mask design, the greedy method and LB-VD perform similarly, as expected. As shown in Figure 3, the learning-based masks better preserve the sharp contrast transitions than the variable-density techniques.

F.2 Cross-performances of performance measures

So far, we have used PSNR as the performance measure; we now compare it with the results of the greedy algorithm paired with SSIM, a metric that more closely reflects perceptual similarity. For brevity, we only consider ALOHA in this section. When optimizing for SSIM, we noticed that unless a low-frequency initial mask is given, the reconstruction quality mostly stagnates. We therefore start the greedy algorithm with $4$ low-frequency phase encodes at each frame in the SSIM case.
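The low-frequency initialization can be illustrated with a short sketch. It assumes the k-space is centered (DC line at index `n_lines // 2`), and the helper name is hypothetical. It also makes explicit how much of the budget the initialization consumes: $4$ central lines out of $N$ per frame already account for a fraction $4/N$ of all samples, e.g. $4/256 \approx 1.6\%$ for $256$ lines per frame, which leaves little freedom to the greedy search at sampling rates close to that fraction.

```python
import numpy as np


def central_lines_mask(n_lines, n_frames, n_center=4):
    """Initial mask with the n_center central phase-encoding lines on every frame.

    Assumes centered k-space, i.e. the DC line sits at index n_lines // 2.
    """
    mask = np.zeros((n_frames, n_lines), dtype=bool)
    start = n_lines // 2 - n_center // 2
    mask[:, start:start + n_center] = True
    return mask


init = central_lines_mask(n_lines=256, n_frames=10)
print(init.mean())  # prints 0.015625, i.e. 4 / 256 of all samples are fixed
```

Such a mask can be passed as the starting point of a greedy search, which then only adds lines on top of it.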

The results for the PSNR- and SSIM-optimized masks are shown in Figure 10: the learning-based masks outperform the baselines across all sampling rates, except at $2.5\%$ in the SSIM case. The results of both masks are very close, but each tends to perform slightly better on the metric for which it was optimized. The very low SSIM of ALOHA-SSIM at $2.5\%$ is due to the $4$ phase encodes imposed on every frame: at this rate, the resulting sampling mask is simply a low-pass mask.

A visual reconstruction is provided in Figure 10: there is almost no difference in reconstruction quality, and the masks remain very similar. Overall, in this case the choice of performance metric does not have a dramatic effect on reconstruction quality, and our greedy framework still produces masks that outperform the baselines when optimizing SSIM instead of PSNR.

F.3 Experiments with different anatomies

In these last experiments, we consider both the single-coil cardiac dataset and the vocal imaging dataset, each of size $256\times 256\times 10$. For the cardiac dataset, we used $5$ samples for training and $2$ for testing, keeping only the first ten frames of each scan, whereas the vocal dataset used $2$ training and $2$ testing samples. In this setup, the k-space of the cardiac dataset varies more from one sample to another than that of the vocal one, which makes generalizing the mask more difficult. This issue would call for more training samples, but initializing SG-v1 with $4$ central phase-encoding lines on each frame was found to be sufficient to acquire the k-space peaks across the whole dataset. SGv1-Cardiac refers to the greedy algorithm trained on cardiac data, and SGv1-Vocal to its vocal counterpart. The algorithm used a batch of size $k=64$ at each iteration, and the results were obtained using only KTF.

The results are reported in Figures 11 and 12: for both datasets, the greedy approach outperforms the VD sampling methods across all sampling rates. Strikingly, SG-v1 outperforms all the baselines even more convincingly in this setting: it beats LB-VD by more than $2$ dB, whereas LB-VD remained very competitive in the other settings. The difference is clearly visible in the temporal fidelity of both reconstructions in Figure 12, where the LB-VD approach loses sharpness and accuracy compared to SG-v1.

F.4 Comparison across anatomies

The main complication in applying masks across anatomies is that the shape of the k-space can vary heavily across datasets: the vocal spectrum is very sharply peaked, while the cardiac one is much broader. Comparing the cross-performances in Figure 12, we see that the SGv1-Vocal mask generalizes much better to the cardiac dataset than the other way around. This can be explained by the differences in the spectra: since the cardiac spectrum is more spread out, the cardiac mask captures less faithfully the very low frequencies of the k-space, which are crucial for a successful reconstruction on the vocal dataset, thus hindering the reconstruction quality. Finally, the best performance is obtained when a trained mask is paired with its own anatomy.
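One way to make the contrast between a sharply peaked and a broad spectrum concrete is to measure the fraction of k-space energy carried by the central phase-encoding lines. The sketch below is an illustrative measure of our own, not one used in the experiments; it assumes centered (fftshifted) k-space with the phase-encoding axis second, and the helper names are hypothetical.

```python
import numpy as np


def central_energy_fraction(kspace, n_center):
    """Share of k-space energy in the n_center central phase-encoding lines.

    kspace: complex (T, N, n_x), centered (fftshifted), lines along axis 1.
    Values near 1 indicate a sharply peaked spectrum, values near
    n_center / N a flat one.
    """
    energy = np.sum(np.abs(kspace) ** 2, axis=(0, 2))  # energy per line, shape (N,)
    start = energy.size // 2 - n_center // 2
    return energy[start:start + n_center].sum() / energy.sum()


def centered_kspace(images):
    """2D FFT of each frame, shifted so that DC is at the center."""
    return np.fft.fftshift(np.fft.fft2(images), axes=(-2, -1))
```

Under this measure, a spectrum like the vocal one would score close to 1 for a handful of central lines, and a mask that misses those lines discards most of the signal energy.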

F.5 Additional visual reconstructions for cardiac and vocal dataset

This section provides further results for the experiments of F.3 and F.4. Figures 14 and 14 show reconstructions at different frames, which give clearer visual information about the reconstruction quality than the temporal profiles.

For these images, the PSNR and SSIM are computed for each individual frame, which shows the reconstruction quality in much more detail than before, when we considered each dynamic scan as a whole. As previously observed, the mask trained for a specific anatomy most faithfully captures the sharp contrast transitions in the dynamic regions of the images. For the vocal images, sampling the first frame more heavily is important to avoid the very large PSNR discrepancy observed with the other masks. Otherwise, the PSNR remains quite stable across frames.
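The difference between frame-wise and scan-wise metrics can be seen with a small PSNR sketch. It assumes the peak value is the maximum magnitude of the reference, which may differ from the exact convention used for the figures, and the helper names are hypothetical. Because the scan-wise PSNR averages the squared error over all frames before taking the logarithm, a single poorly reconstructed frame can dominate it.

```python
import numpy as np


def psnr(ref, rec):
    """PSNR in dB; the peak is the maximum magnitude of ref (assumed convention)."""
    mse = np.mean(np.abs(ref - rec) ** 2)
    return 10 * np.log10(np.max(np.abs(ref)) ** 2 / mse)


def framewise_psnr(ref, rec):
    """PSNR of each frame of a (T, H, W) sequence, each against its own peak."""
    return np.array([psnr(ref[t], rec[t]) for t in range(ref.shape[0])])


ref = np.ones((2, 4, 4))
rec = ref.copy()
rec[0] += 0.1   # poorly reconstructed first frame
rec[1] += 0.01
print(framewise_psnr(ref, rec))  # approximately [20. 40.]
print(psnr(ref, rec))            # about 22.97, dominated by frame 0
```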

F.6 Noisy experiments

To test the robustness of our framework to noise, we artificially added circularly symmetric complex Gaussian noise to the normalized complex images, with a standard deviation $\sigma=0.05$ for both the real and imaginary components. We then tested whether the greedy framework adapts to the noise level by prescribing a different sampling pattern than in the previous experiments.
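The noise model can be sketched as follows: independent zero-mean Gaussian noise with standard deviation $\sigma$ is added to the real and imaginary parts, which yields circularly symmetric complex noise. The function name and the seeding convention are hypothetical.

```python
import numpy as np


def add_complex_noise(x, sigma=0.05, rng=None):
    """Add circularly symmetric complex Gaussian noise to x.

    Real and imaginary parts receive independent N(0, sigma^2) noise.
    """
    rng = np.random.default_rng(rng)
    noise = rng.normal(0.0, sigma, x.shape) + 1j * rng.normal(0.0, sigma, x.shape)
    return x + noise
```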

We used V-BM4D [maggioni2012video] as the denoiser, in its default suggested mode with Wiener filtering and the low-complexity profile, and provided the noise standard deviation to the algorithm as the denoising parameter. Comparing the fully sampled denoised images with the original ones yields an average PSNR of $24.95$ dB across the whole dataset. Since none of the reconstruction algorithms we used incorporates a denoising parameter, we simply apply V-BM4D separately to the real and imaginary parts of the reconstruction. The results are presented in Figures 16 and 16.
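Applying a real-valued denoiser separately to the real and imaginary parts can be sketched as below. V-BM4D itself is not reproduced here: a three-frame temporal moving average serves as a clearly simplified stand-in, and any callable mapping a real array to a real array of the same shape could be substituted. The helper names are hypothetical.

```python
import numpy as np


def denoise_complex(x, denoiser):
    """Apply a real-valued denoiser separately to the real and imaginary parts."""
    return denoiser(x.real) + 1j * denoiser(x.imag)


def temporal_moving_average(v):
    """Simplified stand-in denoiser (not V-BM4D): 3-frame average along axis 0,
    with the first and last frames replicated at the boundaries."""
    padded = np.concatenate([v[:1], v, v[-1:]], axis=0)
    return (padded[:-2] + padded[1:-1] + padded[2:]) / 3
```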

Figure 16 shows that the learning-based framework outperforms the non-learning-based baselines by a larger margin than in the noiseless case, again especially at low sampling rates. In this case, however, the difference between SG-v1 and LB-VD is much smaller. This might be explained by the fact that noise corrupts the high-frequency samples, so that the masks concentrate more around low frequencies, leaving less room for substantially different designs.

The learning-based masks clearly adapt to the noise, as shown by comparing Figures 3 and 16: the masks SGv1-KTF and SGv1-ALOHA, which are trained on the noisy data, are closer to low-pass masks because high-frequency details are lost to noise, and hence no very high-frequency samples are added to the mask.

Note also that even though the PSNR gap between golden-ratio sampling and the optimized masks is only around $0.8$ to $1$ dB, the temporal details are much more faithfully preserved by the learning-based approach, which is crucial in dynamic applications. The inadequacy of coherence-based sampling is highlighted in this case, as very little temporal information is captured in the reconstruction with either decoder. For both decoders, the learning-based masks clearly improve the preservation of the temporal profile compared to the baselines; the improvement of around $3$ dB obtained with the SGv1-ALOHA mask also shows how well our framework adapts to this noisy situation, whereas Coherence-VD yields results of unacceptable quality.


Bibliography

[1] B. Gözcü, R. K. Mahabadi, Y.-H. Li, E. Ilıcak, T. Çukur, J. Scarlett, and V. Cevher, “Learning-based compressive MRI,” IEEE Transactions on Medical Imaging, 2018.
[2] B. Gözcü, T. Sanchez, and V. Cevher, “Rethinking sampling in parallel MRI: A data-driven approach,” in 27th European Signal Processing Conference, 2019.
[3] M. Saeed, T. A. Van, R. Krug, S. W. Hetts, and M. W. Wilson, “Cardiac MR imaging: current status and future direction,” Cardiovascular Diagnosis and Therapy, vol. 5, no. 4, p. 290, 2015.
[4] E. J. Candes, J. K. Romberg, and T. Tao, “Stable signal recovery from incomplete and inaccurate measurements,” Communications on Pure and Applied Mathematics, vol. 59, no. 8, pp. 1207–1223, 2006.
[5] D. L. Donoho, “Compressed sensing,” IEEE Transactions on Information Theory, vol. 52, no. 4, pp. 1289–1306, 2006.
[6] M. Lustig, J. M. Santos, D. L. Donoho, and J. M. Pauly, “k-t SPARSE: High frame rate dynamic MRI exploiting spatio-temporal sparsity,” in Proc. of the 13th Annual Meeting of ISMRM, Seattle, vol. 2420, 2006.
[7] H. Jung, K. Sung, K. S. Nayak, E. Y. Kim, and J. C. Ye, “k-t FOCUSS: A general compressed sensing framework for high resolution dynamic MRI,” Magnetic Resonance in Medicine, vol. 61, no. 1, pp. 103–116, 2009.
[8] R. Otazo, D. Kim, L. Axel, and D. K. Sodickson, “Combination of compressed sensing and parallel imaging for highly accelerated first-pass cardiac perfusion MRI,” Magnetic Resonance in Medicine, vol. 64, no. 3, pp. 767–776, 2010.
