TL;DR
This paper introduces a novel synthesizing-decomposition approach using GANs for single-channel signal separation and deconvolution, effectively handling non-stationary noise without prior knowledge of mixing filters.
Contribution
The paper presents a new GAN-based method that jointly estimates sources and mixing filters, improving separation and deconvolution performance over traditional methods.
Findings
Achieves 13.2 dB PSNR in source separation and deconvolution, outperforming NMF baseline.
Outperforms CNN baseline in image inpainting with 18.9 dB PSNR.
Effectively handles non-stationary noise unseen during training.
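Table 1: Problem categories according to the noise and the mixing filters.
| Task | Noise | Mixing filters |
|---|---|---|
| Denoising | Gaussian | $K = 1$, $a_1$ is a constant |
| Inpainting, Completion | Unknown | $K = 1$, $a_1$ is a constant |
| Deconvolution | - | $K = 1$, $a_1$ is a tensor |
| Separation | - | $K > 1$, $a_k$ are constants |
| Separation + deconvolution | - | $K > 1$, $a_k$ are tensors |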
Table 2: PSNR of denoising, inpainting and completion.
| Method | denoising | inpainting | completion |
|---|---|---|---|
| CNN | 26.0 dB | 15.3 dB | 12.2 dB |
| NMF | 17.4 dB | 13.4 dB | 12.9 dB |
| convolutive NMF | 18.3 dB | 13.4 dB | 13.0 dB |
| S-D with 1 init. | 23.1 dB | 15.2 dB | 13.6 dB |
| S-D with 8 init. | 25.1 dB | 18.2 dB | 15.4 dB |
| S-D with 32 init. | 25.1 dB | 18.9 dB | 15.4 dB |
Table 3: PSNR of deconvolution, separation, and separation with deconvolution.
| Method | deconv. | sep. | sep. + deconv. |
|---|---|---|---|
| NMF | 15.3 dB | 9.4 dB | 8.7 dB |
| convolutive NMF | 18.3 dB | 14.2 dB | 10.1 dB |
| S-D with 1 init. | 17.3 dB | 13.7 dB | 9.3 dB |
| S-D with 8 init. | 21.9 dB | 16.8 dB | 11.5 dB |
| S-D with 32 init. | 23.2 dB | 18.5 dB | 13.2 dB |
Single-Channel Signal Separation and Deconvolution with Generative Adversarial Networks
Qiuqiang Kong¹, Yong Xu², Wenwu Wang¹, Philip J.B. Jackson¹ and Mark D. Plumbley¹
¹University of Surrey, Guildford, UK
²Tencent AI Lab, Bellevue, USA
{q.kong, w.wang, p.jackson, m.plumbley}@surrey.ac.uk, [email protected]
Abstract
Single-channel signal separation and deconvolution aims to separate and deconvolve individual sources from a single-channel mixture. It is a challenging problem because no prior knowledge of the mixing filters is available, so both the individual sources and the mixing filters need to be estimated. In addition, a mixture may contain non-stationary noise that is unseen in the training set. We propose a synthesizing-decomposition (S-D) approach to solve the single-channel separation and deconvolution problem. In synthesizing, a generative model for sources is built using a generative adversarial network (GAN). In decomposition, both mixing filters and sources are optimized to minimize the reconstruction error of the mixture. The proposed S-D approach achieves a peak signal-to-noise ratio (PSNR) of 18.9 dB and 15.4 dB in image inpainting and completion, outperforming a baseline convolutional neural network with PSNRs of 15.3 dB and 12.2 dB, respectively. It achieves a PSNR of 13.2 dB in source separation together with deconvolution, outperforming a convolutive non-negative matrix factorization (NMF) baseline of 10.1 dB.
1 Introduction
Single-channel signal separation and deconvolution aims to separate and deconvolve sources from a single-channel mixture. The problem is challenging for three reasons. First, only a single-channel mixture is available, so the problem is underdetermined. Second, there is no prior knowledge of the mixing filters: both the individual sources and the mixing filters are unknown and need to be estimated. Third, there is no prior knowledge of the noise, which can be non-stationary and may not have been seen in the training data. Single-channel signal separation and deconvolution has many applications in image, speech and audio denoising Xie et al. (2012), inpainting Yeh et al. (2016), deconvolution and separation Cichocki et al. (2009); Mijovic et al. (2010). For example, an audio sensor usually receives signals from multiple sources convolved with channel distortion.
Much previous work focuses on source separation Cichocki et al. (2009); Grais et al. (2014) or deconvolution Levin et al. (2009); Campisi and Egiazarian (2017) independently, but not together. We categorize previous source separation and deconvolution methods into decomposition based approaches and regression based approaches. Decomposition methods usually learn a set of bases for sources and use these bases to decompose a mixture. Decomposition methods such as non-negative matrix factorization (NMF) Lee and Seung (1999); Cichocki et al. (2009); Kitamura et al. (2013) assume that a source can be represented by a linear combination of a set of bases. NMF has been used in source representation and separation Cichocki et al. (2006); Kitamura et al. (2013). In contrast to the decomposition based approaches, regression based approaches learn a mapping from a mixture to an individual source. Such mappings can be modeled by neural networks, for example, fully connected neural networks Grais et al. (2014) and convolutional neural networks (CNNs) Jain and Seung (2009); Zhang et al. (2017). In Xie et al. (2012), a stacked denoising auto-encoder (DAE) is proposed to recover sources from a mixture. CNNs are used for source deconvolution in Xu et al. (2014).
However, many decomposition methods such as NMF and independent component analysis (ICA) are shallow models, typically a linear combination of bases. These shallow models do not have enough capacity to represent a broad range of sources compared with neural networks Jain and Seung (2009). On the other hand, regression based approaches such as deep neural networks are able to model complicated mappings but require both mixtures and target sources for training. Regression based methods may not generalize well if the mixing filters and noise in the testing data have a different distribution from the training data, which results in poor separation when the mixing filters and noise are unseen in the training data Yosinski et al. (2014). Recently, generative adversarial networks (GANs) have been proposed for solving the source separation problem Fan et al. (2018); Subakan and Smaragdis (2017); Stoller et al. (2017). So far these methods assume that the mixing filters in the single-channel signal separation problem are known.
This paper proposes a novel synthesizing-decomposition (S-D) approach to solve the single-channel source separation and deconvolution problem. Compared to conventional regression approaches, the S-D approach applies generative adversarial networks (GANs) to solve this problem in a generative way. The S-D approach can estimate both the sources and the convolutive mixing filters, while conventional regression methods do not estimate convolutive mixing filters. In addition, we formulate the single-channel signal separation and deconvolution problem as Bayesian maximum a posteriori (MAP) estimation, which is a constrained non-convex optimization problem. In synthesizing, a generative model is built for sources using a GAN. In decomposition, both sources and mixing filters are obtained by minimizing the reconstruction error of a mixture. To tackle the non-convex optimization problem, we repeat the decomposition with different initializations, which significantly improves the underdetermined single-channel signal separation and deconvolution performance. We carry out underdetermined single-channel signal separation and deconvolution experiments on the MNIST dataset as a first study to show the effectiveness of the proposed S-D approach with GANs.
This paper is organized as follows: Section 2 formulates the underdetermined single-channel signal separation and deconvolution problem. Section 3 proposes the synthesizing-decomposition (S-D) approach for this problem. Section 4 shows experimental results. Section 5 concludes and outlines future work.
2 Single-Channel Signal Separation and Deconvolution
In underdetermined single-channel signal separation and deconvolution, a single-channel mixture $x \in \mathcal{X}$ is composed of individual sources $s_k \in \mathcal{X}$, $k = 1, \dots, K$, convolved with unknown filters $a_k$, followed by unknown additive noise $n$. The space $\mathcal{X}$ can be a Euclidean space $\mathbb{R}^{M}$, where $K$ and $M$ denote the number and the dimension of sources, respectively:
$$x = \sum_{k=1}^{K} a_k * s_k + n \tag{1}$$
The symbol $*$ represents the convolution operation:
$$(a_k * s_k)(t) = \sum_{\tau} a_k(\tau)\, s_k(t - \tau) \tag{2}$$
For the simple case of source separation without deconvolution, $a_k(\tau)$ in (2) simplifies to $\alpha_k \delta(\tau)$, where $\alpha_k$ is a constant and $\delta$ is the Dirac delta function. The general single-channel signal separation and deconvolution problem concerns both separating and deconvolving individual sources from a single-channel mixture while the mixing filters $a_k$ and the noise signal $n$ in (1) are unknown. In the remainder of the paper, we abbreviate $\{s_k\}_{k=1}^{K}$ and $\{a_k\}_{k=1}^{K}$ as $\{s_k\}$ and $\{a_k\}$, respectively.
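The mixing model described above (each source convolved with its own filter, the results summed, and noise added) can be illustrated with a minimal one-dimensional sketch. The helper names `convolve_full` and `mix` and the toy signals are assumptions made for illustration; the paper itself works with images.

```python
import random

def convolve_full(a, s):
    """Full 1-D discrete convolution: out[t] = sum_tau a[tau] * s[t - tau]."""
    out = [0.0] * (len(a) + len(s) - 1)
    for i, a_i in enumerate(a):
        for j, s_j in enumerate(s):
            out[i + j] += a_i * s_j
    return out

def mix(sources, filters, noise_std=0.0, seed=0):
    """Form a mixture: sum of filter-convolved sources plus optional Gaussian noise.

    Assumes all sources share one length and all filters share one length.
    """
    rng = random.Random(seed)
    mixture = [0.0] * (len(filters[0]) + len(sources[0]) - 1)
    for a_k, s_k in zip(filters, sources):
        for t, value in enumerate(convolve_full(a_k, s_k)):
            mixture[t] += value
    if noise_std > 0:
        mixture = [v + rng.gauss(0.0, noise_std) for v in mixture]
    return mixture

s1 = [1.0, 2.0, 3.0]
s2 = [0.0, 1.0, 0.0]
# Separation only: each filter is a single constant gain (a scaled unit impulse).
x_sep = mix([s1, s2], [[0.5], [2.0]])
# Separation with deconvolution: each filter is a short kernel.
x_conv = mix([s1, s2], [[1.0, 0.5], [0.0, 1.0]])
```

In the first call the filters reduce to constant gains, which corresponds to the pure separation case; in the second call the sources are smeared by the kernels before summation, so both the kernels and the sources would have to be recovered from `x_conv` alone.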
In the regression based approaches Jain and Seung (2009); Grais et al. (2014), a mapping $f_k$ from a mixture $x$ to a source signal $s_k$ is modeled by a deep neural network and learned to separate the $k$-th source: $f_k: x \mapsto s_k$. In separation, the separated sources are obtained by forwarding a mixture through the model: $\hat{s}_k = f_k(x)$. However, the regression based approaches have several problems:
Problem 1.
In regression based supervised learning, the training data and testing data should have the same distribution, otherwise the trained model will be biased Yosinski et al. (2014). However, in single-channel signal separation and deconvolution, no prior knowledge of test noise is available. The model trained with training noise may not generalize well to sources with unseen non-stationary noise.
Problem 2.
In single-channel signal separation and deconvolution, both the sources and mixing filters are unknown and need to be estimated.
Problem 3.
Previous regression and decomposition based approaches do not constrain the distribution of the separated sources $\hat{s}_k$ to be the same as the distribution of the real sources $s_k$. Ideally, the separated sources should be regularized toward the region where the source density $p(s_k)$ is large.
Decomposition approaches such as NMF can be trained on individual sources instead of on mixtures, so that Problem 1 can be mitigated. Recently, GANs Fan et al. (2018); Subakan and Smaragdis (2017); Stoller et al. (2017) have been applied to source separation to address Problem 3 by constraining the separated sources to lie in the natural source space. However, those methods assume that the mixing filters are constants, so they solve only the separation problem and not the deconvolution problem in (1).
3 Proposed Synthesizing-Decomposition (S-D) Approach
3.1 Maximum a Posteriori (MAP) Estimation
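In this section, we first formulate the single-channel signal separation and deconvolution problem in (1) as a Bayesian parameter estimation problem. We denote by $\theta = \{\{s_k\}, \{a_k\}\}$ the set of parameters to be estimated, including sources and mixing filters. The estimate $\hat{\theta}$ can be obtained by maximum a posteriori (MAP) estimation:
$$\hat{\theta} = \arg\max_{\theta} p(\theta \mid x) = \arg\max_{\theta} p(x \mid \theta)\, p(\theta) \tag{3}$$
The first term in (3) is a likelihood function. The reconstructed signal can be written as $\hat{x} = \sum_{k=1}^{K} a_k * s_k$. Assuming the noise $n$ is a Gaussian process, the likelihood of the observed signal $x$ given the estimated signal $\hat{x}$ can be written as:
$$p(x \mid \theta) = \mathcal{N}(x; \hat{x}, \sigma^2 I) \tag{4}$$
where $\mathcal{N}$ is the probability density of a Gaussian distribution and $\sigma^2$ is the noise variance. The second term in (3) is the prior probability of $\theta$. Assuming the sources and filters are independent of each other, we can write $p(\theta)$ as:
$$p(\theta) = \prod_{k=1}^{K} p(s_k) \prod_{k=1}^{K} p(a_k) \tag{5}$$
We assume $p(s_k)$ to have a compact support $\mathcal{S}$. Substituting equations (4) and (5) into equation (3), the estimates of sources and filters can be obtained by solving the following optimization problem:
$$\max_{\{s_k\}, \{a_k\}} \; \mathcal{N}\Big(x; \sum_{k=1}^{K} a_k * s_k, \sigma^2 I\Big) \prod_{k=1}^{K} p(s_k) \prod_{k=1}^{K} p(a_k) \quad \text{subject to } s_k \in \mathcal{S} \tag{6}$$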
3.2 Optimization with S-D Approach
Optimizing (6) is difficult because of the constraint $s_k \in \mathcal{S}$. The source prior $p(s_k)$ is unknown, so $\mathcal{S}$ cannot be written in closed form. Our solution is to convert (6) to an unconstrained optimization problem. In the proposed S-D approach, we first build a generative model for $s_k$ with a GAN Goodfellow et al. (2014); Subakan and Smaragdis (2017). A GAN consists of a generator $G$ and a discriminator $D$. The generator $G$ is a mapping from any distribution $p_z(z)$, such as a Gaussian distribution, to the real distribution of sources. We call $p_z(z)$ a seed distribution and the samples $z \sim p_z(z)$ seeds. The generator $G$ is trained to generate samples that fool the discriminator $D$. The discriminator $D$ is trained to discriminate fake sources from real sources. In other words, the generator and the discriminator play the following two-player minimax game with value function $V(D, G)$ Goodfellow et al. (2014):
$$\min_{G} \max_{D} V(D, G) = \mathbb{E}_{s \sim p_{\text{data}}(s)}[\log D(s)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))] \tag{7}$$
where $p_{\text{data}}(s)$ is the real data probability density. The training of the GAN is shown in Algorithm 1. The generator and discriminator are trained iteratively. If both $G$ and $D$ have enough capacity, then the generated source distribution will converge to $p_{\text{data}}(s)$ Subakan and Smaragdis (2017). Once the GAN is successfully trained, $G(z) \in \mathcal{S}$ for all $z$. To solve the optimization problem in (6), we substitute $s_k = G(z_k)$ and optimize over $z_k$ instead of $s_k$, so that the constraint $s_k \in \mathcal{S}$ is eliminated. Now the variables to be optimized are the seeds $\{z_k\}$ and the mixing filters $\{a_k\}$. In addition, a GAN does not predict the probability density $p(s_k)$ of a generated source, so the optimization of equation (6) is intractable. To solve this problem, we approximate $p(s_k)$ with:
$$p(s_k) \approx \begin{cases} c, & s_k \in \mathcal{S} \\ 0, & \text{otherwise} \end{cases} \tag{8}$$
Equation (8) assumes that the probability density is a constant $c$ inside $\mathcal{S}$ and zero outside $\mathcal{S}$. It is not required to know the value of $c$, as it is eliminated when optimizing (6):
$$\max_{\{z_k\}, \{a_k\}} \; \mathcal{N}\Big(x; \sum_{k=1}^{K} a_k * G(z_k), \sigma^2 I\Big) \prod_{k=1}^{K} p(a_k) \tag{9}$$
We assume the coefficients in $a_k$ to be zero-mean Gaussian. Taking the negative logarithm of (9), the optimization can be written as:
$$\min_{\{z_k\}, \{a_k\}} \; \Big\lVert x - \sum_{k=1}^{K} a_k * G(z_k) \Big\rVert_2^2 + \lambda \sum_{k=1}^{K} \lVert a_k \rVert_2^2 \tag{10}$$
where $\lambda \sum_{k} \lVert a_k \rVert_2^2$ is a regularization term for (10).
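The step from (9) to (10) can be spelled out as a short derivation. This is a sketch under assumptions that go beyond the text: isotropic Gaussian noise with variance $\sigma^2$, a zero-mean isotropic Gaussian prior with variance $\sigma_a^2$ on each filter $a_k$, and the reconstruction $\hat{x} = \sum_k a_k * G(z_k)$. The symbols $\sigma$ and $\sigma_a$ are introduced here for illustration only.

```latex
\begin{align*}
-\log\Big[\mathcal{N}(x;\hat{x},\sigma^2 I)\prod_k \mathcal{N}(a_k;0,\sigma_a^2 I)\Big]
 &= \frac{1}{2\sigma^2}\lVert x-\hat{x}\rVert_2^2
  + \frac{1}{2\sigma_a^2}\sum_k \lVert a_k\rVert_2^2 + \text{const}, \\
\text{so maximizing (9) is equivalent to}\quad
 &\min_{\{z_k\},\{a_k\}} \lVert x-\hat{x}\rVert_2^2
  + \lambda \sum_k \lVert a_k\rVert_2^2,
 \qquad \lambda = \frac{\sigma^2}{\sigma_a^2}.
\end{align*}
```

Under these assumptions the regularization weight is the ratio of the noise variance to the filter prior variance, so a small $\lambda$ corresponds to a broad prior on the filter coefficients.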
3.3 Optimization
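To solve (10), we apply a gradient based iterative approach. We denote the objective in (10) by $L(\{z_k\}, \{a_k\})$, where the seeds $\{z_k\}$ and the filters $\{a_k\}$ need to be optimized. First we randomly initialize $\{z_k\}$ and $\{a_k\}$, then the gradients of $L$ are calculated by:
$$\nabla_{z_k} L = \frac{\partial L}{\partial z_k}, \qquad \nabla_{a_k} L = \frac{\partial L}{\partial a_k} \tag{11}$$
The parameters are optimized using Algorithm 2. Because $G$ is a non-linear mapping, (10) is a non-convex function of $\{z_k\}$. Gradient based methods might reach a local minimum depending on the initialization of the seeds. To mitigate this problem, we repeat Algorithm 2 several times with different initializations and choose the solution with the smallest reconstruction error.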
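The multi-start strategy described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's Algorithm 2: `generator` is a hypothetical toy stand-in for a trained and frozen GAN generator, the mixing filter is a single scalar gain, plain gradient descent with finite-difference gradients replaces Adam and automatic differentiation, and all function names and step sizes are chosen for the example.

```python
import math
import random

def generator(z):
    """Toy stand-in for a trained, frozen generator G (hypothetical, not a DCGAN)."""
    return [math.tanh(z[0] + 0.5 * z[1] * t) for t in range(4)]

def recon_error(params, x):
    """Squared reconstruction error ||x - a * G(z)||^2 with params = [z0, z1, a]."""
    z, a = params[:2], params[2]
    return sum((xi - a * gi) ** 2 for xi, gi in zip(x, generator(z)))

def objective(params, x, lam=0.001):
    """Reconstruction error plus the filter regularizer lam * a^2."""
    return recon_error(params, x) + lam * params[2] ** 2

def numerical_grad(params, x, eps=1e-5):
    """Central finite differences; autograd would replace this in practice."""
    grad = []
    for i in range(len(params)):
        up, down = list(params), list(params)
        up[i] += eps
        down[i] -= eps
        grad.append((objective(up, x) - objective(down, x)) / (2 * eps))
    return grad

def decompose_once(x, rng, steps=1000, lr=0.02):
    """One gradient-descent run from a random initialization of seed and filter."""
    params = [rng.gauss(0.0, 1.0) for _ in range(3)]
    for _ in range(steps):
        grad = numerical_grad(params, x)
        params = [p - lr * g for p, g in zip(params, grad)]
    return recon_error(params, x), params

def decompose(x, n_init=8, seed=0):
    """Repeat the run n_init times and keep the smallest reconstruction error."""
    rng = random.Random(seed)
    runs = [decompose_once(x, rng) for _ in range(n_init)]
    return min(runs, key=lambda run: run[0]), runs

# Synthetic mixture generated from a known seed and a known gain of 0.8.
x = [0.8 * g for g in generator([0.3, 0.7])]
best, runs = decompose(x)
```

Because the objective is non-convex in the seed, individual runs may end in different local minima; selecting the run with the smallest reconstruction error can never do worse than a single run drawn from the same set of initializations.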
4 Experiments
In this section, we apply the proposed S-D method to the underdetermined single-channel separation and deconvolution of images. We carry out experiments on the MNIST 10-digit dataset LeCun et al. (1998) as a first study of this challenging problem and show the effectiveness of the proposed S-D method. Depending on the type of the unknown mixing filters $a_k$ and the unknown interference noise $n$, the problem in (1) can be categorized as image denoising, inpainting, completion, deconvolution or separation, as shown in Table 1. The symbol ‘-’ represents any type of noise. Previous works usually focus on one of these problems, such as denoising Jain and Seung (2009), inpainting Xie et al. (2012), deconvolution Xu et al. (2014) or separation Subakan and Smaragdis (2017). In this paper we solve these problems together with the proposed S-D method. The PyTorch implementation of this paper is released at https://github.com/qiuqiangkong/gan_separation_deconvolution.
4.1 Model Configuration
In the proposed S-D approach, we model the synthesizing procedure with a deep convolutional generative adversarial network (DCGAN) Radford et al. (2015), which can stabilize the training of a GAN and generate high quality images, as shown in Radford et al. (2015). A DCGAN consists of a generator $G$ and a discriminator $D$. The input to $G$ is a seed $z$ sampled from a Gaussian distribution with a dimension of 100, following Radford et al. (2015). The generator has 4 transpose convolutional layers with 512, 256, 128 and 1 feature maps, respectively. Following Radford et al. (2015), batch normalization Ioffe and Szegedy (2015) and ReLU non-linearity are applied after each transpose convolutional layer. The output of $G$ is an image with the same size as the images in the training data. The discriminator $D$ takes a fake or a real image as input. It consists of 4 convolutional layers, with a sigmoid output representing the probability that the input to $D$ is from real data rather than generated data. Following Radford et al. (2015), we use the Adam optimizer Kingma and Ba (2015) with a learning rate of 0.0002, a $\beta_1$ of 0.5 and a $\beta_2$ of 0.999 to train the generator. In decomposition, we freeze the trained generator $G$. We approximate the filter prior $p(a_k)$ with a Gaussian distribution, which works well in our experiment. We set the regularization weight $\lambda$ to 0.001 to regularize the mixing filters to be searched. The seeds $\{z_k\}$ and the filters $\{a_k\}$ are randomly initialized and optimized with the Adam optimizer with a learning rate of 0.01, a $\beta_1$ of 0.9 and a $\beta_2$ of 0.999 (Algorithm 2).
For comparison with regression based approaches, we apply a CNN Xie et al. (2012) which consists of 4 layers with batch normalization Ioffe and Szegedy (2015) and ReLU non-linearity. The number of layers and parameters are set to be the same as in the discriminator of the DCGAN. The CNN is trained to regress from a noisy individual source to the corresponding clean source. For comparison with decomposition based approaches, we train a dictionary for each of the 10 digits using NMF Cichocki et al. (2009) with Euclidean distance. Each dictionary consists of 20 bases, which performs well in our experiment. In decomposition, the trained dictionaries are concatenated to form a dictionary of 200 bases, which is then used to decompose the mixtures.
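The NMF baseline's decomposition step, in which a mixture is explained by a fixed concatenated dictionary and each source is rebuilt from its own bases, can be sketched as below. This is a minimal illustration that assumes the standard multiplicative update for the Euclidean distance with the dictionary held fixed; the function names, the tiny two-basis dictionary and the iteration count are hypothetical and stand in for the 200-basis dictionary described above.

```python
def nmf_activations(x, W, n_iter=200, eps=1e-12):
    """Find h >= 0 with x ~= W h for a fixed dictionary W (rows: pixels, columns: bases).

    Uses the multiplicative update h <- h * (W^T x) / (W^T W h) for Euclidean distance.
    """
    n_pixels, n_bases = len(W), len(W[0])
    h = [1.0] * n_bases
    for _ in range(n_iter):
        recon = [sum(W[i][j] * h[j] for j in range(n_bases)) for i in range(n_pixels)]
        numer = [sum(W[i][j] * x[i] for i in range(n_pixels)) for j in range(n_bases)]
        denom = [sum(W[i][j] * recon[i] for i in range(n_pixels)) for j in range(n_bases)]
        h = [h[j] * numer[j] / (denom[j] + eps) for j in range(n_bases)]
    return h

def separate(x, W, groups):
    """Rebuild each source from the columns of W that belong to its dictionary."""
    h = nmf_activations(x, W)
    return [[sum(W[i][j] * h[j] for j in group) for i in range(len(x))]
            for group in groups]

# Two one-basis dictionaries concatenated column-wise (toy example).
W = [[1.0, 0.0],
     [1.0, 0.0],
     [0.0, 1.0]]
x = [2.0, 2.0, 3.0]
estimates = separate(x, W, groups=[[0], [1]])
```

In this toy case the two bases do not overlap, so the mixture splits cleanly into one part per dictionary. With overlapping bases the split is no longer unique, which is one reason a linear dictionary can produce unnatural separated sources.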
4.2 Evaluation
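Following Xie et al. (2012); Xu et al. (2014); Jain and Seung (2009), we use the peak signal-to-noise ratio (PSNR) to evaluate single-channel signal separation and deconvolution quality. A higher PSNR indicates a better reconstruction quality. PSNR is defined as:
$$\mathrm{PSNR} = 10 \log_{10} \left( \frac{\mathrm{MAX}^2}{\mathrm{MSE}} \right) \tag{12}$$
where $\mathrm{MAX}$ is the maximum value of a noise-free image. MSE represents the mean squared error between two images $y$ and $\hat{y}$ of size $H \times W$:
$$\mathrm{MSE} = \frac{1}{HW} \sum_{i=1}^{H} \sum_{j=1}^{W} \big( y(i, j) - \hat{y}(i, j) \big)^2 \tag{13}$$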
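The PSNR and MSE definitions can be checked with a short sketch. The function names, the choice of a maximum value of 1.0 and the 2 by 2 toy images are assumptions made for illustration.

```python
import math

def mse(reference, estimate):
    """Mean squared error between two equally sized images given as lists of rows."""
    diffs = [(a - b) ** 2
             for row_ref, row_est in zip(reference, estimate)
             for a, b in zip(row_ref, row_est)]
    return sum(diffs) / len(diffs)

def psnr(reference, estimate, max_value=1.0):
    """PSNR in dB: 10 * log10(MAX^2 / MSE); infinite when the images are identical."""
    err = mse(reference, estimate)
    if err == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / err)

clean = [[0.0, 1.0], [1.0, 0.0]]
noisy = [[0.1, 0.9], [1.0, 0.0]]
value = psnr(clean, noisy)
```

Here two of the four pixels are off by 0.1, so the MSE is 0.005 and the PSNR is 10 log10(200), about 23.0 dB. Halving the error amplitude would raise the PSNR by about 6 dB, which gives a sense of scale for the differences reported in the tables.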
4.3 Denoising, Inpainting and Completion
Denoising, inpainting and completion are special cases of the single-channel signal separation and deconvolution problem where the mixing filter $a_1$ is an unknown constant and $n$ is unknown noise such as Gaussian noise, non-stationary noise or corruption of an image. The first and second rows of Fig. 1 show the clean and noisy images. The third to fifth rows show the images denoised with the CNN, NMF and the proposed S-D approach. In the first column, the testing noise and the training noise have the same distribution, so the CNN performs well. However, CNN based denoising methods do not generalize well to unseen noise such as the non-stationary noise or image corruption shown in the second and third columns of Fig. 1. NMF performs better than the CNN under unseen noise but sometimes produces unnatural separation results, which is due to Problem 3 stated in Section 2. The S-D approach performs well in all of image denoising, inpainting and completion. Table 2 shows the PSNR of the CNN, NMF, convolutive NMF and S-D approaches. The S-D approach achieves a PSNR of 25.1 dB in image denoising, which is comparable to the CNN at 26.0 dB. NMF and convolutive NMF achieve similar PSNRs of 17.4 dB and 18.3 dB, respectively. In image inpainting, S-D achieves a PSNR of 18.9 dB, outperforming the NMF and CNN methods at 13.4 dB and 15.3 dB, respectively. This result shows that source separation with S-D generalizes better to unseen noise than NMF and the CNN. In image completion, the S-D approach achieves a PSNR of 15.4 dB, outperforming the CNN at 12.2 dB, NMF at 12.9 dB and convolutive NMF at 13.0 dB. Table 2 also shows the performance of the decomposition in the S-D approach with respect to the number of initializations. With 8 or 32 initializations the PSNR is between 1.8 dB and 3.7 dB higher than with only 1 initialization. This may result from the fact that the optimization problem in (10) is non-convex. Algorithm 2 is a gradient based method, which may converge to a local minimum. Repeating Algorithm 2 several times with different initializations and choosing the solution with the smallest reconstruction error gives better performance.
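The gain from repeated initializations can be read directly off Table 2. The short sketch below only redoes that arithmetic; the dictionary layout and the helper name `gains` are illustrative choices, and the numbers are copied from the S-D rows of the table.

```python
# S-D PSNR values in dB from Table 2, keyed by the number of initializations.
psnr_table2 = {
    1: (23.1, 15.2, 13.6),
    8: (25.1, 18.2, 15.4),
    32: (25.1, 18.9, 15.4),
}
tasks = ("denoising", "inpainting", "completion")

def gains(n_init, baseline=1):
    """PSNR improvement in dB of n_init initializations over the baseline count."""
    return {task: round(psnr_table2[n_init][i] - psnr_table2[baseline][i], 1)
            for i, task in enumerate(tasks)}

gain_8 = gains(8)
gain_32 = gains(32)
```

Going from 1 to 8 initializations yields 2.0 dB, 3.0 dB and 1.8 dB for denoising, inpainting and completion; going to 32 initializations changes only inpainting, to 3.7 dB, which suggests diminishing returns on these tasks.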
4.4 Separation and Deconvolution
We evaluate single-channel signal separation and deconvolution with the mixing filters as unknown tensors, which is a very challenging task. In this case both the mixing tensors and the individual sources need to be estimated. Fig. 2 shows a mixture obtained by convolving clean sources with mixing filters followed by summation. The number of sources and the size of each mixing filter are fixed in our experiment; in practical applications the size of the mixing filter depends on the task. Fig. 2 shows that NMF based separation often leads to unnatural images. The S-D based approach can separate images with high quality, and both the sources and the mixing filters can be estimated. Fig. 2 also shows that the estimated sources and mixing filters are recovered correctly when compared with the ground truth sources and mixing filters. The first column of Table 3 shows the results of image deconvolution without separation, where $K = 1$ and $a_1$ is an unknown tensor. S-D achieves a PSNR of 23.2 dB and performs better than NMF and convolutive NMF at 15.3 dB and 18.3 dB, respectively. The second column of Table 3 shows the results of image separation, where $K > 1$ and $\{a_k\}$ are unknown constants. S-D achieves a PSNR of 18.5 dB and performs better than NMF and convolutive NMF at 9.4 dB and 14.2 dB, respectively. The third column of Table 3 shows the results of joint source separation and deconvolution, where $K > 1$ and $\{a_k\}$ are unknown tensors. S-D achieves a PSNR of 13.2 dB and outperforms NMF and convolutive NMF at 8.7 dB and 10.1 dB, respectively. S-D with 32 initializations has a higher PSNR than with 8 initializations, which in turn is higher than with 1 initialization. This shows the effectiveness of repeating Algorithm 2 several times to solve the non-convex optimization problem in (10).
5 Conclusion
In this paper, we propose a synthesizing-decomposition (S-D) approach to solve the single-channel signal separation and deconvolution problem. In synthesizing, a generative model for source signals is trained using a generative adversarial network (GAN). In decomposition, both sources and filters are optimized to minimize the reconstruction error. Instead of optimizing the sources directly, we optimize over the seeds of a GAN. The proposed S-D approach achieves a PSNR of 18.9 dB and 15.4 dB in image inpainting and completion, outperforming the regression approach (CNN) and the decomposition approach (NMF). The S-D approach achieves a PSNR of 13.2 dB in image source separation with deconvolution, outperforming NMF at 8.7 dB and convolutive NMF at 10.1 dB. Repeating the decomposition in S-D several times with different initializations improves the PSNR by up to 3.9 dB in our experiments. In future work, we will apply the S-D approach to more source separation and deconvolution problems.
Acknowledgements
This research was supported by EPSRC grant EP/N014111/1 “Making Sense of Sounds” and a Research Scholarship from the China Scholarship Council (CSC) No. 201406150082.
References
- Campisi and Egiazarian [2017] Patrizio Campisi and Karen Egiazarian. Blind image deconvolution: theory and applications. CRC Press, 2017.
- Cichocki et al. [2006] A. Cichocki, R. Zdunek, and S. Amari. New algorithms for non-negative matrix factorization in applications to blind source separation. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2006.
- Cichocki et al. [2009] A. Cichocki, R. Zdunek, A. H. Phan, and S. Amari. Nonnegative matrix and tensor factorizations: applications to exploratory multi-way data analysis and blind source separation. John Wiley & Sons, 2009.
- Fan et al. [2018] Z. Fan, Y. Lai, and J. Jang. SVSGAN: Singing voice separation via generative adversarial network. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018.
- Goodfellow et al. [2014] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. Generative adversarial nets. In Advances in Neural Information Processing Systems (NIPS), pages 2672–2680, 2014.
- Grais et al. [2014] E. M. Grais, M. Sen, and H. Erdogan. Deep neural networks for single channel source separation. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 3734–3738, 2014.
- Ioffe and Szegedy [2015] S. Ioffe and C. Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In International Conference on Machine Learning (ICML), 2015.
- Jain and Seung [2009] V. Jain and S. Seung. Natural image denoising with convolutional networks. In Advances in Neural Information Processing Systems (NIPS), pages 769–776, 2009.
