RadioGAN - Translations between different radio surveys with generative adversarial networks

Nina Glaser; O Ivy Wong; Kevin Schawinski; Ce Zhang

arXiv:1906.03874·astro-ph.GA·June 11, 2019


TL;DR

RadioGAN employs generative adversarial networks to translate radio survey images, effectively recovering extended flux and resolving structures beyond traditional resolution limits, thus enhancing data analysis in radio astronomy.

Contribution

This work introduces RadioGAN, a novel GAN-based method for translating between different radio survey datasets, improving flux and size recovery beyond standard convolutional approaches.

Findings

01

RadioGAN recovers extended flux within 20% for nearly half of sources.

02

For the FIRST to NVSS translation, over a third of the generated sources lie within a 20% deviation of both the original size and flux.

03

The method learns complex relations between surveys, surpassing simple convolution models.


Tables

Table 1: Evaluation measurements of the central $90^{2}$-pixel cutout for different runs of FIRST to NVSS, given as $\bar{x}\pm\sigma_{x}$. All measurements were taken on the validation set, except for run final, where the test set was evaluated. $S_{20}$, $D_{20}$, and $B_{20}$ are given in percent, and PSNR is given on a logarithmic scale.

Run name	NRMSE	PSNR	SSIM	$\Theta$	$\log_{e}(\frac{S_{GAN}}{S_{Org}})$	$\log_{e}(\frac{D_{GAN}}{D_{Org}})$	$S_{20}$	$D_{20}$	$B_{20}$
standard	$0.355 \pm 0.198$	$19.94 \pm 3.81$	$0.721 \pm 0.116$	$0.690 \pm 0.121$	$0.064 \pm 0.517$	$0.007 \pm 0.460$	$42.3$	$48.9$	$31.7$
opt. standard	$0.335 \pm 0.186$	$20.43 \pm 3.97$	$0.746 \pm 0.108$	$0.718 \pm 0.113$	$0.040 \pm 0.489$	$0.030 \pm 0.402$	$43.7$	$50.8$	$33.9$
focus region	$0.313 \pm 0.169$	$21.11 \pm 3.80$	$0.764 \pm 0.101$	$0.738 \pm 0.106$	$- 0.014 \pm 0.521$	$0.006 \pm 0.436$	$47.1$	$54.1$	$37.3$
second focus	$0.319 \pm 0.187$	$21.02 \pm 3.73$	$0.764 \pm 0.099$	$0.737 \pm 0.105$	$- 0.061 \pm 0.510$	$- 0.047 \pm 0.419$	$45.2$	$54.5$	$37.9$
under/over	$0.314 \pm 0.180$	$21.13 \pm 3.84$	$0.766 \pm 0.101$	$0.740 \pm 0.107$	$0.011 \pm 0.502$	$0.010 \pm 0.410$	$59.8$	$56.2$	$40.1$
D focus	$0.319 \pm 0.197$	$21.09 \pm 3.76$	$0.766 \pm 0.099$	$0.739 \pm 0.105$	$0.015 \pm 0.503$	$0.019 \pm 0.425$	$48.3$	$56.2$	$40.0$
hidden layers	$0.314 \pm 0.183$	$21.13 \pm 3.80$	$0.765 \pm 0.102$	$0.738 \pm 0.108$	$0.003 \pm 0.508$	$0.000 \pm 0.427$	$43.9$	$53.6$	$35.0$
training set	$0.310 \pm 0.179$	$21.22 \pm 3.88$	$0.767 \pm 0.101$	$0.741 \pm 0.107$	$- 0.077 \pm 0.518$	$- 0.018 \pm 0.429$	$45.4$	$54.5$	$36.5$
more epochs	$0.311 \pm 0.179$	$21.21 \pm 3.76$	$0.769 \pm 0.098$	$0.743 \pm 0.104$	$- 0.011 \pm 0.509$	$- 0.020 \pm 0.424$	$46.1$	$55.4$	$37.6$
final	$0.318 \pm 0.159$	$21.01 \pm 4.18$	$0.766 \pm 0.105$	$0.739 \pm 0.110$	$0.000 \pm 0.569$	$- 0.015 \pm 0.468$	$45.3$	$51.3$	$33.9$
gaussian conv	$0.494 \pm 0.269$	$15.67 \pm 3.67$	$0.602 \pm 0.134$	$0.555 \pm 0.143$	$- 0.392 \pm 0.808$	$0.075 \pm 0.587$	$18.8$	$27.8$	$5.2$

Table 2: Evaluation measurements of the central $70^{2}$-pixel cutout for different runs of NVSS to FIRST, given as $\bar{x}\pm\sigma_{x}$. All measurements were taken on the validation set, except for run final, where the test set was evaluated. $S_{20}$, $D_{20}$, $B_{20}$, and SEF are given in percent, and PSNR is given on a logarithmic scale.

Run name	NRMSE	PSNR	SSIM	$\Theta$	$\log_{e}(\frac{S_{GAN}}{S_{Org}})$	$\log_{e}(\frac{D_{GAN}}{D_{Org}})$	$S_{20}$	$D_{20}$	$B_{20}$	SEF
standard	$5.731 \pm 1.217$	$10.61 \pm 0.06$	$0.019 \pm 0.005$	$- 0.074 \pm 0.056$	–	–	–	–	–	$100.0$
opt. standard	$1.111 \pm 0.135$	$24.71 \pm 1.31$	$0.255 \pm 0.030$	$0.186 \pm 0.050$	$- 0.770 \pm 0.999$	$- 0.684 \pm 1.090$	$18.9$	$15.9$	9.0	$19.7$
focus region	$0.967 \pm 0.106$	$25.91 \pm 1.58$	$0.320 \pm 0.040$	$0.257 \pm 0.055$	$- 0.230 \pm 0.690$	$- 0.197 \pm 0.793$	$40.4$	$37.0$	$25.2$	$30.3$
second focus	$0.904 \pm 0.088$	$26.49 \pm 1.65$	$0.345 \pm 0.046$	$0.284 \pm 0.061$	$- 0.329 \pm 0.540$	$- 0.132 \pm 0.679$	$35.6$	$40.2$	$22.4$	$36.0$
under/over	$0.905 \pm 0.087$	$26.47 \pm 1.59$	$0.341 \pm 0.043$	$0.281 \pm 0.062$	$- 0.342 \pm 0.665$	$- 0.107 \pm 0.730$	$44.5$	$41.6$	$29.0$	$39.3$
D focus	$0.944 \pm 0.102$	$26.12 \pm 1.53$	$0.323 \pm 0.041$	$0.260 \pm 0.059$	$- 0.222 \pm 0.656$	$- 0.040 \pm 0.722$	$40.6$	$35.0$	$21.5$	$35.6$
hidden layers	$0.914 \pm 0.089$	$26.38 \pm 1.58$	$0.339 \pm 0.043$	$0.277 \pm 0.060$	$- 0.266 \pm 0.664$	$- 0.024 \pm 0.823$	$39.7$	$37.2$	$22.3$	$22.8$
training set	$0.904 \pm 0.098$	$26.49 \pm 1.61$	$0.346 \pm 0.046$	$0.285 \pm 0.065$	$- 0.185 \pm 0.618$	$- 0.022 \pm 0.727$	$44.4$	$42.9$	$30.4$	$39.9$
more epochs	$0.936 \pm 0.101$	$26.20 \pm 1.51$	$0.332 \pm 0.042$	$0.270 \pm 0.060$	$- 0.179 \pm 0.627$	$- 0.150 \pm 0.781$	$45.4$	$39.7$	$28.8$	$20.8$
final	$0.922 \pm 0.100$	$26.22 \pm 1.59$	$0.328 \pm 0.042$	$0.264 \pm 0.061$	$- 0.308 \pm 0.638$	$- 0.190 \pm 0.670$	$37.2$	$42.8$	$29.4$	$42.8$

Table 3: RadioGAN parameters for FIRST to NVSS translations for different runs

Run	Epochs	Learning Rate	$L^{1} - λ_{424}$	$L^{1} - λ_{70}$	$L^{1} - λ_{30}$	u/o	$d_{70}$	additional modification
standard	$30$	$0.0002$	$100$	–	–	–	–	–
opt. standard	$30$	$0.00001$	$500$	–	–	–	–	–
focus region	$30$	$0.00001$	$500$	$500$	–	–	–	–
second focus	$30$	$0.00001$	$300$	$300$	$100$	–	–	–
under/over	$30$	$0.00001$	$300$	$300$	100	$1.2$	–	–
D focus	$30$	$0.00001$	$300$	$300$	$100$	$1.2$	0.2	several combinations tested
hidden layers	$30$	$0.00001$	$300$	$300$	$100$	$1.2$	–	two additional layers in generator
training set	$30$	$0.00001$	$300$	$300$	$100$	$1.2$	–	trained on extended set
more epochs	$50$	$0.00001$	$300$	$300$	$100$	$1.2$	–	–
final	$40$	$0.00001$	$300$	$300$	$100$	$1.2$	–	hidden layers and extended set

Table 4: RadioGAN parameters for NVSS to FIRST translations for different runs

Run	Epochs	Learning Rate	$L^{1} - λ_{424}$	$L^{1} - λ_{70}$	$L^{1} - λ_{30}$	u/o	$d_{70}$	additional modification
standard	$30$	$0.0002$	$100$	–	–	–	–	–
opt. standard	$30$	$0.00001$	$500$	–	–	–	–	–
focus region	$30$	$0.00001$	$1000$	$1000$	–	–	–	–
second focus	$30$	$0.00001$	$1000$	$700$	$700$	–	–	–
under/over	$30$	$0.00001$	$1000$	$700$	700	$1.2$	–	–
D focus	$30$	$0.00001$	$1500$	$700$	$700$	$1.2$	0.2	several combinations tested
hidden layers	$30$	$0.00001$	$1000$	$700$	$700$	$1.2$	–	two additional layers in generator
training set	$30$	$0.00001$	$1000$	$700$	$700$	$1.2$	–	trained on extended set
more epochs	$30$	$0.00001$	$1000$	$700$	$700$	$1.2$	–	–
final	$40$	$0.00001$	$1000$	$700$	$700$	$1.2$	–	hidden layers and extended set

Equations

$$f(x)=\frac{\mathrm{arsinh}(\Lambda\cdot x)}{\mathrm{arsinh}(\Lambda\cdot x_{max})}$$

$$SSIM(x,y)=\frac{(2\mu_{x}\mu_{y}+C_{1})(2\sigma_{xy}+C_{2})}{(\mu_{x}^{2}+\mu_{y}^{2}+C_{1})(\sigma_{x}^{2}+\sigma_{y}^{2}+C_{2})}$$

$$\Theta(GAN,Orig.)=\frac{SSIM(GAN,Orig.)-SSIM(FIRST,NVSS)}{1-SSIM(FIRST,NVSS)}$$

G (x, y)

V

D


Full text


Nina Glaser$^{1}$, O. Ivy Wong$^{2}$, Kevin Schawinski$^{1,3}$ and Ce Zhang$^{4}$

$^{1}$Institute for Particle Physics and Astrophysics, ETH Zürich, Wolfgang-Pauli-Strasse 27, CH-8093, Zürich, Switzerland

$^{2}$International Centre for Radio Astronomy Research (ICRAR), The University of Western Australia, Crawley, WA 6009, Australia

$^{3}$Modulos AG, Technoparkstr. 1, CH-8005, Zürich, Switzerland

$^{4}$Systems Group, Department of Computer Science, ETH Zurich, Universitätstrasse 6, CH-8006, Zürich, Switzerland

E-mail: [email protected]


Abstract

Radio surveys are widely used to study active galactic nuclei. Radio interferometric observations typically trade off surface brightness sensitivity for angular resolution. Hence, observations using a wide range of baseline lengths are required to recover both bright small-scale structures and diffuse extended emission. We investigate whether generative adversarial networks (GANs) can extract additional information from radio data and might ultimately recover extended flux from a survey with a high angular resolution and vice versa. We use a GAN for the image-to-image translation between two different data sets, namely the Faint Images of the Radio Sky at Twenty-Centimeters (FIRST) and the NRAO VLA Sky Survey (NVSS) radio surveys. The GAN is trained to generate the corresponding image cutout from the other survey for a given input. The results are analyzed with a variety of metrics, including structural similarity as well as flux and size comparison of the extracted sources. RadioGAN is able to recover extended flux density within a 20% margin for almost half of the sources and learns more complex relations between sources in the two surveys than simply convolving them with a different synthesized beam. RadioGAN is also able to achieve sub-beam resolution by recognizing complicated underlying structures from unresolved sources. RadioGAN generates over a third of the sources within a 20% deviation from both original size and flux for the FIRST to NVSS translation, while for the NVSS to FIRST mapping it achieves almost 30% within this range.

keywords:

methods: data analysis – techniques: image processing – radio continuum: galaxies


1 Introduction

Since the discovery of quasars (Schmidt, 1963), active galaxies have been a very active field of research. Active galactic nuclei (AGN) are among the brightest objects in terms of persistent emission at most wavelengths. For a better understanding of AGN and quasars, high-resolution and high-sensitivity surveys are needed in several bands. Radio emission is of particular interest for further investigating jet formation and powering. Thus both single-dish radio telescopes (Hachenberg et al., 1973; Nan & Li, 2013) and radio interferometric arrays (Thompson et al., 1980; Napier, 1994) are operated to gather more data. Since the angular resolution is limited by the diameter of the telescope, or the span of the array configuration, more extended instruments are needed for surveying smaller structures. Therefore, most existing radio interferometric surveys are sensitive to only a limited range of spatial scales due to the finite number of baseline lengths used for the observations. Often, different surveys with different array configurations are used to observe the same object in order to obtain measurements with different angular resolution and surface brightness sensitivity. The difference in $uv$-coverage between interferometric array configurations, and thus between baseline distributions, results in a fundamental loss of information on different angular scales. This presents the true limit for the translation of aperture synthesis imaging from a survey using one array configuration to that of another array configuration.

The assumptions made in the most widely used methods for image reconstruction from interferometric observations can lead to losses in surface brightness sensitivities, astrometric accuracies, and image dynamic range. Factors such as radio frequency interference (RFI), strong emission from complex regions such as the Galactic Plane as well as associated sidelobes further compound the loss in sensitivities and accuracies.

The most common deconvolution method used in aperture synthesis imaging is the Clean algorithm (Högbom, 1974; Clark, 1980; Schwab, 1984). Clean is a computationally simple algorithm which assumes that all sources are well-separated point sources which can each be represented by a single basis function. Clean's assumption of Gaussian dirty beams is the main limitation on the achievable reconstruction of low signal-to-noise sources and, as such, on the dynamic range of the imaging (e.g. Oberoi & Pinçon, 2003; Rau et al., 2016). As Clean cannot model diffuse emission as point sources, the final restored image includes structures that are not deconvolved by Clean. Hence, the Clean residuals are not fully representative of the image quality. Furthermore, the sidelobes from neighbouring sources are assumed to have no effect on the position of other sources. This can lead to the issue of ‘clean bias’, where the peak flux densities are systematically lowered as Clean constructs artificial source components from the sidelobes of real neighbouring sources. In practice, surveys such as the Faint Images of the Radio Sky at Twenty-Centimeters (FIRST; Becker et al., 1994; White et al., 1997) reduce ‘clean bias’ by implementing shallower Clean thresholds. However, this typically leads to images with reduced surface brightness sensitivity and higher RMS noise.

While there are now several variations and extensions to the original Clean algorithm (Bhatnagar & Cornwell, 2004; Cornwell, 2008; Offringa et al., 2014; Zhang et al., 2016) that have been developed to improve upon the cleaning of both point and extended sources, these algorithms are complex, computationally expensive and do not produce consistent results when implemented in an automated fashion. More recently, new cross-disciplinary methods such as those from compressive sensing (e.g. Pratley et al., 2018) have been developed with the aim of addressing some of the limitations faced by the Clean method of image reconstruction.

Instead of developing another method for image reconstruction which accounts for all the random and systematic characteristics inherent in real interferometric observations, we investigate whether advanced deep learning methods are able to improve upon the surface brightness sensitivity and angular resolution of images reconstructed from synthesis observations. One of the main advantages of a machine learning model such as a generative adversarial network (GAN) is its ability to learn to identify low signal-to-noise features in images. For example, Schawinski et al. (2017) demonstrated that it is possible to train a GAN to recover image features from artificially degraded optical images with worse seeing and noise levels.

In this paper, we test whether we can gain any improvements in surface brightness sensitivity and angular resolution through the translation of images of radio sources observed by the same instrument (Very Large Array; Thompson et al., 1980) at the same narrowband frequency (1.4 GHz) but via two different array configurations, which are sensitive to different distributions of angular scales. Specifically, we compare the images from the FIRST survey, which uses the VLA B-array configuration, to those of the NRAO VLA Sky Survey (NVSS; Condon et al., 1998), which uses the VLA D-array configuration, in the region of sky covered by both surveys. The maximum baseline of 1 km for the VLA D-array results in a synthesized beam of 45″ and sensitivity to sources up to $16.2^{\prime}$ in size, while the 10 km baseline of the VLA B-array is sensitive to source structures that are smaller than 120″, with a synthesized beam of 5″. FIRST has very good point source sensitivity, while NVSS has better surface brightness sensitivity. As such, it is fairly complicated to compare observations from one survey to the other. However, it may be possible for a GAN to improve the surface brightness sensitivity of the FIRST images. Schawinski et al. (2017) found that a GAN was able to recover higher-resolution image features in optical images beyond the deconvolution limit. Inspired by these results, we also test whether it is possible for a GAN to recover higher angular resolution maps from the NVSS observations.
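The quoted synthesized beams can be sanity-checked with the diffraction rule of thumb $\theta\approx\lambda/B$. The sketch below is only an order-of-magnitude estimate: the function name is hypothetical, and real synthesized beams also depend on $uv$-weighting and tapering, which are ignored here.

```python
import math

C_LIGHT = 299_792_458.0                      # speed of light in m/s
ARCSEC_PER_RAD = 180.0 / math.pi * 3600.0    # radians to arcseconds


def diffraction_beam_arcsec(freq_hz: float, baseline_m: float) -> float:
    """Rule-of-thumb angular resolution theta ~ lambda / B, in arcseconds."""
    wavelength_m = C_LIGHT / freq_hz
    return wavelength_m / baseline_m * ARCSEC_PER_RAD


# Baselines as quoted in the text: 1 km (D-array) and 10 km (B-array) at 1.4 GHz.
beam_d = diffraction_beam_arcsec(1.4e9, 1_000.0)
beam_b = diffraction_beam_arcsec(1.4e9, 10_000.0)
print(f"D-array: {beam_d:.1f} arcsec, B-array: {beam_b:.1f} arcsec")
```

Under these assumptions the estimates come out at roughly 44″ and 4.4″, consistent with the 45″ and 5″ beams quoted for NVSS and FIRST.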

Applications of deep learning methods to radio astronomy have so far revolved around developing automated radio source extraction and classification (e.g. Lukic et al., 2018; Alger et al., 2018; Wu et al., 2018). Here we do not perform any source extraction or classification, but we investigate: 1) whether we can recover more diffuse emission that is currently missing from the FIRST survey images through a GAN that is trained to better understand the low signal-to-noise emission in noisy images; and 2) whether it is possible to obtain further improvement in angular resolution from the NVSS survey images. While the superior synthesized beam of the FIRST survey provides more accurate radio source – host galaxy associations, the insensitivity to low surface brightness emission makes FIRST galaxies a less attractive sample for studying the evolution and integrated properties of extended radio galaxies. We expect that the GAN will learn all the image artifacts that result from both survey pipelines, such as residual sidelobe patterns from calibration errors and/or dynamic range issues; striping because of bad data or RFI; as well as potentially complex emission regions such as the Galactic Plane. Such image artifacts may result in the GAN producing less accurate translations. Further discussion of such uncertainties can be found in Section 4. If our proof-of-concept study proves successful, it could mean that the application of a GAN after image reconstruction can help mitigate the limitations introduced by the computationally simple Clean method of image reconstruction.

We describe our methods and results in Sections 2 and 3, respectively. Section 4 discusses the successes and limitations of our proof-of-concept method as well as future applications. Finally, a summary and conclusions are presented in Section 5.

2 Methods

2.1 Generative Adversarial Networks

RadioGAN was originally based on the standard architecture for conditional Generative Adversarial Networks (GAN; Goodfellow et al., 2014; Reed et al., 2016) as proposed by Isola et al. (2016). A GAN is a deep learning algorithm which trains two neural networks simultaneously. Since GANs were introduced by Goodfellow et al. (2014), they have become a widely used tool for image-to-image translation tasks. A schematic illustration of RadioGAN can be seen in Fig. 1. The generator learns to map between the input and the given desired output during the training phase, and thus effectively learns to fake output images. The adversary of the generator is the discriminator, whose sole purpose is to estimate the probability that a given sample was generated rather than coming from the training set. The discriminator therefore learns to tell real and fake images apart. By training both simultaneously using backpropagation, and creating a feedback loop by making the discriminator part of the generator’s loss function, complex losses are learned automatically. Thus a two-player minimax game is played, which in an ideal case converges to the generator recovering the training data distribution and the discriminator output being equal to $\frac{1}{2}$ everywhere. For training all networks, Adam (Kingma & Ba, 2014), a stochastic gradient-based optimization algorithm, is used. The GAN approach has proven extremely useful and effective both for generic image-to-image translation tasks (Isola et al., 2016) and, more specifically, on astronomical data (Guo et al., 2017). GANs have also been shown to be a promising approach to speeding up computationally intensive problems (de Oliveira et al., 2017; Mustafa et al., 2017). GANs might soon become a standard instrument for digital image processing, since their results for noise reduction, contrast improvement, and image enhancement in general often surpass those of conventional methods. Both the usability and the performance of GANs are currently being improved, which makes them an even more versatile tool with an extremely promising future.
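For orientation, the minimax game described above can be written in the form used by Isola et al. (2016) for conditional GANs. This is the generic objective of that architecture, shown here as a reference sketch rather than RadioGAN's exact loss; in this block $x$ is the input image, $y$ the target image, $z$ a noise term, and $G$ and $D$ the generator and discriminator.

```latex
\mathcal{L}_{cGAN}(G,D) = \mathbb{E}_{x,y}\left[\log D(x,y)\right]
                        + \mathbb{E}_{x,z}\left[\log\left(1 - D(x,G(x,z))\right)\right]

\mathcal{L}_{L_{1}}(G) = \mathbb{E}_{x,y,z}\left[\lVert y - G(x,z)\rVert_{1}\right]

G^{*} = \arg\min_{G}\max_{D}\; \mathcal{L}_{cGAN}(G,D) + \lambda\,\mathcal{L}_{L_{1}}(G)
```

The weight $\lambda$ sets the relative importance of the $L_{1}$ term; it presumably corresponds to the $L^{1}$ weights listed for each run in Tables 3 and 4, where the standard run uses $\lambda=100$.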

2.2 GAN Architecture and Training

In previous work of this group, a GAN was used to deconvolve SDSS images beyond the deconvolution limit (Schawinski et al., 2017) and to perform point source subtraction (Stark et al., 2018). We adopted a standard GAN architecture at first. Because both the data sets and the objectives of this project differ substantially from those of the above-mentioned projects, this standard architecture was not well suited to the given task. To avoid overshooting and to eliminate artifacts, we lowered the learning rate and adjusted the weight parameter of the loss function by increasing the relative importance of the $L_{1}$-norm. Overshooting is a phenomenon that occurs if the parameters of the gradient descent algorithm are unbalanced, whereas the artifacts, chequerboard-like patterns, are a result of overlapping deconvolution patches. Since the majority of the radio cutouts consist of background, we implemented some further modifications in order to make the GAN focus on the source region. Additionally, we adapted some other parameters, such as the size of the training set, the size of the hidden layers, and the number of epochs, in order to account for the diversity of the data and the complexity of the mapping. The specific architectures that we used and their characteristics are described in Section 3, and the corresponding parameters are given in Appendix A. Training for each run was done on a single NVIDIA Titan Xp GPU and took on average 13 hours for the normal training set. Training with the extended training set took 22 hours.

2.3 Data Selection and Preprocessing

We estimate the $5\sigma$ surface brightness sensitivities at 1.4 GHz of FIRST and NVSS to be 18.7 K and 0.69 K, respectively. These estimates assume the rms noise of FIRST and NVSS to be 0.15 mJy beam$^{-1}$ and 0.45 mJy beam$^{-1}$, and the synthesized beams of FIRST and NVSS to be 5 arcseconds and 45 arcseconds, respectively. As NVSS has a much better surface brightness sensitivity than FIRST, we extracted the data set sources from the original NVSS catalogue. Furthermore, since NVSS has the lower angular resolution of the two surveys, its source catalogue mostly has single entries for sources that have multiple entries in FIRST. Additionally, there are fewer artifacts in NVSS than in FIRST. For a further discussion of the differences between the two surveys we refer to Condon (2015). The corresponding images of the listed sources were obtained via the NVSS ftp service. A FIRST cutout of the same sky section had to exist, but we did not require that the source be listed in the FIRST catalogue as well. Thus there are NVSS cutouts with a clearly visible source that correspond to a FIRST cutout in which no distinguishable source is present, which is due to the difference in surface brightness sensitivity between the two surveys. A source was only used if it was not located too close to an image border in either of the cutouts, such that data for the entire $424^{2}$-pixel cutout was available, corresponding to a minimum distance of roughly $200$ pixels between the central source and the image border in all directions. Additionally, we checked that there were no blank pixels and that the file did not display any irregularities, such as an anomalous header format.
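The two sensitivity figures can be reproduced with the standard Rayleigh-Jeans conversion from flux density per Gaussian beam to brightness temperature, $T_{b}\approx 1.222\times10^{3}\,S/(\nu^{2}\theta_{maj}\theta_{min})$ K, with $S$ in mJy beam$^{-1}$, $\nu$ in GHz and the beam axes in arcseconds. The sketch below assumes circular beams and this conversion; the function name is hypothetical and the text does not state which formula was actually used.

```python
def brightness_temperature_k(flux_mjy_per_beam: float, freq_ghz: float,
                             bmaj_arcsec: float, bmin_arcsec: float) -> float:
    """Rayleigh-Jeans brightness temperature (K) for a Gaussian beam."""
    return 1.222e3 * flux_mjy_per_beam / (freq_ghz**2 * bmaj_arcsec * bmin_arcsec)


# 5-sigma levels from the rms values quoted in the text (mJy/beam).
first_5sigma_k = brightness_temperature_k(5 * 0.15, 1.4, 5.0, 5.0)
nvss_5sigma_k = brightness_temperature_k(5 * 0.45, 1.4, 45.0, 45.0)
print(f"FIRST: {first_5sigma_k:.1f} K, NVSS: {nvss_5sigma_k:.2f} K")
```

With these inputs the conversion gives values that round to the 18.7 K and 0.69 K stated above.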

In the field of machine learning, input images are typically pre-processed and rescaled. As such, the original input flux units were transformed into a set of arbitrary units prior to input into RadioGAN. Scaling was performed to enhance feature recognition by the GAN by rescaling the dynamic range. Thus source detection and recognition in RadioGAN is based on the relative brightness differences in the pre-processed arbitrary flux units. The resulting pixel values were therefore used as the basic measurables during most of this project, mainly for the evaluation of the different RadioGAN architectures in Tables 1 and 2. The images displayed in this work are shown scaled for better contrast and easier comparison. For the final evaluation of RadioGAN, a back conversion of both flux density and size was applied to demonstrate that those quantities are conserved. Since extreme outliers have a significant degrading effect on the performance of neural networks, a final cut was applied to the value of the brightest pixel of the NVSS cutout, which had to be below 1.4 arbitrary units. This upper limit was chosen empirically: it removed less than 0.5% of the data while ensuring a smooth distribution of the brightest pixel values without outliers, so that the input data always lie in a similar range. The GAN performs best with all input being within a small range, ideally between -1 and 1, or 0 and 1 (Sola & Sevilla, 1997). Thus an appropriate scaling of the data was necessary. After comparing the distribution of the brightest pixel value for 7000 cutouts, the arcsinh stretch (Eq. 1), with scaling factors of $\Lambda_{NVSS}=300$ and $\Lambda_{FIRST}=5000$, respectively, was found to be the best fit since it resulted in a nearly Gaussian distribution, with all brightest pixel values lying between 0 and 1. Here $x_{max}$ refers to the maximum pixel value across all cutouts from a survey over the entire training set.

$$f(x)=\frac{\mathrm{arsinh}(\Lambda\cdot x)}{\mathrm{arsinh}(\Lambda\cdot x_{max})}$$
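A minimal sketch of the arcsinh stretch and its inverse (the back conversion mentioned above) is given below. The scaling factors are taken from the text; the function names, the sample pixel values and the value of `x_max` are illustrative assumptions, not values from the actual data set.

```python
import numpy as np

LAMBDA_NVSS = 300.0    # scaling factor quoted in the text for NVSS
LAMBDA_FIRST = 5000.0  # scaling factor quoted in the text for FIRST


def arcsinh_stretch(x, lam, x_max):
    """Map pixel values to [0, 1] via f(x) = arsinh(lam*x) / arsinh(lam*x_max)."""
    return np.arcsinh(lam * np.asarray(x, dtype=float)) / np.arcsinh(lam * x_max)


def inverse_arcsinh_stretch(f, lam, x_max):
    """Invert the stretch to recover the original pixel values."""
    return np.sinh(np.asarray(f, dtype=float) * np.arcsinh(lam * x_max)) / lam


# Hypothetical pixel values and training-set maximum, for illustration only.
x_max = 0.5
pixels = np.array([0.0, 0.001, 0.01, 0.1, 0.5])
stretched = arcsinh_stretch(pixels, LAMBDA_NVSS, x_max)
recovered = inverse_arcsinh_stretch(stretched, LAMBDA_NVSS, x_max)
```

The stretch is close to linear for faint pixels and logarithmic for bright ones, which compresses the dynamic range while keeping the zero level fixed.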

Due to the different angular resolutions, one set of the cutouts had to be regridded. We decided to do so because it is easier to compare images of the same size, both pixel- and object-wise, and because the GAN architecture could be designed more flexibly this way. Since the NVSS cutouts had a lower angular resolution, they were regridded using bicubic interpolation. If the FIRST cutouts had instead been regridded to a lower angular resolution, information would have been lost. The final images were $424^{2}$ pixels in size, corresponding to a range of $12.7^{\prime}$ in both right ascension and declination. We prepared four data sets: a normal training set (6,570 cutouts), an extended training set (10,000 cutouts), a validation set (745 cutouts) to find the optimal architecture and parameters of the GAN, and a test set (1,000 cutouts) for the final evaluation. We distributed the sources from a given data pool randomly into the different sets.
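The regridding step can be illustrated as follows. This sketch uses `scipy.ndimage.zoom` with cubic spline interpolation (`order=3`) as a stand-in for the bicubic interpolation mentioned above; the 53-pixel input size and the integer zoom factor of 8 are assumptions chosen only so that the output has $424^{2}$ pixels, and the actual pixel scales of the two surveys are not modelled.

```python
import numpy as np
from scipy.ndimage import zoom

# Synthetic coarse cutout: a single Gaussian blob on a 53 x 53 grid (illustrative).
yy, xx = np.mgrid[0:53, 0:53]
coarse = np.exp(-((xx - 26) ** 2 + (yy - 26) ** 2) / (2 * 2.0**2))

# Upsample by a factor of 8 with cubic spline interpolation.
fine = zoom(coarse, 8, order=3)
print(coarse.shape, "->", fine.shape)
```

Interpolating onto a finer grid keeps the per-beam surface brightness values comparable but does not add information, which is why the coarser survey is the one that is regridded.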

In addition to the data sets composed of NVSS and FIRST measurements, RadioGAN’s performance was also evaluated on simulated observations of complex sources. The simulations were performed to further test RadioGAN on typical observations of extended radio galaxies. Observations of the well-known radio galaxy Cygnus A (Jennison & Das Gupta, 1953) were simulated for both the VLA D- and B-array configurations using CASA (McMullin et al., 2007) and are displayed in Appendix B. RadioGAN can then be used to translate and predict the observations from one array configuration to the other. Since Cygnus A is an extremely bright radio source, it would have been considered an outlier in terms of flux density, and would therefore have been excluded from the data set. However, for this additional proof-of-concept, this source was used because it is well studied and reduced archival observations using the same VLA array configurations (as those for the NVSS and FIRST surveys) are publicly available. We adapted our rescaling accordingly.

While the most useful part of this work applies to complex extended sources, point sources were not excluded so that RadioGAN would learn to recognize intrinsically compact sources as well. The majority of the sources listed in the NVSS catalogue are visible as point sources in FIRST, and correspondingly the majority of the data set cutouts contained a point source in FIRST. In the presented work the data sets were not preselected, in order to reflect the actual source composition and to ensure the broad applicability of RadioGAN.

2.4 Evaluation

A reliable method for evaluating both the validation and the test set had to be developed, both for improving the GAN architecture as well as for interpreting the final results. Image-to-image translation is a young field of research, with first automatic translations done in the early 2000s (Efros & Freeman, 2001). Due to the different objectives of translation tasks and the complexity of images in general, there is no single standard evaluation method for generated images yet. Thus a combination of different measurements had to be applied for a conclusive analysis. We used very simple and widely applied full-reference quality metrics, namely the normalized root mean squared error (NRMSE) and the peak signal-to-noise ratio (PSNR), which is given on a logarithmic scale. With the data consisting largely of noise, those measurements were by themselves not yet meaningful for the performance of the GAN. Therefore we used an additional approach for image similarity assessment, and the astronomically crucial ability of the GAN to recover both the angular sizes and the flux densities of the sources was evaluated.
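The two full-reference metrics can be sketched as below. The text does not specify how the NRMSE is normalized, so this sketch assumes normalization by the root mean square of the reference image; PSNR is computed in decibels for an assumed data range of 1, matching the scaled pixel values. Function and variable names are illustrative.

```python
import numpy as np


def nrmse(reference, generated):
    """RMSE divided by the root mean square of the reference (assumed normalization)."""
    reference = np.asarray(reference, dtype=float)
    generated = np.asarray(generated, dtype=float)
    rmse = np.sqrt(np.mean((reference - generated) ** 2))
    return rmse / np.sqrt(np.mean(reference**2))


def psnr(reference, generated, data_range=1.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(data_range^2 / MSE)."""
    reference = np.asarray(reference, dtype=float)
    generated = np.asarray(generated, dtype=float)
    mse = np.mean((reference - generated) ** 2)
    return 10.0 * np.log10(data_range**2 / mse)


# Toy example: a flat image and a copy offset by 0.1 everywhere.
ref = np.full((4, 4), 0.5)
gen = ref + 0.1
example_nrmse = nrmse(ref, gen)
example_psnr = psnr(ref, gen)
```

With this normalization the NRMSE can exceed 1 when the reference is dominated by low-level noise, which is one reason these metrics alone are not conclusive for noise-dominated cutouts.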

The Structural Similarity Index (SSIM; Wang et al., 2004) was developed for image quality assessment that accounts for the underlying signal structure, as opposed to normal Minkowski error metrics. The SSIM was designed to incorporate known characteristics of the human visual system. It is composed of three terms that represent luminance, contrast and structure comparison. In its simplest form each component is weighted equally, resulting in:

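$$SSIM(x,y)=\frac{(2\mu_{x}\mu_{y}+C_{1})(2\sigma_{xy}+C_{2})}{(\mu_{x}^{2}+\mu_{y}^{2}+C_{1})(\sigma_{x}^{2}+\sigma_{y}^{2}+C_{2})}$$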

where $\mu$ corresponds to the mean intensity, $\sigma_{x}$ and $\sigma_{y}$ to the standard deviations, and $\sigma_{xy}$ to the covariance of the two images. The constants $C_{1}$ and $C_{2}$ are included to avoid instabilities and are equal to $(K_{i}\cdot L)^{2}$, where $L$ is the dynamic range of the pixel values and $K_{i}$ is a small number ($K_{i}\ll 1$). For RadioGAN we used this form of the SSIM. SSIM values can range from -1 to 1, with -1 being the comparison of an image to its intensity-inverted counterpart and 0 indicating two completely unrelated images, while an SSIM of 1 can only be obtained by comparing an image to its exact equal. Since two corresponding cutouts show the same object, there is already a structural similarity between the FIRST and NVSS images. Therefore we also evaluated the SSIM improvement ratio $\Theta$, which corresponds to the difference in SSIM normalized by the total possible difference. Negative values of $\Theta$ indicate a deterioration of the SSIM. We defined it here as:

[TABLE]
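The following sketch illustrates how the SSIM and an improvement ratio of this kind could be computed. It makes several assumptions that are not confirmed by the text: the SSIM is evaluated from global image statistics rather than averaged over local windows, the constants use the commonly quoted defaults $K_{1}=0.01$ and $K_{2}=0.03$, and $\Theta$ is read as the gain in SSIM divided by the remaining headroom $1-\mathrm{SSIM}_{\mathrm{input}}$, which is only one plausible interpretation of "the difference in SSIM normalized by the total possible difference". All function names are hypothetical.

```python
def ssim_global(x, y, data_range=1.0, k1=0.01, k2=0.03):
    """SSIM from global statistics; x and y are flat lists of pixel values."""
    n = len(x)
    mu_x, mu_y = sum(x) / n, sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    c1, c2 = (k1 * data_range) ** 2, (k2 * data_range) ** 2
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator

def ssim_improvement(ssim_input, ssim_generated):
    """Assumed form of Theta: SSIM gain relative to the headroom 1 - SSIM_input."""
    return (ssim_generated - ssim_input) / (1.0 - ssim_input)

# An image compared with itself and with its intensity-inverted counterpart.
image = [0.0, 1.0, 0.0, 1.0]
inverted = [1.0, 0.0, 1.0, 0.0]
print(ssim_global(image, image), ssim_global(image, inverted))
print(ssim_improvement(ssim_input=0.5, ssim_generated=0.75))
```

Under this reading, a generated cutout that is less similar to the target than the input cutout was yields a negative ratio, in line with the statement that negative values indicate a deterioration.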

To compare the recovered angular sizes and the flux densities, we need to extract the corresponding sources from both cutouts. This proved to be a rather complicated task due to the large ranges of both size and flux density values, the possible shifts in position of the sources, the extreme differences of mean flux density and background RMS, and most importantly the possibility of complete absence of a source in one of the surveys. Thus, simply extracting the source by defining a threshold was not a reliable method. A visual inspection of the sources where the threshold method failed entirely showed that most of these faint sources could still be approximated with a circular or elliptical shape. Thus we decided that a two-dimensional Gaussian fit was the best option for a reliable automated source extraction at a low computational cost.

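$$f(x,y)=A\exp\left(-\left[\frac{(x-x_{o})^{2}}{2\sigma_{x}^{2}}+\frac{(y-y_{o})^{2}}{2\sigma_{y}^{2}}\right]\right)+C$$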

Here $x_{o}$ and $y_{o}$ indicate the position of the peak, $\sigma_{x}$ and $\sigma_{y}$ are the standard deviations in both dimensions, $A$ corresponds to the amplitude and $C$ denotes a constant offset. After the scaling of the pixel values, the median of the rather small cutouts differs from zero in most cases, and thus the constant offset was added to obtain a better fit of the sources. We used the volume $V$ of the Gaussian fit as a measurement of the flux density $S$ , and likewise $D$ was used for size comparison. For fast and reliable fitting, we had to implement sensible upper and lower limits and reasonable initial guesses for all parameters. Those were different for either NVSS or FIRST cutouts, and we optimized them such that the same source was fitted in both cutouts, even when there might be a slight shift in position and vast differences in size and amplitude.
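To make the use of the fit volume as a flux density measure concrete, the sketch below evaluates an axis-aligned two-dimensional Gaussian with a constant offset and compares the analytic volume $V=2\pi A\sigma_{x}\sigma_{y}$ of the offset-free Gaussian with a direct pixel sum. The fitting step itself (a bounded least-squares optimization with initial guesses) is not shown, and the parameter values and function names are illustrative assumptions.

```python
import math

def gaussian_2d(x, y, amplitude, x0, y0, sigma_x, sigma_y, offset):
    """Axis-aligned 2-D Gaussian plus constant offset, evaluated at pixel (x, y)."""
    exponent = ((x - x0) ** 2 / (2 * sigma_x ** 2)
                + (y - y0) ** 2 / (2 * sigma_y ** 2))
    return amplitude * math.exp(-exponent) + offset

def gaussian_volume(amplitude, sigma_x, sigma_y):
    """Analytic integral of the offset-free Gaussian: V = 2 * pi * A * sx * sy."""
    return 2 * math.pi * amplitude * sigma_x * sigma_y

# Hypothetical fit result for a source centred in a 90 x 90 pixel region.
params = dict(amplitude=5.0, x0=45.0, y0=45.0, sigma_x=3.0, sigma_y=4.0, offset=0.2)

# Sum the model over all pixels after removing the constant background offset.
pixel_sum = sum(gaussian_2d(x, y, **params) - params["offset"]
                for x in range(90) for y in range(90))
print(pixel_sum, gaussian_volume(5.0, 3.0, 4.0))
```

Subtracting the offset before summing matters: otherwise the background level would contribute in proportion to the area of the region rather than to the source.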

While the source extraction algorithm always found a source in the NVSS cutouts, it failed for some of the FIRST cutouts since only noise was fitted. Those failures can be explained by the absence of some sources in FIRST, since it was not required that each of the selected sources in the NVSS catalogue has an equivalent in the FIRST catalogue. Upon visual inspection of the entire test set it was found that $32\%$ of the original FIRST cutouts do not contain a distinguishable source. Thus a lower threshold was set for the flux density. We determined this threshold by visual inspection of the extracted sources, and set it to roughly $3$ mJy (corresponding to 350 arb. units) since below that value the brightest noise pixels were fitted. The percentage of source extraction failures (corresponding to the absence of a source in the cutout) for at least one of the sources (generated and/or original) is displayed in Table 2 for each run as Source Extraction Failure (SEF). Those cutouts were excluded from the measurements characterising RadioGAN's performance in regard to flux density and size in Table 2, since otherwise fitted noise would be compared. While source extraction failed if no source was visible, the image-to-image translation might still be successful. All cutouts were included in the figures for the flux density and size analysis. Another difficulty for source extraction is the fitting of complex sources. By fitting with a simple Gaussian, the complexity of resolved or marginally resolved sources cannot be adequately described. However, those cutouts were always included in the measurements, so as not to bias the analysis of RadioGAN's performance towards unresolved sources.

For flux density and size comparison we took the logarithm of the flux density and size ratios. Those measurements are primarily meaningful for the spread of ratios, whereas the mean is only relevant to recognize systematic over- or underestimation. Additionally we measured what percentage of the generated sources would be within a $20\%$ deviation of the original flux density, size or both. Those quantities are denoted as $S_{20}$ , $D_{20}$ and $B_{20}$ in Tables 1 and 2, and are given in percent of the total evaluated data set. Another important method of evaluation was the visual inspection of the results. While not quantifiable, this proved very effective for parameter modification, since it is often easiest to detect phenomena like overshooting or the presence of artifacts by eye. For the visual assessment of the results, two different kinds of images were analyzed: 1) The original scaled cutouts, which correspond to the actual input and output of RadioGAN and which have been neither zoomed in nor colored, in order to avoid biasing the perception. 2) Zoomed-in cutouts displaying the central source, which have been colored to enhance visualisation of the source structures. For both translations, a number of examples of both kinds of images are displayed in Section 3. Since the two-dimensional Gaussian fit did not differentiate between single, double and more complex sources, this was also done by visual evaluation. We took the evaluation measurements over a $90^{2}$ pixels region for the generated NVSS cutouts, while for the generated FIRST cutouts a region of $70^{2}$ pixels was chosen. Those specific sizes were chosen so that all the sources, including double sources and more complex structures, would be contained within the region, while keeping it as small as possible in order to minimize the relative weight of the background. Often we tested several combinations of parameters, but only one is displayed for each run. Due to the fact that GAN training depends to a certain degree on randomness, the evaluations of the different runs are only snapshots. In order to obtain more significant measurements, a cross-validation could be done for each run.
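The summary quantities described above can be sketched as follows. The code assumes that "within a $20\%$ deviation" means a relative deviation of at most $0.2$ with respect to the original value, and that the log ratios use base 10; the function names and the four example sources are hypothetical.

```python
import math

def within_margin(generated, original, margin=0.20):
    """True if the relative deviation from the original is at most `margin`."""
    return abs(generated - original) <= margin * abs(original)

def summarize(gen_flux, orig_flux, gen_size, orig_size):
    """Return S20, D20 and B20 in percent, plus mean and spread of log flux ratios."""
    n = len(orig_flux)
    flux_ok = [within_margin(g, o) for g, o in zip(gen_flux, orig_flux)]
    size_ok = [within_margin(g, o) for g, o in zip(gen_size, orig_size)]
    both_ok = [f and s for f, s in zip(flux_ok, size_ok)]
    log_ratio = [math.log10(g / o) for g, o in zip(gen_flux, orig_flux)]
    mean = sum(log_ratio) / n
    spread = math.sqrt(sum((r - mean) ** 2 for r in log_ratio) / n)
    return {
        "S20": 100.0 * sum(flux_ok) / n,
        "D20": 100.0 * sum(size_ok) / n,
        "B20": 100.0 * sum(both_ok) / n,
        "log_ratio_mean": mean,
        "log_ratio_std": spread,
    }

# Four hypothetical sources: generated versus original flux density and size.
result = summarize(gen_flux=[10.0, 13.0, 8.5, 20.0], orig_flux=[10.0] * 4,
                   gen_size=[5.0, 5.5, 9.0, 5.0], orig_size=[5.0] * 4)
print(result)
```

In this example $B_{20}$ is smaller than both $S_{20}$ and $D_{20}$, because a source only counts towards it when flux density and size are recovered simultaneously.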

3 Results

3.1 FIRST to NVSS

Initially we used a standard GAN architecture, which was optimized for image deconvolution and denoising. For the task at hand this architecture proved ineffective, resulting in tremendous overshooting and the intense development of artifacts for the NVSS to FIRST translation. These runs are named standard. Thus we lowered the learning rate and increased the weight of the $L^{1}$ -norm loss to avoid these phenomena. Those runs can be found under the name opt. standard, and they showed promising source generation. The evaluation of each run can be found in Table 1. Nevertheless the GAN was not yet optimally focused since the majority of the cutouts consisted of noise. Thus we added a focus region, which weighted the $L^{1}$ -norm loss in the central $70^{2}$ pixels more heavily. This specific size was chosen since the majority of the sources in NVSS were contained within. Results improved upon this modification, and can be found under focus region. We added a second focus region with a size of $30^{2}$ pixels, since most of the FIRST sources were enclosed there. This did not significantly improve the results, which are named second focus. Later some further modifications were made, namely weighting under- and overestimation differently (under/over), and adding a second discriminator concentrating on the central region (D focus). We deemed those adjustments reasonable since for the NVSS to FIRST mapping they resulted in a considerable improvement, and thus might also be beneficial for the inverse task. After finding a well-balanced set of parameters for the loss functions, we tested some additional alterations, namely increasing the number of hidden layers in the generator, enlarging the data set or extending the training by adding more epochs. Those runs can be found under the names hidden layers, training set and more epochs, respectively. After we evaluated all of the runs mentioned above, a combination of these modifications was deemed the optimal architecture for the FIRST to NVSS translation (named final). This resulted in about $45\%$ of the extended flux density being recovered within a $20\%$ deviation of the original, while over half the sources deviated less than $20\%$ from the original size. The test set cutouts yielded an average SSIM of $0.766$ , and the PSNR average was $21.01$ . Several examples can be seen in Fig. 2, both successful translations and interesting failures. Some source extractions are displayed in Fig. 3. The generated cutout of Cygnus A is displayed in Appendix B, and additional examples of translations of complex sources can be found in Appendix C. Over a third of the generated sources are within a $20\%$ margin in scatter for both sizes and flux densities. It should be noted that there is no simple relation between flux density and size of a source, so that generating a source within a $20\%$ size margin does not automatically result in the source being within a $20\%$ flux density margin. RadioGAN is able to recover the flux density to size distribution fairly well, as can be seen in Appendix D.
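The idea of a focus region can be illustrated with a weighted $L^{1}$ loss in which pixels inside a central square count more heavily. This is only a sketch: the weight value, the normalization by the sum of weights, and the function name are assumptions, and the actual RadioGAN loss may be implemented differently.

```python
def focus_weighted_l1(generated, target, focus_size, focus_weight):
    """Mean absolute error with the central focus_size^2 pixels up-weighted."""
    n = len(target)                      # square image of n x n pixels
    lo = (n - focus_size) // 2           # first row/column inside the focus box
    hi = lo + focus_size                 # first row/column past the focus box
    total, weight_sum = 0.0, 0.0
    for i in range(n):
        for j in range(n):
            inside = lo <= i < hi and lo <= j < hi
            w = focus_weight if inside else 1.0
            total += w * abs(generated[i][j] - target[i][j])
            weight_sum += w
    return total / weight_sum

# Hypothetical 4 x 4 example with a 2 x 2 focus region weighted three times.
target = [[0.0] * 4 for _ in range(4)]
error_in_centre = [row[:] for row in target]
error_in_centre[1][1] = 1.0
error_in_corner = [row[:] for row in target]
error_in_corner[0][0] = 1.0
print(focus_weighted_l1(error_in_centre, target, focus_size=2, focus_weight=3.0))
print(focus_weighted_l1(error_in_corner, target, focus_size=2, focus_weight=3.0))
```

The same pixel error is penalized three times more strongly in the centre than in the corner, which steers the generator towards the region where the sources lie rather than towards the noise-dominated background.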

RadioGAN is able to recover the overall flux density distribution very well, and does also generate outliers, but those deviate more significantly from the original values than average sources do (see Fig. 6, left). The peak density distributions for the flux density to flux density plots match very well, as can be seen on the right in Fig. 6. For quantitative analysis of the flux density distribution we performed a Kolmogorov-Smirnov test (Kolmogorov, 1933; Smirnov, 1948), which resulted in a KS statistic of $D=0.048$ with $p=0.694$ . Thus flux densities of the generated NVSS sources appear to come from the same parent distribution as the original NVSS flux densities, implying that the surface brightness sensitivity of a more sensitive survey is recovered by RadioGAN. To check whether RadioGAN simply convolved the sources from the FIRST cutouts with a wider kernel, we compared the flux density from the generated NVSS cutout to the flux density of the original source in FIRST, as well as their respective sizes. Since the clusters in Fig. 6 are not densely concentrated on a linear slope, but rather spread over a wide range, the GAN does not simply convolve the sources with a larger kernel. Thus it seems to be able to learn a more complex relation between the resolved structures and their extended flux.
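For readers unfamiliar with the two-sample Kolmogorov-Smirnov statistic, the sketch below computes $D$ as the maximum distance between the empirical cumulative distribution functions of two samples. It is a plain standard-library illustration with a hypothetical function name; it does not compute the $p$-value, and it is not necessarily the routine that was used for the analysis above.

```python
def ks_two_sample(sample_a, sample_b):
    """Two-sample KS statistic: maximum distance between the empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both samples past every value <= x, so ties are handled together.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d

# Hypothetical flux density samples (arbitrary units).
original_flux = [1.0, 2.0, 3.0, 4.0]
generated_flux = [3.0, 4.0, 5.0, 6.0]
print(ks_two_sample(original_flux, generated_flux))
```

A small $D$ together with a large $p$-value, as reported above, means that the test finds no evidence that the two samples were drawn from different parent distributions.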

In order to compare RadioGAN to a non-machine-learning equivalent, here taken to be a simple convolution of the sources with a Gaussian, the test set cutouts of FIRST were convolved with a Gaussian kernel, and the resulting cutouts were analysed. Their evaluation results are displayed in Table 1 under gaussian conv, and the obtained flux densities are compared to RadioGAN in Fig. 7. The Gaussian convolution performs significantly worse than RadioGAN in all metrics that were used. As shown in the left panel of Fig. 7, the GAN-generated flux values are largely consistent with a one-to-one relationship to the true flux values measured by NVSS, while a simple convolution of the FIRST observations to the NVSS beamsize (right panel of Fig. 7) shows significantly greater scatter in flux values relative to the original values measured by NVSS. In particular, the simple convolution of fainter sources (where $\log S_{\mathrm{NVSS}}<3$ mJy) results in highly inaccurate flux values because the convolution includes the additional positive and negative flux values that originate from the FIRST imaging sidelobes that were insufficiently cleaned. This results in only $5.2\%$ of the sources being within a $20\%$ flux density margin, compared to $33.9\%$ of the sources obtained by RadioGAN. To illustrate the non-linearity of the translation further, the size to size distribution of the entire test set is shown in Appendix E. It can be seen that the resulting size of the source extraction for many sources in FIRST is very small, which often corresponds to cutouts where the source is not easily distinguishable. Those cutouts correspond to a large range of source sizes in NVSS, so that no simple relation is recognizable. Thus a linear translation, such as Gaussian convolution, cannot result in accurate source sizes by definition of the problem. These non-linear relations make a deep learning model such as RadioGAN an ideal tool for the translation between radio surveys.
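A Gaussian convolution baseline of this kind can be sketched as below. The beam sizes (about 5 arcsec for FIRST and 45 arcsec for NVSS) and the 1.8 arcsec pixel scale are assumptions taken from the general survey descriptions rather than from this section, the kernel width is obtained by subtracting the beams in quadrature, and the separable blur with zero padding is an illustrative choice. Converting the result to flux density per NVSS beam is not handled here.

```python
import math

def gaussian_kernel_1d(sigma_pix, half_width):
    """Normalized 1-D Gaussian kernel of length 2 * half_width + 1."""
    weights = [math.exp(-0.5 * (k / sigma_pix) ** 2)
               for k in range(-half_width, half_width + 1)]
    norm = sum(weights)
    return [w / norm for w in weights]

def convolve_rows(image, kernel):
    """Convolve every row with the kernel; pixels outside the image count as zero."""
    half = len(kernel) // 2
    width = len(image[0])
    return [[sum(kernel[k + half] * row[c - k]
                 for k in range(-half, half + 1) if 0 <= c - k < width)
             for c in range(width)]
            for row in image]

def gaussian_blur(image, sigma_pix, half_width):
    """Separable 2-D Gaussian blur: rows first, then columns via transposition."""
    kernel = gaussian_kernel_1d(sigma_pix, half_width)
    rows_done = convolve_rows(image, kernel)
    transposed = [list(col) for col in zip(*rows_done)]
    cols_done = convolve_rows(transposed, kernel)
    return [list(col) for col in zip(*cols_done)]

# Assumed survey parameters in arcsec (not stated in this section).
first_fwhm, nvss_fwhm, pixel_scale = 5.0, 45.0, 1.8
kernel_fwhm = math.sqrt(nvss_fwhm ** 2 - first_fwhm ** 2)
sigma_pix = kernel_fwhm / (2 * math.sqrt(2 * math.log(2))) / pixel_scale
print(kernel_fwhm, sigma_pix)

# Small demonstration with a narrow kernel: a unit point source in a 21 x 21 image.
point = [[0.0] * 21 for _ in range(21)]
point[10][10] = 1.0
blurred = gaussian_blur(point, sigma_pix=1.5, half_width=5)
print(sum(map(sum, blurred)), blurred[10][10])
```

Because the kernel is normalized, the summed pixel values of an isolated source are preserved while its peak is lowered. Such a linear operation treats sidelobe residuals exactly like real emission, which is one reason why it struggles for faint sources.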

3.2 NVSS to FIRST

While RadioGAN was able to perform the FIRST to NVSS translation satisfactorily after a light modification of the standard GAN architecture, this was not the case for the NVSS to FIRST translation. This result is unsurprising due to the angular resolution limitations set by the maximum baseline of the VLA D-array configuration. In addition to the diffraction limit of the D-array observations, the low surface brightness sensitivity of the FIRST observations was also insufficient for revealing extended diffuse structures. Nevertheless, we explored the introduction of different weights and focus regions to test thoroughly whether we could repeat the results from Schawinski et al. (2017), where image features finer than the deconvolution limit were recovered.

By comparing the difference in mean brightness and rms between the generated cutouts and the original cutouts, we discovered that RadioGAN tends to lower the mean and to reduce the standard deviation. The generated images often have a smaller dynamic range of the pixel values, resulting in decreased contrast, and most sources are underestimated in flux density. While this phenomenon could be controlled by increasing the weight of the discriminator in the loss function, which is more favourable towards extreme values than the $L^{1}$ -norm, we chose another approach so as to avoid artifacts that would have been generated due to the dominance of the discriminator. Thus we varied the weighting of under- and overestimation, resulting in an increased loss for underestimation by a factor of 1.2. Due to the increased weighting of the $L^{1}$ -norm in the central region of the image, more complex structures were often blurred out with diffuse or non-continuous edges, as they were sometimes fragmented. Therefore we applied a second discriminator only on the central $30^{2}$ pixels. We tested several parameter combinations, as well as different proportions of discriminator- to generator-training. Even though the second discriminator eliminated some negative phenomena, there was no overall improvement for the combinations that were tried since other effects worsened. Thus the second discriminator was not used further, since training an additional neural network also increased the computational costs. If perfectly equilibrated, a second discriminator could potentially improve results, since it could be focused such that it might counteract certain unwanted phenomena.
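The asymmetric weighting of under- and overestimation can be illustrated with a small variation of the $L^{1}$ loss. The sketch assumes that underestimation means a generated pixel value below the target value and that the factor of 1.2 multiplies the absolute error of such pixels; the function name is hypothetical and the real loss may combine this with the focus-region weighting.

```python
def asymmetric_l1(generated, target, under_factor=1.2):
    """Mean absolute error where underestimated pixels count under_factor times."""
    total, count = 0.0, 0
    for gen_row, tgt_row in zip(generated, target):
        for g, t in zip(gen_row, tgt_row):
            diff = g - t
            # Negative diff: the generated value lies below the target value.
            total += (under_factor * -diff) if diff < 0 else diff
            count += 1
    return total / count

# The same absolute error of 0.5, once as under- and once as overestimation.
target = [[1.0, 1.0]]
print(asymmetric_l1([[0.5, 1.0]], target))   # underestimated pixel
print(asymmetric_l1([[1.5, 1.0]], target))   # overestimated pixel
```

Penalizing underestimation slightly more gives the generator an incentive to counteract the tendency towards lowered means and reduced contrast without increasing the weight of the discriminator.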

Furthermore we tested if adding layers to the generator or training on a larger set would improve the results. Both of the runs hidden layer and training set resulted in good overall performance. Additionally we tested if training for more epochs would improve results, which was not the case. Thus for the run final an architecture with additional hidden layers was chosen and RadioGAN was trained on the extended training set for 30 epochs. The evaluation measurements can be found in Table 2. The NVSS to FIRST translation resulted in an average PSNR of $26.22$ , and over $40\%$ of the sources were generated within a $20\%$ margin of the original size. Almost $30\%$ of the generated sources were within a $20\%$ deviation of both the original size and flux density. One should keep in mind, especially for complex structures, that a distance of about eight pixels corresponds to a single pixel in the original NVSS data! Thus RadioGAN can extract remarkable information. In some of the examples, as displayed in Fig. 4, it can be seen that double sources can be recognized even if in NVSS they visually appear to be a single, circular source. RadioGAN recognized Cygnus A as a resolved, extended double source, as can be seen in Appendix B. Additional examples of results for the NVSS to FIRST translation of complex sources can be found in Appendix C. $68\%$ of the translations succeeded in mapping single, double, or non-distinguishable sources, or complex structures to the same category. RadioGAN succeeded in recognizing NVSS cutouts with a clearly visible source that would translate to a cutout without a distinguishable source in FIRST in $95\%$ of the cases. Thus RadioGAN seems to recognize the difference in surface brightness sensitivity of the two surveys. RadioGAN rarely ( $1.5\%$ of all cases) generates sources that have no counterpart in the original FIRST cutout. Overall the NVSS to FIRST translation was satisfactory, even if the generated cutouts are visually distinguishable from the originals.

4 Discussion

4.1 Success and Failure

After analysis of cutouts that were within the top or the bottom $1\%$ in regard to a certain evaluation measurement, we found several commonalities. Examples for the top and bottom $1\%$ for several categories can be found in Appendix F. For the FIRST to NVSS translations RadioGAN performed well, with the majority of the generated sources within a $20\%$ margin both for size and flux density, on circular sources that were easily distinguishable from the background. This was the case for original NVSS cutouts with very dark background and FIRST cutouts with bright, easily visible sources. RadioGAN failed in the majority of cases where there was both a very bright (as in higher pixel value average) background in NVSS and an almost visually indistinguishable source in FIRST. The smaller the difference between the darkest (background) and the brightest (source) pixel, the worse the GAN performed on average. Therefore one could assume a direct correlation between RadioGAN's performance and the contrast of the test cutout. More complex structures, such as double sources or extended visible jets, did not pose that much of a problem for the FIRST to NVSS translation. There were several double sources contained in cutouts among the top $1\%$ in several categories. Several sources in one cutout only seemed to pose a problem if an extremely faint source was close to a fairly bright one. Also the translation from point-like sources in FIRST to complex structures in NVSS failed more often than for circular NVSS counterparts. RadioGAN tended to generate background significantly different from the original cutout if the original background was either very bright or very dark compared to average cutouts. For the NVSS to FIRST translation the majority of the generated sources were within a $20\%$ margin in regard to flux density for single circular sources that were significantly brighter than the brightest interference background. RadioGAN failed on some double sources and on complex structures, and often avoided generating a source at all for faint sources in NVSS.

In general, it can be concluded that RadioGAN failed on sources where there was not enough information contained in the cutouts to learn about a more complex underlying structure. Those failures are often encountered when the FIRST and the NVSS cutouts differ substantially, so that one would not visually recognize that they show the same sources. Additionally the absence of certain complex structures in the training set explains the GAN's failure due to insufficient training. The training set would need to be extended in order to avoid under-representing complex sources. Problems in the surveys, such as sidelobe patterns or dynamic range limitations, might also have a substantial effect on RadioGAN's performance. Some failures can be explained by factors arising from those problems. For example complex emission regions, such as the Galactic Plane, might cause a very different background than that of a region without any extended emission that is very bright. The simple metrics used for RadioGAN's evaluation will not be as effective in these regions. For instance, both the NVSS and the FIRST integrated flux densities, as found in the survey catalogues, include correction factors to account for 'clean bias'. No such correction has been performed for the evaluation of RadioGAN. Besides, a purely physical explanation could quite likely be the reason for some failures. It is possible that there is simply not enough information contained within the cutouts for some translations, e.g. that the signal-to-noise ratio is too low in one of the surveys. Nevertheless the GAN is able to learn complex relations between the data of the two surveys, and is able to recover the flux density of NVSS in almost half of the translations.

4.2 Radio Data

RadioGAN performed well on the FIRST to NVSS translation with over half of the generated sources within a $20\%$ margin of the original size, and achieved some surprisingly good and sometimes reasonable results for the NVSS to FIRST translation. Our finding that RadioGAN is able to better translate from FIRST to NVSS than from NVSS to FIRST is consistent with our understanding of the limitations of radio synthesis imaging. While the VLA B-array configuration used by the FIRST survey can detect emission on angular scales larger than 45 arcseconds, our ability to translate from FIRST to NVSS (in a manner that is superior to a simple convolution), suggests that RadioGAN is able to use the low signal-to-noise information on the larger angular scales that is available for longer-baseline observations. On the other hand, the availability of long-baseline information is non-existent in short-baseline observations and this is reflected by our results in attempting to translate from NVSS to FIRST. RadioGAN did not simply convolve the FIRST cutouts with a different, about five times bigger kernel (PSF for optical images), but learned some underlying structures. This resulted in an overall performance of over a third of the sources being generated within a $20\%$ deviation range for both size and flux density for the best GAN architecture that was tested in the scope of this project. It is possible that this performance might still be improved further by additional modifications. The generated cutouts are often not visually recognizable as fake images. The NVSS to FIRST translation did not yield a comparable success rate, which was expected due to the angular resolution of FIRST being almost ten times better. Nevertheless RadioGAN was able to generate the counterparts within a $20\%$ margin for the majority of circular sources with average size and flux density, whose brightness was sufficient such that contrast with the background was good. 
It also learns some more complex underlying structures, which is remarkable considering that such structures are completely hidden in the NVSS cutouts.

There are several modifications that could be made in order to further improve the performance of RadioGAN. In regard to the technical set-up and the architecture, we see the following three alterations as the most promising: Firstly, instead of just using 1-band images with the total intensity, 3-band cutouts with the three Stokes planes could be used. Secondly, different interpolation algorithms and other scaling methods could be tested. Thirdly, the loss functions and overall architecture of the GAN can be modified to an arbitrary degree of complexity. Those modifications might result in an improvement, since by performing a more exhaustive search of the overall architecture and parameter space, a better combination could be found. Another possibility would be to alter the composition of the data sets in order to put more emphasis on specific translations. Since $54\%$ of the original FIRST cutouts contain a point source, both RadioGAN's training and its performance analysis are naturally biased towards point sources. By modifying the data set composition it is possible to counteract this natural bias and to specialize RadioGAN for a certain task. For example increasing the number of cutouts containing complex sources for training could improve RadioGAN's performance on resolved sources, while a similar composition of the test set would increase the relative weight of those translations in the evaluation. Additionally a more sophisticated approach could be used for source extraction.

5 Summary and Conclusion

In this paper, we study the possibilities and difficulties of attaining images with a larger range of angular scales from radio synthesis observations through exploring image-to-image translations from two radio surveys using two different VLA array configurations.

• There is more information contained in radio data than visually noticeable.

• GANs are able to learn complex underlying relations between sources in different radio surveys, as they can achieve subbeam resolution and recover large structures that are resolved out.

• RadioGAN is able to do a satisfying image-to-image translation for most cutouts, generating over half of the sources within a 20% margin of the original source size for the FIRST to NVSS translation, whereas the inverse translation is slightly less successful.

• RadioGAN could be used as a tool, both to recover more of the total emission and to generate a cutout of a source with a different angular resolution, as well as an instrument to find particularly interesting sources.

Overall RadioGAN succeeds in performing both translations with satisfactory results, and thus a range of further possibilities emerged. The future of radio surveys could be substantially influenced by the possibilities of machine learning. The results for the FIRST to NVSS translation are promising, so that after more extensive training and with an ideal method its success rate would be satisfactory. Thus it might be possible to conduct comprehensive surveys only at a high angular resolution and obtain estimates for extended flux density by using GANs. After a thorough analysis of failures from a large test set, candidates for bad RadioGAN performance might be identified. For those sources a second survey could be done in order to still have accurate information on their large-scale structures. For the mapping from NVSS to FIRST, the results were extremely interesting, while not as promising as the FIRST to NVSS translation. Thus the reasonable option would be to use RadioGAN for extended flux estimates as mentioned above with a FIRST to NVSS translation, while also performing the inverse translation in order to see what can be learned from given data, since this can give some interesting insights into the contained information. RadioGAN's ability to translate from a higher angular resolution to one that has greater surface brightness sensitivity is very promising for future SKA-mid pathfinder surveys such as MeerKAT and ASKAP, which can attain angular resolution of a few arcseconds. Similarly, spectral line surveys such as those of atomic hydrogen (HI) in and around nearby galaxies can also benefit from RadioGAN's ability to extract diffuse extended emission. For example, recent early science HI observations from the Australian Square Kilometre Array Pathfinder still find that higher angular resolution observations are typically missing some of the emission detected by previous single-dish HI observations (Reynolds et al., 2018).

The code for RadioGAN is available at http://space.ml/proj/RadioGAN.html

Acknowledgements

KS acknowledges support from Swiss National Science Foundation Grants PP00P2_138979 and PP00P2_166159 and the ETH Zurich Department of Physics. CZ and the DS3Lab gratefully acknowledge the support from the Swiss National Science Foundation NRP 75 407540_167266, IBM Zurich, Mercedes-Benz Research & Development North America, Oracle Labs, Swisscom, Zurich Insurance, Chinese Scholarship Council, the Department of Computer Science at ETH Zurich, and the cloud computation resources from Microsoft Azure for Research award program. The International Centre for Radio Astronomy Research (ICRAR) is a joint venture between Curtin University and The University of Western Australia with support and funding from the State Government of Western Australia.

Appendix A Run Parameters

Appendix B Observation Simulations and Results for Cygnus A

Appendix C Examples of translations for complex sources

Appendix D Size to flux density distribution of NVSS

Appendix E Size to size distribution

Appendix F Examples of Success and Failure

Bibliography


1. Alger M. J., et al., 2018, MNRAS, 478, 5547
2. Becker R. H., White R. L., Helfand D. J., 1994, in Crabtree D. R., Hanisch R. J., Barnes J., eds, Astronomical Society of the Pacific Conference Series Vol. 61, Astronomical Data Analysis Software and Systems III. p. 165
3. Bhatnagar S., Cornwell T. J., 2004, A&A, 426, 747
4. Clark B. G., 1980, A&A, 89, 377
5. Condon J., 2015, in The Many Facets of Extragalactic Radio Surveys: Towards New Scientific Challenges. p. 4
6. Condon J. J., Cotton W. D., Greisen E. W., Yin Q. F., Perley R. A., Taylor G. B., Broderick J. J., 1998, AJ, 115, 1693
7. Cornwell T. J., 2008, IEEE Journal of Selected Topics in Signal Processing, 2, 793
8. Efros A. A., Freeman W. T., 2001, in SIGGRAPH.