TurbuStat: Turbulence Statistics in Python

Eric W. Koch; Erik W. Rosolowsky; Ryan D. Boyden; Blakesley Burkhart,; Adam Ginsburg; Jason L. Loeppky; Stella S.R. Offner

arXiv:1904.10484·astro-ph.IM·June 12, 2019

TurbuStat: Turbulence Statistics in Python

Eric W. Koch, Erik W. Rosolowsky, Ryan D. Boyden, Blakesley Burkhart,, Adam Ginsburg, Jason L. Loeppky, Stella S.R. Offner

PDF

1 Repo

TL;DR

TurbuStat is an open-source Python package that offers a comprehensive suite of tools for analyzing turbulence in spectral-line data cubes, including multiple statistical methods, simulation capabilities, and data comparison features.

Contribution

The paper introduces TurbuStat v1.0, a versatile Python package with 14 turbulence analysis methods, simulation tools, and data comparison features for spectral-line data cubes.

Findings

01

Provides 14 turbulence analysis methods

02

Includes simulation and synthetic observation tools

03

Supports multi-core processing for efficiency

Abstract

We present TurbuStat (v1.0): a Python package for computing turbulence statistics in spectral-line data cubes. TurbuStat includes implementations of fourteen methods for recovering turbulent properties from observational data. Additional features of the software include: distance metrics for comparing two data sets; a segmented linear model for fitting lines with a break-point; a two-dimensional elliptical power-law model; multi-core fast-fourier-transform support; a suite for producing simulated observations of fractional Brownian Motion fields, including two-dimensional images and optically-thin HI data cubes; and functions for creating realistic world coordinate system information for synthetic observations. This paper summarizes the TurbuStat package and provides representative examples using several different methods. TurbuStat is an open-source package and we welcome community…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Code & Models

Repositories

Astroua/TurbuStat
noneOfficial

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Full text

TurbuStat: Turbulence Statistics in Python

Eric W. Koch

Erik W. Rosolowsky

University of Alberta, Department of Physics 4-183 CCIS, Edmonton AB T6G 2E1, Canada

Ryan D. Boyden

Department of Astronomy and Steward Observatory, University of Arizona, 933 North Cherry Avenue, Tucson, AZ 85721, USA

Blakesley Burkhart

Center for Computational Astrophysics, Flatiron Institute, 162 Fifth Avenue, New York, NY 10010, USA

Department of Physics and Astronomy, Rutgers, The State University of New Jersey, 136 Frelinghuysen Rd, Piscataway, NJ 08854, USA

Adam Ginsburg

Jansky fellow of the National Radio Astronomy Observatory, 1003 Lopezville Road, Socorro, NM 87801, USA

Jason L. Loeppky

Department of Physics, University of British Columbia, Okanagan Campus, 3333 University Way, Kelowna, BC V1V 1V7, Canada

Stella S.R. Offner

Department of Astronomy, The University of Texas at Austin, 2515 Speedway, Stop C1400, Austin, TX 78712, USA

Abstract

We present TurbuStat (v1.0): a python package for computing turbulence statistics in spectral-line data cubes. TurbuStat includes implementations of fourteen methods for recovering turbulent properties from observational data. Additional features of the software include: distance metrics for comparing two data sets; a segmented linear model for fitting lines with a break-point; a two-dimensional elliptical power-law model; multi-core fast-fourier-transform support; a suite for producing simulated observations of fractional Brownian Motion fields, including two-dimensional images and optically-thin HI data cubes; and functions for creating realistic world coordinate system information for synthetic observations. This paper summarizes the TurbuStat package and provides representative examples using several different methods. TurbuStat is an open-source package and we welcome community feedback and contributions.

turbulence — methods: statistical — methods: data analysis

††software: astropy (Astropy Collaboration et al., 2018), matplotlib (Hunter, 2007), seaborn (Waskom et al., 2017), numpy (Oliphant, 2006–), scipy (Jones et al., 2001–), scikit-image (van der Walt et al., 2014), scikit-learn (Pedregosa et al., 2011), statsmodels (Seabold & Perktold, 2010), astrodendro (dendrograms.readthedocs.io), spectral-cube (spectral-cube.readthedocs.io), radio-beam (radio-beam.readthedocs.io), emcee (Foreman-Mackey et al., 2013), corner (Foreman-Mackey, 2016), pyFFTW (Gomersall, 2016) and FFTW (Frigo & Johnson, 2005)

1 Introduction

Turbulence is ubiquitous throughout the interstellar medium (see reviews by Elmegreen & Scalo, 2004; Lazarian, 2009). Observations of different ISM phases demonstrate that similar turbulent properties are found over a wide range of scales (e.g., Armstrong et al., 1995; Chepurnov & Lazarian, 2010). Connecting these observations with theoretical and numerical predictions is critical for understanding how turbulence affects the structure and motion of the ISM. This connection provides important inputs for many astrophysical processes; for example, turbulence plays a central role in modern star formation theories (e.g., Krumholz et al., 2009; Ostriker et al., 2010; Federrath & Klessen, 2012; Burkhart, 2018).

Determining turbulent properties from observations is challenging because of the limited information available and the complexity of the ISM. Two-dimensional images, in particular column density or extinction maps, are a projection of a three-dimensional turbulent field. Spectral-line observations provide additional constraints from the line-of-sight velocity, though features resolved in velocity may not be spatially-distinct (Burkhart et al., 2013b; Beaumont et al., 2013) and are affected by opacity and line excitation (Burkhart et al., 2013a; Correia et al., 2016). This means that observations miss up to four of the full six-dimensional phase-space111Three spatial and three velocity dimensions. that describe the ISM structure and its motion. The inherent complexity of the ISM adds to the difficulty in interpreting observations. Turbulence in the ISM may be driven by multiple energy injection sources on different scales (McKee & Ostriker, 2007; Krumholz et al., 2014; Chepurnov et al., 2015; Krumholz et al., 2018) and is affected by variations in magnetic field strength and orientation (Goldreich & Sridhar, 1995; Cho & Lazarian, 2003; Burkhart et al., 2009; Meyer et al., 2014; Burkhart et al., 2015b; Hull et al., 2017). There are additional physical effects at work, including phase transitions that affect thermodynamic properties and gravitational collapse within molecular clouds (Bialy et al., 2017; Hill et al., 2018).

Significant effort over the last $\sim 40$ years has sought to connect predicted turbulent properties from theory (e.g., Goldreich & Sridhar, 1995) with (magneto-)hydrodynamic simulations (e.g., Cho & Lazarian, 2003; Kritsuk et al., 2007; Kowal et al., 2007; Federrath et al., 2008; Collins et al., 2012) and observations (e.g., Burkhart et al., 2010). To make these connections, an array of methods have been proposed in the literature to recover turbulent properties from observational data. These methods utilize either two-dimensional images or three-dimensional spectral-line data cubes222Observations of polarization provide additional information (e.g. Gaensler et al., 2011; Iacobelli et al., 2014; Herron et al., 2018), but TurbuStat does not currently handle polarization data.. The latter has been of particular interest for recovering properties of the turbulent density and velocity fields (e.g., Lazarian & Pogosyan, 2000).

We have developed TurbuStat, a publicly-available Python package that implements fourteen observational diagnostics of ISM turbulence described in the literature. TurbuStat provides a common framework for running and comparing turbulence diagnostics, including comparisons between simulations and observations (e.g., Boyden et al., 2016; Koch et al., 2017; Boyden et al., 2018; Haworth et al., 2018). The use of some techniques has been limited by the lack of a publicly-available implementation. Furthermore, many studies focus on using one or a small number of techniques. This has resulted in a limited understanding of the regimes where particular methods are best-suited and the limits where inherent assumptions in a method break down. The breadth of techniques in TurbuStat provides the opportunity to explore these issues.

In this paper, we present an overview of TurbuStat’s first major release (v1.0), including a description of the methods implemented in TurbuStat (§2), an overview of the package (§3), and a demonstration of TurbuStat’s capabilities with representative examples (§4). The Appendices highlight the choice of normalization for the wavelet transform (Gill & Henriksen, 1990, Appendix A), a comparison of our Delta-variance implementation to the original IDL code (Ossenkopf et al., 2008a, Appendix B), and a series of memory and timing tests for the methods in TurbuStat (Appendix C). TurbuStat is open-source and includes extensive documentation and tutorials on the use of the methods (turbustat.readthedocs.io). We encourage feedback from the community and welcome contributions.

2 Methods

TurbuStat (v1.0 Koch et al., 2019) has implementations of 14 literature methods333We note that this number differs from Boyden et al. (2016, 2018) since (i) the Tsallis statistic was not used and (ii) multiple outputs of dendrograms and statistical moments were counted as individual methods. that recover properties related to turbulence from observational data. We briefly describe the methods and relevant literature here (also see descriptions in Boyden et al., 2016; Koch et al., 2017; Boyden et al., 2018), and note that the package documentation contains thorough explanations and code examples.

2.1 Structure Analysis

The spatial structure of the ISM is hierarchical. Statistics that characterize the structural properties of an image or spectral-line data cube are one way to describe the hierarchical structure. TurbuStat has two methods which provide a non-parametric description of hierarchical structure.

Genus

Genus statistics are a measure of topology. The value of the genus statistic is the difference between the number of isolated regions above and below a threshold. A genus curve is produced by varying the threshold over a range of values. The first use of the genus statistic on column density maps was introduced Lazarian et al. (2002) and later expanded on by Kowal et al. (2007), Chepurnov et al. (2008) and Burkhart et al. (2012). The implementation in TurbuStat closely follows the approach from Chepurnov et al. (2008).

Dendrograms

Dendrograms are a common method for exploring the hierarchical structure of data. Their use in molecular cloud studies was proposed by Rosolowsky et al. (2008) and Goodman et al. (2009), where pixels in an image or data cube are combined into hierarchical clusters based on their brightness (also see Houlahan & Scalo, 1990, 1992). Burkhart et al. (2013) explored two statistics based on the dendrogram structure: (1) the relation between the number of structures in the dendrogram as a function of the branch height, and (2) the histogram of peak intensity in each structure of the dendrogram. Dendrograms can also be combined with other statistics, such as 1D PDFs (Chen et al., 2018). TurbuStat implements both of these statistics and utilizes astrodendro444dendrograms.readthedocs.io to compute the dendrograms.

2.2 Properties of Turbulence

Despite the complexity of astrophysical turbulence, several statistics for observational data have theoretically-motivated properties. TurbuStat implements a number of these methods, particularly those related to power-spectra.

Spatial Power-spectrum (SPS)

The spatial power-spectrum is a widely-used method for finding the turbulent field index in the ISM. The power-spectrum index from the column density or velocity centroid map is related to the underlying density or velocity field of the ISM, respectively (e.g., Lazarian & Pogosyan, 2000; Esquivel & Lazarian, 2005). Most studies reduce the two-dimensional power-spectrum of the image into one dimension with azimuthal-averaging then fit a power-law to find the index (e.g., Crovisier & Dickey, 1983; Scalo, 1984; Stanimirović & Lazarian, 2001; Padoan et al., 2006; Burkhart et al., 2013a; Pingel et al., 2018). Additional information can be retained when modelling the full two-dimensional power-spectrum, including preferred directions of structure in the image and anisotropy (e.g., Martin et al., 2015; Kalberla & Kerp, 2016) that are predicted to relate to the magnetic field structure Burkhart et al. (2014); Kandel et al. (2017); González-Casanova & Lazarian (2017). TurbuStat can be used for both types of studies and can model breaks in the power-spectrum and the effect of a telescope beam (see §3.6).

Modified Velocity Centroids (MVC)

MVC is an adaptation of the spatial power-spectrum introduced by Lazarian & Esquivel (2003) that accounts for velocity-density correlations, which alter the power-spectrum of a velocity centroid map. The centroid power-spectrum is corrected by subtracting the column density power-spectrum multiplied by the average velocity-dispersion in the data. Esquivel & Lazarian (2005) explore the limits for when this correction is required. The MVC implementation in TurbuStat requires an input of the velocity centroid, integrated intensity555Or column density., and line width666From the second moment. maps. The formulation in Lazarian & Esquivel (2003) requires unnormalized velocity centroids (i.e., without dividing by the integrated intensity). Since most observational products do not utilize this form of centroid, our implementation converts the centroid map to an unnormalized form. TurbuStat’s implementation of MVC has the same features described for the spatial power-spectrum.

Velocity Channel Analysis (VCA)

Lazarian & Pogosyan (2000) show that, by considering the power spectrum of a spectral-line cube integrated over spectral frequencies, the index of the spatial power-spectrum will be altered by velocity fluctuations for sufficiently small spectral channel widths (see also Lazarian & Pogosyan, 2004). By increasing the spectral channel width, the power-spectrum index approaches the power-spectrum of the column density, which is set only by density fluctuations (e.g., Stanimirović & Lazarian, 2001; Muller et al., 2004). Recent work by Kandel et al. (2016) expands these theoretical predictions to include anisotropy in two-dimensional power-spectra. The VCA implementation in TurbuStat has the same functionality as the spatial power-spectrum and includes the ability to alter the spectral channel width of a given cube (see §4.1.2).

Velocity Coordinate Spectrum (VCS)

The VCS is a complementary technique to VCA that is first mentioned in Lazarian & Pogosyan (2000) and later expanded on in Lazarian & Pogosyan (2006, also see (, )). While VCA integrates over the spectral dimension, VCS is the integration over the spatial dimensions, which yields a 1D spectral power spectrum. The VCS implementation in TurbuStat fits a broken linear model to the VCS (§3.6), mimicing the asymptotic high- and low-resolution solutions presented in Lazarian & Pogosyan (2006). A future extension of the code will include fitting with the complete VCS model (Chepurnov et al., 2010, 2015), which can, for example, constrain the turbulent driving scale.

Bispectrum

The bispectrum is the Fourier transform of the three-point correlation function, and is the next-order analog to the power-spectrum. Unlike the power-spectrum, the bispectrum includes phase information, allowing for correlations between different spatial frequencies to be explored. Burkhart et al. (2009) explored how these correlations change in different MHD regimes and later extended the study to consider the HI column density of the SMC (Burkhart et al., 2010). Quantitative comparisons between bispectra are more difficult than with the power-spectrum, which can be characterized only by its index. The bispectrum is a complex quantity and cannot generally be reduced to a 1D representation. Instead, the bicoherence—a real-valued normalized quantity that represents phase coupling—can be calculated. The TurbuStat implementation includes both quantities, using the bicoherence definition from Hagihira et al. (2001). Calculating the bispectrum for even a small image is expensive and time-consuming. Our implementation uses Monte Carlo sampling, with user input on the number of samples, to compute the bispectrum for each combination of wavenumbers. A future extension to avoid this sampling is to utilize the multi-pole expansion introduced recently by Portillo et al. (2018).

Wavelet Transform

Gill & Henriksen (1990) measure the amount of structure as a function of spatial scale by taking the sum of positive values in a wavelet decomposition of a two-dimensional image. This gives a transform that is similar to the structure function (e.g., Miesch & Bally, 1994). The wavelet implementation in TurbuStat is similar to the algorithm from Gill & Henriksen (1990), though we introduce a change in the normalization of the wavelet kernels (Appendix A).

Delta-Variance

Similar to the Gill & Henriksen (1990) wavelet transform777Also see Zielinsky & Stutzki (1999)., the delta-variance technique is based on a wavelet decomposition, and is an extension of Allan Variance for one-dimensional time series (Stutzki et al., 1998; Bensch et al., 2001; Ossenkopf et al., 2001). The delta-variance characterizes the image structure at a set of spatial scales by calculating the variance in the wavelet decomposition. An extension of this method for irregularly-shaped observational maps was developed by Ossenkopf et al. (2008a, b), which we have implemented in TurbuStat (see Appendix B for a comparison with the IDL code provided by these authors).

Principal Component Analysis (PCA)

PCA is a general dimensionality reduction procedure that identifies correlated components based on an orthogonal decomposition of a covariance matrix. Heyer & Schloerb (1997) first applied PCA to a spectral-line data cube by creating a covariance matrix of spectral channels in a data cube, decomposing that covariance matrix, and using the eigenvectors and eigenimages to recover characteristic spectral and spatial scales in the data. Combining these scales over the first $N$ eigenvalues produces a size-line width relation for the data, whose index can be related to the theoretically expected turbulent regimes. This technique was further developed in Brunt & Heyer (2002a, b) and Roman-Duval et al. (2011), with an extension for measuring anisotropy (Heyer et al., 2008). An analytic model is presented in Brunt & Heyer (2013). TurbuStat implements the algorithm described in Brunt & Heyer (2002a, b), including correction factors for the beam size and empirically-derived calibrations (see §4.1.3). The spectral and spatial scales can be fit with orthogonal distance regression or with a Bayesian approach, both of which handle errors in both dimensions.

Spectral Correlation Function (SCF)

The SCF was introduced by Rosolowsky et al. (1999) to relate spatial and spectral similarities of a data cube (see also Padoan et al., 2001, 2003). TurbuStat implements the form from Yeremi et al. (2014), where the statistic is the normalized root-mean-square difference between the cube and its spatially-shifted self. By iterating over a range of spatial shifts in both spectral dimensions, we create a two-dimensional correlation surface whose azimuthally-averaged index has been shown to be sensitive to changes in turbulent properties (Padoan et al., 2003; Muller et al., 2004; Gaches et al., 2015). Our implementation can model the correlation surface in either one- or two-dimensions using a similar approach described for the spatial power-spectrum.

2.3 Analysis of Distributions

The distribution of values within a data set, or portions of a data set, are useful diagnostics in many settings. TurbuStat provides a convenient implementation for fitting probability distribution functions, and descriptions of distribution shapes.

Probability Distribution Functions (PDF)

The most commonly-used analysis technique to describe turbulent properties from observational data products is the PDF. Extensive work on PDFs from simulations (Vazquez-Semadeni, 1994; Ostriker et al., 2001; Kowal et al., 2007; Federrath et al., 2008; Burkhart et al., 2009, 2017, e.g.,) and observations (e.g., Miesch & Scalo, 1995; Burkhart et al., 2010; Lombardi et al., 2015; Imara & Burkhart, 2016; Bialy et al., 2017) has provided a solid framework connecting the PDF to turbulent properties. The TurbuStat PDF implementation was written to emphasize flexibility in treatment and modelling of PDFs. Both images and cubes can be used, and the code can be used to quickly recover properties of the PDF and its empirical cumulative distribution function (ECDF). The implementation utilizes the scipy.stats888docs.scipy.org/doc/scipy/reference/stats.html continuous distributions for modelling, including normal, log-normal, and power-law distributions. A maximum-likelihood estimator is used to fit model distributions to the data.

Statistical Moments

An extension to PDF studies is an analysis of higher-order moments, namely the skewness and kurtosis, which are non-Gaussian indicators. Previous studies have explored skewness and kurtosis for column density, velocity centroid, and line width PDFs of simulations and observations (e.g., Padoan et al., 1999; Kowal et al., 2007; Burkhart et al., 2009, 2013a, 2015a). Burkhart et al. (2010) extend this approach by using a rolling circular filter to create spatial moment maps of the HI in the SMC. This allows for spatial variations in the PDF moments to be explored. The TurbuStat implementation provides both methods.

Tsallis Statistics

The Tsallis distribution was introduced by Tsallis (1988) for describing multi-fractal systems. Esquivel & Lazarian (2010) first introduced Tsallis statistics for modelling ISM turbulence (see also Tofflemire et al., 2011; Burkhart et al., 2013b, 2015a; González-Casanova et al., 2018). Our implementation of Tsallis statistics follows Esquivel & Lazarian (2010) and uses a q-Gaussian distribution to model the difference in a given image as a function of spatial scale.

3 Overview of TurbuStat

TurbuStat contains sub-modules for the computation of the methods, calculating moment arrays and their uncertainties from data cubes, and basic I/O operations for handling FITS files. In this section, we present the methods and other helpful utilities implemented in TurbuStat.

3.1 Package Dependencies

TurbuStat utilizes several packages in the python scientific computing infrastructure, namely built on numpy (Oliphant, 2006–, numpy.org), scipy (Jones et al., 2001–, scipy.org), and matplotlib (Hunter, 2007, matplotlib.org). We use astropy (Astropy Collaboration et al., 2018, astropy.org) for I/O operations, unit handling, convolution, and WCS transformations. The package infrastucture of TurbuStat relies heavily on the astropy testing and documentation infrastucture999github.com/astropy/astropy-helpers. The scikit-image (van der Walt et al., 2014, scikit-image.org) is used for morphological operations, contour finding, and fitting elliptical models. The scikit-learn (Pedregosa et al., 2011, scikit-learn.org) provides a parallelized routine for calculating pairs of distances between data sets, which is required for the Cramer distance metric (Yeremi et al., 2014; Koch et al., 2017, §3.3). Most of the fitting routines in TurbuStat rely on the statsmodels package (Seabold & Perktold, 2010, statsmodels.org), which mimics the linear model fitting in the R programming language101010r-project.org. We also use statsmodels for maximum likelihood estimation for fitting PDFs. The astrodendro111111dendrograms.readthedocs.io package creates a dendrogram to explore hierarchical structure (Goodman et al., 2009; Rosolowsky et al., 2008); TurbuStat creates statistical descriptions of the dendrogram.

TurbuStat also has a few optional dependencies. The spectral-cube (Ginsburg et al., 2019, spectral-cube.readthedocs.io) and radio-beam121212radio-beam.readthedocs.io packages provide convenient methods for handling large data sets and beam manipulation, respectively. The emcee (Foreman-Mackey et al., 2013, dfm.io/emcee) package provides optional MCMC fitting for the size–line width relation in PCA and distribution fitting for PDFs. The corner package (Foreman-Mackey, 2016, github.com/dfm/corner.py) creates convenient summary plots of the emcee sampler. For taking the Fast Fourier Transform (FFT) of large data, the FFTW library (Frigo & Johnson, 2005, fftw.org) through the pyFFTW wrapper (Gomersall, 2016, hgomersall.github.io/pyFFTW) can calculate the FFT in parallel.

3.2 Methods Implementation

The 14 turbulence methods implemented in TurbuStat are implemented using a common framework to facilitate ease-of-use (the methods are presented in §2). Each method is implemented as a python class with a common set of steps:

Input – The data, FITS header and other relevant input information (i.e. beam size, distance to region) are given as inputs when initializing the method class. 2. 2.

Calculation – Functions are defined in the class that performs the analysis. Depending on the complexity of the method, the computing steps are split into multiple parts. For example, the spatial power-spectrum has separate steps to (i) compute the power-spectrum, (ii) fit the 1D power-spectrum, (iii) fit the 2D power-spectrum, and (iv) produce a summary plot and print fit statistics. Running each of these functions in the order given will compute the entire method. This step-by-step method allows for maximum user input when computing the method as the arguments and keyword arguments of each function can be altered. 3. 3.

All-in-one – The multi-step approach described above can be cumbersome and requires remembering each step for computing the method. For ease-of-use, each method includes a run function that runs all of the steps with sensible default settings and optionally returns a figure summarizing the results. This approach is ideal for quickly computing a method for exploratory analyses. Most key settings can be altered in the run function so that users need only use this function for most cases. 4. 4.

Plotting & Summary – Each method has a separate function that returns a summary plot of the method, including fits to the outputs of the method.

Some methods have additional functions defined that return useful information about the method or data. For example, after computing the PDF of a data set, the PDF class has functions for returning the percentile of a given value in the data.

3.3 Distance Metrics

TurbuStat includes distance metrics that use an output of a method to compare two data-sets. The methods implemented in TurbuStat measure properties of the underlying physics in observational data; these distance metrics are one approach for quantifying the difference in their physical properties. In previous works, we have used these distance metrics to find which methods are sensitive to different input parameters in sets of simulations (Yeremi et al., 2014; Boyden et al., 2016; Koch et al., 2017; Boyden et al., 2018). We refer readers to Koch et al. (2017) for a full description of the distance metrics. The TurbuStat documentation also includes tutorials on the use of the distance metrics.

3.4 Data Structure and Utilities

Data Structure

TurbuStat is primarily intended to work with observational data products, namely two-dimensional images and spectral-line data cubes. However, our goal is also to ensure these methods are easily-used with a wide-array of data. As such, the TurbuStat methods accept two main input types: (1) Utilizing the astropy I/O interface, the methods in TurbuStat expect a FITS HDU as input. (2) Data can also be passed as a numpy array when converting to the FITS format is prohibitive. These inputs still require a FITS header be provided (see below). Data types from the spectral-cube package may also be used. The FITS header is required by most TurbuStat methods for converting pixel scales into sky coordinates or the spectral dimension in a data cube.

Generating FITS headers

Non-observational data, such as simulated observations, may not be saved with mock WCS information included. For these cases, TurbuStat includes utility functions to generate a FITS header or HDU for the data.

Spectral Moments

Some methods in TurbuStat require a data cube while others need a two-dimensional image representing some property of the data (i.e. integrated intensity or column density). TurbuStat includes utilities for calculating two-dimensional moment images from a data cube—namely the zeroth (integrated intensity), first (centroid) and second (line width) moments—using the spectral-cube package, and saving the images as FITS files. Uncertainty maps for the moments can also be calculated, which are useful for down-weighting noisy regions in some methods.

3.5 Generating fBM images and data cubes

Fractional Brownian Motion (fBM) fields are useful, highly-simplified versions of a turbulent field. They are created by setting the power-law amplitude and randomly-drawing phases in the Fourier domain before transforming to real space. TurbuStat includes routines for creating 2D or 3D fBM fields and routines for creating mock optically-thin HI data cubes given a 3D density and velocity field.

Two-dimensional fBM images can be created with a given power-law index, ellipticity (anisotropy), and angle of ellipticity. These routines are primarily used in TurbuStat to test methods but also provide useful examples (§4.2).

Three-dimensional fBM fields can also be generated, though the current implementation does not include the option for creating anisotropic fields.

Utilizing the aforementioned three-dimensional fields, TurbuStat can generate mock optically-thin HI data cubes from a given set of density and velocity fields. These mock data cubes are another useful tool for testing, and an example is shown in §4.1.

3.6 Additional Features

TurbuStat has a number of additional features that we briefly list here. We note that the use of these features is well-described in the documentation with complete examples.

Beam corrections — Most astronomical imaging, particularly in the radio and submillimetre, are over-sampled relative to the area of a resolution element, or the beam size. This leads to systematic correlations on scales of order the beam size or smaller that affects most turbulence methods that utilize spatial information. For example, the spatial power-spectra will steepen on scales similar to the beam size, approaching the power-spectrum of the beam. TurbuStat includes routines for correcting the beam response in power-spectrum methods (SPS, MVC, VCA) that rely on the radio-beam package. The PCA implementation includes the beam correction described in Brunt & Heyer (2002a). For other spatial methods where the transform remains in real-space (wavelet, delta-variance, statistical moments, SCF), we recommend closely examining the response on scales small or near the beam size, and possibly avoiding those region when fitting. 2. 2.

Apodizing kernels — The power-spectrum methods (SPS, MVC, VCA) use an FFT that will exhibit the Gibbs phenomenon (ringing) when there is signal at the edge of the image. The edges of an image can be tapered to avoid this ringing by using an apodizing kernel. TurbuStat includes several options, with a tutorial demonstrating their effect on the power-spectrum, based on the routines implemented in photutils (Bradley et al., 2019, photutils.readthedocs.io). 3. 3.

Parallelized FFTs — Computing the fast-fourier transform (FFT) is a bottleneck when running FFT-based methods (SPS, MVC, VCA, wavelets, Delta-variance) on large data sets. To speed the FFT up, pyFFTW (Gomersall, 2016, hgomersall.github.io/pyFFTW) can optionally be used to run the FFT in parallel. 4. 4.

Segmented Linear Model — TurbuStat implements a segmented linear model described in Muggeo (2003) that fits for a piece-wise linear model and the position of the breaks. This model is used by default for the VCS, and can be optionally used for all power-spectrum-based methods. For example, this model can be used to constrain break-points in spatial power-spectra that are related to a galaxy’s disk scale height (e.g., Combes et al., 2012). 5. 5.

2D Elliptical Power-law Model — TurbuStat includes an elliptical power-law model, adapted from Tessore & Metcalf (2015, Eqs. $3\mbox{--}5$ ), for fitting 2D power-spectra and constraining anisotropy. The model is defined by three parameters: the power-law index, the ellipticity, and the elliptical angle. Ellipticity parameterizes anisotropy and is defined such that [math] is infinitely anisotropic and $1$ is isotropic. The elliptical angle is the angle between the $x$ -axis and the direction of the anisotropy. An example of this model is shown in §4.2.2. The TurbuStat implementation fits the logarithm of the 2D power-spectrum and estimates uncertainties using residual bootstrapping.

4 TurbuStat Examples

This section presents examples of the methods in TurbuStat with idealized synthetic observations (§3.5). We highlight the code’s ability to recover expected parameters for methods that utilize data cubes (§4.1) and two-dimensional images (§4.2). Scripts to reproduce these examples are available at https://github.com/Astroua/TurbuStat/tree/master/Examples.

4.1 Examples with Position-Position-Velocity Cubes

4.1.1 Example Cubes

For the following examples, we generate four idealized spectral-line data cubes. These cubes are generated from three-dimensional fBM fields with a shape of $256^{3}$ and an index of $-4$ for the density and velocity fields. We assume an isotropic velocity and so only generate one component of the velocity. The velocity field has a dispersion of $10$ ${\rm km\,s^{-1}}$ , and the density field has a dispersion of $1$ cm*-3*. The set of four fields differ only in the random seed used to generate their phases.

The density fBM field is not positive definite and must be altered to ensure that it is. Many approaches have been proposed to create a positive definite field from a fBM field, including taking the absolute value (Stutzki et al., 1998), taking the exponential to generate a log-normal distribution (Brunt & Heyer, 2002a; Ossenkopf et al., 2006; Roman-Duval et al., 2011), subtracting the minimum value of the field (Miville-Deschênes et al., 2003) or some multiple of the field’s standard deviation (Ossenkopf et al., 2006). We mimic the latter approach by adding the standard deviation of the density field to itself and setting values that remain negative to zero. This distorts the index of the density field’s power-spectrum index, however, the measured indices vary by less than $0.1$ from the original index of $-4$ .

We then generate four spectral-line data cubes of HI emission assuming optically-thin conditions and a temperature of 100 K (the thermal velocity dispersion $\sigma_{\rm therm}=0.76$ ${\rm km\,s^{-1}}$ ). We set the velocity channel width to be $0.2$ ${\rm km\,s^{-1}}$ , and thus expect the spectral line profiles to be smooth. Adopting a thermal line width is necessary for smoothing away “shot noise” on small scales (Lazarian et al., 2001). Figure 1 shows the integrated intensity images of the four cubes.

We note that these are small cubes that have been generated for these examples. We have chosen to do this to make reproducing these examples computationally inexpensive. However, the choice to use small cubes does impose limitations. The aforementioned “shot noise” that arises from having a finite number of emitters along the line-of-sight is a significant problem here (Lazarian et al., 2001). Esquivel et al. (2003) and Chepurnov & Lazarian (2009) demonstrate that this numerical effect drastically affects the recovered indices for the VCA and VCS. By selecting a steep field index ( $-4$ ) and by smoothing the data cubes with the thermal line width ( $\sigma_{\rm therm}=0.76$ ${\rm km\,s^{-1}}$ ), we ensure that velocity slices remain close to the thin VCA regime on scales larger than $2$ pixels (based on Eqns. 12–14 from Esquivel et al., 2003).

We also note that these idealized data cubes have no density-velocity correlations (Esquivel et al., 2007) and are not affected by absorption or optical-depth effects (Lazarian & Pogosyan, 2004; Burkhart et al., 2013a).

4.1.2 Velocity Channel Analysis

Lazarian & Pogosyan (2000) predict that the spatial power-spectrum index will change as a function of the velocity channel width due to differences in the influence of the underlying velocity and density fields. As the channel width increases, velocity fluctuations are averaged out and the power-spectrum index approaches the index of the density field. We have chosen the VCA as an example because the analytic relation for the index with spectral channel width provides a good test case to ensure the TurbuStat implementation, and the mock data cube creation, is correct.

There are three regimes for the VCA index described by Lazarian & Pogosyan (2000), (1) the “thin” velocity regime dominated by velocity fluctuations: $m=-3-(\gamma_{v}+3)/2$ , where $m$ is the power-spectrum index and $\gamma_{v}$ is the velocity field index; (2) the “thick” velocity regime where most velocity fluctuations have been averaged over: $m=-3+(\gamma_{v}+3)/2$ ; and (3) the “very thick” velocity regime where all velocity channels have been integrated over: $m=\gamma_{n}$ , where $\gamma_{n}$ is the density field index. Note that these relations are for the steep density regime ( $\gamma_{n}<-3$ ); see Lazarian & Pogosyan (2000) for the shallow density regime. For the $\gamma_{v}=\gamma_{n}=-4$ for these examples, we expect $m_{\rm thin}=-2.5$ , $m_{\rm thick}=-3.5$ , and $m_{\rm verythick}=-4$ , which are shown as horizontal lines in Figure 2.

Using TurbuStat’s VCA and spatial power-spectrum implementations, we fit one- and two-dimensional power-law models to the power-spectra as a function of channel width. Figure 2 shows the recovered indices of the four example cubes. For thin velocity channels, the recovered indices are $\sim\ -2.6$ and do not approach the expected $-2.5$ in the thin regime due to shot noise and smoothing from the thermal line width, shown with the blue vertical line. For larger channel widths, we recover the expected indices in the thick and very thick regimes. The two-dimensional power-law indices are moderately steeper in the thick velocity regime as these fits are dominated by the larger number of samples at large spatial frequencies. The variation in indices between the four cubes is small ( $<0.1$ ).

4.1.3 Principal Component Analysis

We also demonstrate our implementation of PCA on the four example cubes. Adopting a minimum eigenvalue of $0.001$ to avoid numerical noise in the PCA decomposition, we calculate the spatial and spectral scales and fit the size-line width relation for the four cubes, shown in Figure 3. For this example, we fit the size-line width relations with orthogonal distance regression131313Implemented in scipy..

For a velocity field index of $-4$ , we expect the size-line width relation to have an index of $1/2$ . The recovered indices are consistently higher than $1/2$ , which we expect is due to to a small number of eigenvalues containing useful information, and three of the cubes have indices consistent within the uncertainty. The fourth cube has a smaller index—closer to the expected index—than the other cubes which may be due to the autocorrelation surface of the first eigenimage not containing the $1/e$ contour used to find the spatial scale. This issue has been noted in other PCA works (e.g., Roman-Duval et al., 2011). We also note that no correction factor has been used here, as is typically done to relate the measured index to the index of the energy spectrum (Brunt & Heyer, 2002a; Roman-Duval et al., 2011). The aforementioned works impose a log-normal distribution on the density fBM field, which is not done here, and the applicability of these correction factors may not be suitable for this case. However, TurbuStat includes the option to apply the correction factor from Brunt & Heyer (2002a).

4.2 Examples with two-dimensional images

We demonstrate some of the methods which apply to two-dimensional images in this section, namely the recovery of known indices from fBM images and fitting elliptical power-law models to constrain anisotropy.

4.2.1 Image Index Recovery

We generate a series of two-dimensional fBM images with indices ranging from $\beta=0.5\mbox{--}4$ and different random seeds to demonstrate how well the delta-variance, spatial power-spectrum, and wavelets recover the indices. Each of these methods provides complementary information based on their fitted slopes. The delta-variance and wavelet transforms are similar to a second-order structure function and their measured slopes are related to the power-spectrum index with $\beta=m_{\rm delta-variance}+2$ and $\beta=\left(m_{\rm wavelet}+1\right)/2$ , respectively.

Figure 4 shows the percent deviation of the measured index from the actual index across the range of fBM images. The spatial power-spectrum recovers the correct index in each case, while the index from the delta-variance deviates by $<1\%$ . The index from the wavelet transform, however, has larger systematic deviations that vary across the range of fBM image indices shown. These deviations suggest that the delta-variance and spatial power-spectrum provide more accurate results than the wavelet transform.

4.2.2 Modelling spatial anisotropy

There is significant interest in studying the connection between intensity and velocity anisotropy with magnetic field structure in the ISM (e.g., Esquivel & Lazarian, 2005; Burkhart et al., 2014; Kalberla & Kerp, 2016; Kandel et al., 2017). In this example, we demonstrate TurbuStat’s implementation of a two-dimensional elliptical power-law model for fitting power-spectra and other two-dimensional surfaces. The model is described in §3.6.

The left panel of Figure 5 shows a fBM field with a slope of $-3$ and ellipticity of $0.4$ 141414Defined such that an ellipticity of $1.0$ is isotropic and $0.0$ is infinitely anisotropic. oriented $60\arcdeg$ above the $x$ -axis in the figure. Using the spatial power-spectrum, we find the two-dimensional power-spectrum shown in the right panel, which highlights the anisotropy of the structure. Contours of the fitted elliptical power-law are shown with solid lines in the figure. The model correctly recovers the aforementioned fBM image parameters.

This model can be used with other power-spectrum-based methods in TurbuStat, including MVC and VCA, and the SCF correlation surface to constrain anisotropy in images.

5 Conclusions

We introduce TurbuStat, a Python package for turbulence statistics. Currently, the package includes 14 methods presented in the literature for recovering turbulent properties from ISM observations. This paper describes the capabilities of the implementations and other utilities in the package. We also present a few examples (§4) that demonstrate the code’s ability to recover expected properties. We note that additional turbulence data sets in FITS format can be obtained online at www.mhdturbulence.com as part of the Catalog for Astrophysical Turbulence Simulations (CATS) project. CATS also includes spectral line data compatible with TurbuStat.

While this version of TurbuStat includes many methods for recovering turbulent properties, there are a number of other methods that we hope to include in the future. Some examples include: structure functions (e.g., Boldyrev et al., 2002; Padoan et al., 2003), the phase coherence index (Burkhart & Lazarian, 2016), the brightness distribution index (Sawada et al., 2012), the velocity gradient technique (e.g., Yuen & Lazarian, 2017; Lazarian et al., 2018), wavelet-based cross-correlation analysis (Arshakian & Ossenkopf, 2016), and complete VCS modelling (Chepurnov et al., 2010, 2015).

TurbuStat is open-source151515github.com/Astroua/TurbuStat and has thorough documentation and tutorials available161616turbustat.readthedocs.io. We welcome feedback, recommendations, and contributions from the community to improve the on-going development of TurbuStat.

We thank Caleb Ward for important contributions to the code in its early development. We are grateful for feedback and issues reported by Dario Colombo, Jesse Feddersen, Simon Glover, Jonathan Henshaw, Andrés Izquierdo, and Sac Medina. We thank the anonymous referee for their careful reading of the manuscript and comments. EWK is supported by a Postgraduate Scholarship from the Natural Sciences and Engineering Research Council of Canada (NSERC). EWK and EWR are supported by a Discovery Grant from NSERC (RGPIN-2012-355247; RGPIN-2017-03987). This research was enabled in part by support provided by WestGrid (www.westgrid.ca), Compute Canada (www.computecanada.ca), and CANFAR (www.canfar.net).

Appendix A Issues with Wavelet Normalization

When implementing the wavelet transform from Gill & Henriksen (1990), we found that similar quantitative results from Gill & Henriksen (1990) could only be reproduced using a Mexican Hat kernel that was not normalized. With a normalized kernel, the slope is decreased by $2$ compared to the slopes reported by Gill & Henriksen (1990). This difference is due to the Mexican hat wavelet’s close relation to the Laplacian of a Gaussian kernel. The Gaussian kernel carries units of $\mathrm{length}^{-2}$ and its response remains constant across all scales (Lindeburg, 1994). Each derivative causes the response to decay by roughly $1/\sigma$ , since the derivative effectively introduces an additional unit of $\mathrm{length}^{-1}$ . To correct for the decaying response, the convolution of the image and Mexican Hat wavelet should be multiplied with $\sigma^{2}$ , restoring the $\mathrm{length}^{-2}$ units of the Gaussian kernel. This normalization accounts for the difference of $2$ we find in the slopes. This approach is known as scale-normalized derivatives and is essential in blob detection algorithms (Lindeburg, 1994).

While the wavelet normalization does not change qualitative results and interpretation from Gill & Henriksen (1990), deviations from a power-law relation can be hidden when an unnormalized kernel is used. Figure 6 shows the wavelet transform for the same dataset with and without normalization (see also Figure 12 in Boyden et al., 2016). The unnormalized and normalized transforms have slopes of $2.50\pm 0.01$ and $0.50\pm 0.01$ , respectively, when fitting on scales from 2.5 to 40 pixels. Of note is the larger deviations in the normalized transform compared to the unnormalized transform on scales larger than 40 pixels.

Our implementation of the wavelet transform in the TurbuStat package allows for the normalization to be disabled to so the Gill & Henriksen (1990) results can be reproduced, though a warning is printed when doing so.

Appendix B Comparison to Delta-Variance IDL Code

The original version of the delta-variance described in Ossenkopf et al. (2008a, b) is publicly available171717hera.ph1.uni-koeln.de/~ossk/Myself/deltavariance.html. Figure 7 shows the delta-variance curve from an fBM image with an index of $-3.0$ , as used in §4.2. The figures shows that the TurbuStat implementation recovers a delta-variance curve whose slope is consistent with the IDL version, within uncertainty. There is an offset between the curves that results from differences in the implementations of the convolution step181818TurbuStat uses astropy’s FFT convolution function (docs.astropy.org/en/stable/convolution).. The offset is constant within uncertainty for all scales and remains the same when tested on fBM images with different indices. Thus, the offset does not affect the recovered slope.

Appendix C Scaling Tests

We summarize a set of performance tests in this section for the 14 statistical methods in TurbuStat. These tests were run with the default settings on synthetic data ranging from sizes of $256^{2}$ to $2048^{2}$ for two-dimensional images, and $256^{3}$ to $2048^{3}$ for spectral-line data cubes. We ran the tests on a compute node with two Intel E5-2683 v4 “Broadwell” processor and 512 GB191919This is a “large 512 G” on the cedar cluster (docs.computecanada.ca/wiki/Cedar).; the times are based on running on a single processor.

Figure 8 summarizes these tests by showing the five methods with the highest memory usage or run time. We note that the memory tests do not include the memory-usage of the data products. Machine-readable tables are available for all methods as online material.

Scripts to reproduce these scaling tests are available at https://github.com/Astroua/TurbuStat/tree/master/Examples.

Bibliography129

The reference list from the paper itself. Each links out to its DOI / PubMed record.

1Armstrong et al. (1995) Armstrong, J. W., Rickett, B. J., & Spangler, S. R. 1995, Ap J, 443, 209, doi: 10.1086/175515 · doi ↗
2Arshakian & Ossenkopf (2016) Arshakian, T. G., & Ossenkopf, V. 2016, A&A, 585, A 98, doi: 10.1051/0004-6361/201525899 · doi ↗
3Astropy Collaboration et al. (2018) Astropy Collaboration, Price-Whelan, A. M., Sipőcz, B. M., et al. 2018, AJ, 156, 123, doi: 10.3847/1538-3881/aabc 4f · doi ↗
4Beaumont et al. (2013) Beaumont, C. N., Offner, S. S. R., Shetty, R., Glover, S. C. O., & Goodman, A. A. 2013, Ap J, 777, 173, doi: 10.1088/0004-637X/777/2/173 · doi ↗
5Bensch et al. (2001) Bensch, F., Stutzki, J., & Ossenkopf, V. 2001, A&A, 366, 636
6Bialy et al. (2017) Bialy, S., Burkhart, B., & Sternberg, A. 2017, Ap J, 843, 92, doi: 10.3847/1538-4357/aa 7854 · doi ↗
7Boldyrev et al. (2002) Boldyrev, S., Nordlund, Å., & Padoan, P. 2002, Ap J, 573, 678, doi: 10.1086/340758 · doi ↗
8Boyden et al. (2016) Boyden, R. D., Koch, E. W., Rosolowsky, E. W., & Offner, S. S. R. 2016, Ap J, 833, 233, doi: 10.3847/1538-4357/833/2/233 · doi ↗