Marvin: A Toolkit for Streamlined Access and Visualization of the SDSS-IV MaNGA Data Set

Brian Cherinka; Brett H. Andrews; José Sánchez-Gallego; Joel Brownstein; María Argudo-Fernández; Michael Blanton; Kevin Bundy; Amy Jones; Karen Masters; David R. Law; Kate Rowlands; Anne-Marie Weijmans; Kyle Westfall; Renbin Yan

arXiv:1812.03833·astro-ph.IM·July 31, 2019

TL;DR

Marvin is a comprehensive toolkit that simplifies access, visualization, and analysis of the large, complex SDSS-IV MaNGA galaxy survey data set through a Python package, API, and web interface, addressing data volume and accessibility challenges.

Contribution

It introduces a new toolkit that streamlines data handling and analysis for the MaNGA survey, enabling more efficient scientific exploration of galaxy data.

Findings

01. Facilitates easier data access and visualization.

02. Reduces time and effort in data curation and analysis.

03. Supports extensions and future development.

Full text

Brian Cherinka (1), Brett H. Andrews (2), José Sánchez-Gallego (3), Joel Brownstein (4), María Argudo-Fernández (5, 6), Michael Blanton (7), Kevin Bundy (8), Amy Jones (13), Karen Masters (10, 11), David R. Law (1), Kate Rowlands (9), Anne-Marie Weijmans (12), Kyle Westfall (8), Renbin Yan (14)

(1) Space Telescope Science Institute, 3700 San Martin Drive, Baltimore, MD 21218, USA
(2) Department of Physics and Astronomy and PITT PACC, University of Pittsburgh, 3941 O’Hara Street, Pittsburgh, PA 15260, USA
(3) Department of Astronomy, Box 351580, University of Washington, Seattle, WA 98195, USA
(4) Department of Physics and Astronomy, University of Utah, 115 S 1400 E, Salt Lake City, UT 84112, USA
(5) Centro de Astronomía (CITEVA), Universidad de Antofagasta, Avenida Angamos 601 Antofagasta, Chile
(6) Chinese Academy of Sciences South America Center for Astronomy, China-Chile Joint Center for Astronomy, Camino El Observatorio, 1515, Las Condes, Santiago, Chile
(7) Department of Physics, New York University, 726 Broadway, New York, NY 10003, USA
(8) University of California Observatories, University of California Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, USA
(9) Department of Physics and Astronomy, Johns Hopkins University, 3400 N. Charles St., Baltimore, MD 21218, USA
(10) Department of Physics and Astronomy, Haverford College, 370 Lancaster Avenue, Haverford, Pennsylvania 19041, USA
(11) Institute of Cosmology & Gravitation, University of Portsmouth, Dennis Sciama Building, Portsmouth, PO1 3FX, UK
(12) School of Physics and Astronomy, University of St Andrews, North Haugh, St Andrews, KY16 9SS, UK
(13) Department of Physics and Astronomy, University of Alabama, Tuscaloosa, AL 35487, USA
(14) Department of Physics and Astronomy, University of Kentucky, 505 Rose St., Lexington, KY 40506-0057, USA

Abstract

The Mapping Nearby Galaxies at Apache Point Observatory (MaNGA) survey, one of three core programs of the fourth-generation Sloan Digital Sky Survey (SDSS-IV), is producing a massive, high-dimensional integral field spectroscopic data set. However, leveraging the MaNGA data set to address key questions about galaxy formation presents serious data-related challenges due to the combination of its spatially inter-connected measurements and sheer volume. For each galaxy, the MaNGA pipelines produce relatively large data files to preserve the spatial correlations of the spectra and measurements, but this comes at the expense of storing the data set in a coarsely-chunked manner. The coarse chunking and total volume of the data make it time-consuming to download and curate locally-stored data. Thus, accessing, querying, visually exploring, and performing statistical analyses across the whole data set at a fine-grained scale is extremely challenging using just FITS files. To overcome these challenges, we have developed Marvin: a toolkit consisting of a Python package, Application Programming Interface (API), and web application utilizing a remote database. Marvin’s robust and sustainable design minimizes maintenance, while facilitating user-contributed extensions such as high level analysis code. Finally, we are in the process of abstracting out Marvin’s core functionality into a separate product so that it can serve as a foundation for others to develop Marvin-like systems for new science applications.

Software: Anaconda (https://anaconda.org/anaconda/python), Astropy (Astropy Collaboration et al., 2013; The Astropy Collaboration et al., 2018, http://www.astropy.org), Bootstrap (https://getbootstrap.com), Browserstack (https://www.browserstack.com), brain (https://github.com/sdss/marvin_brain), Coveralls (https://coveralls.io/), D3 (https://d3js.org), DyGraphs (http://dygraphs.com), FITS (Pence et al., 2010), Flask (http://flask.pocoo.org), Flask-Login (https://flask-login.readthedocs.io), Flask-JWT-Extended (https://flask-jwt-extended.readthedocs.io), fuzzywuzzy (https://github.com/seatgeek/fuzzywuzzy), git (https://git-scm.com), Highcharts (https://www.highcharts.com), Jinja2 (http://jinja.pocoo.org/docs), JQuery (https://jquery.com), Jupyter (Kluyver et al., 2016, http://jupyter.org), Matplotlib (Hunter, 2007, https://doi.org/10.5281/zenodo.61948), networkx (https://networkx.github.io), Nginx (https://www.nginx.com), OpenLayers (https://openlayers.org), pip (https://pypi.org/project/pip), Postgres (https://www.postgresql.org), pytest (https://docs.pytest.org/), Read the Docs (https://readthedocs.org/), requests (http://docs.python-requests.org), rsync (https://rsync.samba.org), sdss-access (https://doi.org/10.5281/zenodo.1410704), sdss-tree (https://doi.org/10.5281/zenodo.1410706), Selenium (https://www.seleniumhq.org), Sphinx (http://www.sphinx-doc.org), SQLAlchemy (https://www.sqlalchemy.org), sqlalchemy-boolean-search (https://github.com/sdss/sqlalchemy-boolean-search), Travis-CI (https://travis-ci.org/), uwsgi (https://uwsgi-docs.readthedocs.io)

1 Introduction

Large astronomy collaborations with dedicated facilities pursuing multi-year surveys are producing massive data sets at furious rates. The data sets from the current generation of surveys, such as the Sloan Digital Sky Survey (hereafter SDSS; York et al. 2000; Strauss et al. 2002), require more disk space than is available on personal computers and some moderate-sized institution-level servers. The next generation of surveys, such as the Large Synoptic Survey Telescope (Ivezić et al., 2008) and the Square Kilometre Array (Braun et al., 2015), will create data sets far too large to store at all but a few dedicated national-level facilities. The real power of these immense data sets comes from simultaneously leveraging multiple sources of information (e.g., at different wavelengths) about each object, so connecting the relevant data sources for a comprehensive analysis is critical. Since individual users cannot store the data locally and need to access portions of the data remotely, bandwidth is often the primary bottleneck. Speed increases in Internet bandwidth have lagged behind those in computer processors (i.e., Moore’s law; Moore 1965) by about 10% per year (Nielsen, 1998); this lag has compounded over decades, steadily widening the gap up to the present. Consequently, only a subset of the data can be transferred. However, selecting this subset often requires access to the whole data set, which in turn requires remote operations, especially queries.

SDSS was one of the earliest and remains one of the strongest driving forces in astronomy pushing the philosophy of public data releases that make astronomy a leader in open science. Crucially, these data releases are served with robust data distribution systems and come thoroughly documented. These two often-overlooked aspects have lowered the entry barrier and enabled thousands of professional astronomers and many times more public users to take advantage of this powerful data set. Marvin extends this mission by providing code to facilitate data use by professional astronomers, scientists in other fields (e.g., physics, computer science, and statistics), data scientists, citizen scientists, educators, and students.

The current phase (2014–2020) of SDSS, SDSS-IV (Blanton et al., 2017), consists of three simultaneous surveys, including the Mapping Nearby Galaxies at Apache Point Observatory (MaNGA; Bundy et al. 2015) survey. Legacy SDSS (York et al., 2000) took spectra of only the central regions of galaxies (Strauss et al., 2002), whereas MaNGA takes hundreds of spectra per galaxy arranged in a hexagonal grid across the face of the galaxy (Drory et al., 2015), using the SDSS/BOSS spectrographs (Smee et al., 2013) on the SDSS telescope (Gunn et al., 2006). Typically, there are 3 dithered sets of 3 individual exposures offset from each other, which are combined into a data cube (Law et al., 2016; Yan et al., 2016a,b). Thus, each object is represented not by a single central spectrum, but by a well-sampled grid of spectra.

Figure 1 illustrates the format of the MaNGA dataset. Each data cube consists of two spatial dimensions and one wavelength dimension. The one-dimensional spectrum at each spatial location can be interpreted in terms of measurements and physical parameters, yielding over 150 two-dimensional maps for each galaxy (Westfall et al. in prep.), including: gas emission lines, stellar absorption features, stellar surface density, star formation rate surface density, stellar velocities, and gas velocities. These maps can then be interpreted in terms of global properties of each galaxy: its mass in stars, its mass in dark matter, its total star formation rate, and other quantities. Marvin and the MaNGA maps for 4824 galaxies will be publicly released as part of Data Release 15 (Aguado et al. 2018).

In addition to its complexity, MaNGA’s data volume is significant. MaNGA will observe over 10,000 galaxies (Law et al., 2015; Wake et al., 2017), more than an order of magnitude larger than previous IFU surveys, such as the Atlas3D (Cappellari et al., 2011), DiskMass (Bershady et al., 2010), and CALIFA (Calar Alto Large Integral Field Area; Sánchez et al. 2012) surveys. All told, the final MaNGA data release will be 10 terabytes, or about 1 gigabyte per galaxy, in final summary data products, containing: data cubes and row-stacked spectra in log and linear wavelength sampling, derived analysis maps, and model template data cubes. Individual data releases contain multiple analyses of each galaxy, each optimized for different science goals, resulting in multiple versions (e.g., different binning schemes) of the data cube and maps. The total volume for all of the MaNGA public data releases will be 35 terabytes due to re-analyses of the same galaxies as the data pipelines improve. Because of these re-analyses, if a given scientific paper is to be replicable, easy access to previous data releases must also be provided.

Further complicating analysis of MaNGA data is its coarsely-chunked storage across separate files for the spectra and derived property maps for each galaxy. Traditionally data are stored this way to optimize for an object-by-object catalog of files. This coarsely-chunked data makes querying on MaNGA’s spatially-resolved data quite difficult without extensive manual preparation of all files and tracking of correct cross-matches, so queries can only easily be done on global properties. Exploratory analysis and visualization are cumbersome with coarsely-chunked data, which is compounded by the disconnected packaging of the spectra and maps. Finally, coarsely-chunked data unnecessarily strains bandwidth and disk space resources because superfluous data need to be transferred and stored. These challenges encourage traditional object-by-object analyses instead of innovative ones that leverage the statistically significant sample size of MaNGA.

This paper presents software to address these challenges. Section 2 describes the initial prototype and its inherent limitations, the core design philosophy of Marvin, and the components involved. Section 3 describes the variety of client-based programmatic tools available in Marvin. Section 4 describes the front-facing web portion of Marvin, which serves as the exploratory portal of entry for new users. The server-side features and back-end capabilities are discussed in Section 5. Section 6 describes a typical science use case for MaNGA and how Marvin streamlines its implementation. In Section 7, we discuss our current implementation strategy for ensuring the long-term sustainability of Marvin. We summarize and discuss the future potential of Marvin in Section 8. Finally, a series of code examples and tutorials is provided in the Appendix.

2 Core Design

2.1 The Marvin Prototype

To address the challenge of visually exploring MaNGA data, we developed a prototype version of Marvin that existed as a pure web-application. The prototype displayed optical images, spectra, and property maps for individual galaxies. These visual displays, in conjunction with a basic annotation system, proved useful for quality assessment of an early version of the MaNGA pipelines. The prototype also featured a simple query system and provided links to download the FITS data files.

The design choices for the prototype enabled rapid development, but ultimately limited its utility and sustainability. The images, spectra, and maps were static PNG files, which could not provide the interactive experience required for a complete visual exploration of the complex suite of available parameters. Queries could only be performed on global properties not local (spatially-resolved) ones. Data could only be accessed via large files that contained all of the spectra or property maps for a galaxy, making it impossible to retrieve just the spectrum or a single property of an individual spaxel. Because expanding the feature set of the prototype required creating new static files, the prototype was difficult to extend and time-consuming to maintain.

Furthermore, none of the components in the prototype web-application were usable in a command line form. Users were forced to reinvent the same visual and search tools if they wanted to use them programmatically. Such tools could serve as the basis for and be related to advanced programmatic analysis tools. Every user would end up developing similar tools but within different frameworks, such that each individual’s analysis code would not be interoperable with that of other users.

Because of these limitations, the prototype design failed to address any of the inherent challenges of the MaNGA data set. Fixing these shortcomings required a complete redesign and refactor, which led to a new design philosophy for Marvin.

2.2 Design Philosophy and Core Components

Marvin’s design philosophy focuses on eliminating the overhead costs and limitations of accessing the large, coarsely-chunked, and incompletely-linked MaNGA data set. Solving these issues enables on-demand data access, interactive visual exploration, minimal downloads, spatially-resolved queries, and statistical analyses at a spaxel-level. Marvin provides a feature-rich framework that serves as the building blocks for user-developed analysis tools that can be contributed back into Marvin to maximize code reuse and accelerate scientific progress.

Marvin is a complete toolkit designed for overcoming the challenges of searching, accessing, and visualizing the MaNGA data. The core design is centered around a few main components:

• A Multi-Modal Access (MMA) system that handles all data flow paths.

• An Application Programming Interface (API) based on the Representational State Transfer (REST) architectural style that handles all communication between the client and server.

• A Brain, a common core package that handles generic functionalities and abstracts common methods needed during data gathering.

• A programmatic DataModel, that simplifies handling of a large suite of parameters that may differ between data releases and formats.

Marvin combines and builds on top of these core pieces to provide the following additional tools:

• A suite of interconnected Python tools, all based off a core Python tool with the MMA system built-in, with two main tool types:

  – Data Product Tool: wraps your data products and retrieves specific chunks of data (e.g., Cube or Maps in §3.1).

  – Query Tool: performs SQL queries against the remote data, with a pseudo-natural language syntax parser to simplify the user input.

• A Python Interaction class providing a uniform interface to the API, integrated into all the Tools.

• A web application, built on top of the Tools, for quick data visualization and exploration.

These tools work with each other, allowing for multiple entry points into the data and making it easy for users with varying levels of domain expertise (e.g., from students to power users) to access the data using the same suite of tools.

2.3 Multi-Modal Access

In the case of MaNGA, the amount of data produced (the final data release will be of order 10 TB) sits on the boundary of what a user can store and analyze locally with normal computing resources. Future surveys (e.g., the Large Synoptic Survey Telescope) will produce data sets many orders of magnitude larger than MaNGA’s, thus requiring the development of new ways to access data.

One of Marvin’s core design choices is that data access should be abstracted in a way that makes the origin of the data irrelevant to the final user. Marvin accomplishes this goal with a Multi-Modal Access system with a decision tree that defines what access mode to use and the code implementation that executes it. Below we describe the data access modes: opening local files, searching local databases, or making API calls to a remote web server. Each of these data formats carries a series of advantages and disadvantages, but Marvin’s MMA allows users to leverage the advantages while minimizing the disadvantages.

Files (e.g. FITS) provide portable data that can be heavily compressed, and they are the current standard for astronomical data distribution. However, data access can be slow (especially from compressed files), and the data are usually stored in a way that requires a degree of familiarity with the data model. Moreover, doing searches and cross-analyses between multiple targets usually demands accessing a large number of files and keeping a significant amount of data in memory.

Relational databases solve some of these problems by storing the whole data set in an optimized and well-indexed way, which enables running complex queries efficiently, and provides quicker data access in most situations. In this case, the main disadvantages are the large size of a monolithic database (comparable to downloading all of the uncompressed files that compose the data set) and the difficulty of learning how to access data, especially compared to access via files.

Finally, data can be stored in servers (either as files or in databases) and accessed remotely via an API call that returns only the subset of data requested in the call. APIs are convenient for the user since they obviate the need to download data files to a local computer and can be used to abstract the data model. Their main downsides are that the internet is required to access the data and that applications that require access to large amounts of data can be slow to run.

Marvin Tools (see Section 3) include implementations that allow loading data from files, from a database, or via a series of API calls. However, once the data has been loaded, the Tools behave the same and produce the same results regardless of the data origin.

Figure 2 shows the decision tree followed by each tool to decide from where to load data. If the MMA is being run in “local” mode and a target identifier is provided (a plate-IFU or mangaid, which define a unique observation or a single target, respectively; see Yan et al. 2016), the code checks if a database is available and, if so, loads the data using it. If a database cannot be found, the default path file corresponding to that identifier and data release (generated as described in Section 2.3.1) is used, if the file exists locally. Alternatively, a file path can be passed to the MMA, in which case that file will be used.

In “remote” mode, an API call is made to a remote server with the target identifier and the data release as inputs; the remote server uses the same MMA in “local” mode to access the necessary data from a database containing the complete MaNGA data set and returns them.

The default mode for Marvin is “auto” mode, which tries to access the data in “local” mode first and will try in “remote” mode upon failure. This order prioritizes local over remote data access because the former is usually faster, while seamlessly transitioning to the latter if the data is not available locally. See Appendix A.1 for an illustration of accessing an object with the MMA under different inputs and data origins.
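The decision tree described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Marvin's implementation: the names `resolve`, `resolve_local`, `Resolution`, and `DataUnavailable` are hypothetical, database availability is reduced to a boolean flag, and the "remote" branch only records that an API call would be made rather than contacting a server.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Optional


class DataUnavailable(Exception):
    """Raised when an access mode cannot supply the requested data."""


@dataclass
class Resolution:
    origin: str  # "db", "file", or "api"
    detail: str  # identifier or file path that was used


def _placeholder_default_path(identifier: str) -> Path:
    # Stand-in for the path generation of Section 2.3.1 (hypothetical layout).
    return Path("/nonexistent-sas-root") / f"{identifier}.fits"


def resolve_local(identifier: Optional[str], filename: Optional[str],
                  db_available: bool,
                  default_path: Callable[[str], Path]) -> Resolution:
    # An explicitly passed file path is used directly.
    if filename is not None:
        if Path(filename).exists():
            return Resolution("file", str(filename))
        raise DataUnavailable(f"no such file: {filename}")
    if identifier is None:
        raise DataUnavailable("need a target identifier or a file path")
    # With an identifier: database first, then the default file path.
    if db_available:
        return Resolution("db", identifier)
    path = default_path(identifier)
    if path.exists():
        return Resolution("file", str(path))
    raise DataUnavailable(f"{identifier} not available locally")


def resolve(identifier: Optional[str] = None, filename: Optional[str] = None,
            mode: str = "auto", db_available: bool = False,
            default_path: Callable[[str], Path] = _placeholder_default_path
            ) -> Resolution:
    if mode not in ("local", "remote", "auto"):
        raise ValueError(f"unknown mode: {mode}")
    if mode == "remote":
        if identifier is None:
            raise DataUnavailable("remote access needs a target identifier")
        return Resolution("api", identifier)
    try:
        return resolve_local(identifier, filename, db_available, default_path)
    except DataUnavailable:
        # "auto" falls back to remote access; "local" reports the failure.
        if mode == "local" or identifier is None:
            raise
        return Resolution("api", identifier)
```

With these definitions, `resolve("8485-1901", db_available=True)` resolves to the database, while the same call without a database or local file falls through to the API branch in "auto" mode. The identifier here is only an example of the plate-IFU format.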

In principle, it would also be possible to set up a system with a complete MaNGA database and use Marvin to access it locally. While setting up such a system would be non-trivial from a technical standpoint, there are situations in which it could be advantageous (e.g., in the case of an institution that wants to provide a local mirror of the MaNGA data set).

Figure 3 shows a high level overview of the user interface in Marvin. The user has two main access points: the local Marvin client or the web browser interface. While the browser interface communicates directly with the Marvin server, the MMA operating on the client-side decides whether to access data locally or remotely via API calls to the Marvin server. The Marvin server (following the MMA decision tree) first attempts to access data from a local database and will fall back to files when needed.

2.3.1 Abstract Path Generation

A machine-aware approach to file locations requires generalizing the ability to generate full paths to these files and removing all traces of the base filesystem root directory. In this way, Marvin can be agnostic to whether it is installed on a user’s laptop or an SDSS host server. This layer of functionality is provided by the publicly available sdss-access (Cherinka et al., 2018) and sdss-tree (Cherinka & Brownstein, 2018) software packages. sdss-tree provides the local system environment variable setup, allowing tools to understand the relative locations of data, while sdss-access provides a convenient way of navigating local and remote file paths. Paths to files are defined in a template format, specified with a shortcut name, plus a series of keyword arguments that specify variables within the filenames. This enables users to specify a robust path to any file simply by adjusting the input variable parameters. These packages are designed around relative path definitions, allowing a user to replicate a full environment by changing the definition of the base path. With a single root environment variable set by the user, these packages automatically create a local filesystem structure that mimics the filesystem of the SDSS Science Archive Server hosted at the University of Utah on which the full MaNGA data archive is stored.

For a given file, sdss-access has the ability to look up the full system path, generate the corresponding HTTP URL, and generate a remote access path for use with rsync. This flexibility allows Marvin to know precisely where to look for a given file locally and also quickly switch to a remote host when needed. sdss-access has the ability to download files from an SDSS server using multi-stream rsync, a technology derived from the SDSS Transfer Product (Weaver et al., 2015). This enables fast and robust file transfers, which are particularly helpful for speeding up downloads of many files. The hierarchy of files is created identically at the destination. As paths are added to the service, sdss-access eliminates redundant downloading by first checking for the existence of the file locally and only downloads files that do not currently exist.
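The template-based approach can be illustrated with a short, self-contained sketch. It does not use the sdss-access or sdss-tree packages; the class `PathSketch`, the template name `cube_sketch`, its directory layout, and the `example.org` hosts are all hypothetical placeholders chosen only to show how one relative template can yield a local path, an HTTP URL, and an rsync path.

```python
import os
import posixpath

# Hypothetical template: a shortcut name mapped to a relative path pattern.
TEMPLATES = {
    "cube_sketch": "redux/{version}/{plate}/cube-{plate}-{ifu}.fits",
}


class PathSketch:
    """Builds local and remote locations from one relative template."""

    def __init__(self, root,
                 http_base="https://example.org/sas",      # placeholder host
                 rsync_base="rsync://example.org/sas"):    # placeholder host
        self.root = root
        self.http_base = http_base
        self.rsync_base = rsync_base

    def relative(self, name, **keys):
        # Fill the template variables; the result has no filesystem root.
        return TEMPLATES[name].format(**keys)

    def full(self, name, **keys):
        # Local path: user-defined root plus the relative path.
        return posixpath.join(self.root, self.relative(name, **keys))

    def url(self, name, **keys):
        return f"{self.http_base}/{self.relative(name, **keys)}"

    def rsync(self, name, **keys):
        return f"{self.rsync_base}/{self.relative(name, **keys)}"

    def needs_download(self, name, **keys):
        # Skip files that already exist locally.
        return not os.path.exists(self.full(name, **keys))
```

Because every location derives from the same relative path, changing `root` is enough to relocate the whole tree, which mirrors the single-root-variable design described above.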

2.4 Marvin’s Brain

Marvin’s Brain is a core product that Marvin relies on and contains the management and overhead needed for regular tasks. There are many skills, tasks, and functionalities that have become more common, and are often required, to interact with modern astronomical data interfaces. Examples include items such as constant management of local paths to data files, learning how to write HTTP requests for accessing data served remotely, learning SQL to access data from databases, or even learning how to write web applications to serve data to others. These kinds of tasks often end up as logistical overheads that can be frustrating for end users, as they take repeated time to learn or implement and become barriers. These barriers can impede users’ ability to do their science, which, at best, delays scientific discovery and, at worst, prevents accessing the necessary data altogether.

The primary design goal with the development of Marvin was to abstract away these overheads, and provide a framework that automatically handles much of this management. While Marvin is software specific to the MaNGA data set, many of these overheads are often independent of the type of data being served or accessed. To facilitate easier access and potential reusability of these features for other projects, we have placed these kinds of features into an additional core product, called the Brain, which Marvin depends on. Figure 4 shows the relationship between Marvin and its Brain. Marvin’s Brain (shown in blue) exists as base classes that sit underneath all of the components within Marvin. These classes act as templates that can be reused and customized for different applications. Our aim is to continue to migrate existing common Marvin features into the Brain so others can utilize the same tools.

2.5 DataModel

Marvin programmatically implements the unique MaNGA data model for each data release to abstract the Data Products for the MMA system and users. The MMA system relies on the DataModel to produce the same Data Product (e.g., Cube or Maps) from the correct data release regardless of whether it was instantiated from a FITS file, a database, or via the API. This abstraction makes scientific reproducibility much easier. It also enables users to programmatically navigate the Data Products without having to refer to the documentation. Marvin simplifies the data model for users by utilizing FuzzyWuzzy, a fuzzy string matching algorithm, to fix incorrect but unambiguous user input (e.g., “gflux ha” maps to “emline_gflux_ha_6564”). The DataModel is available as a standalone navigable object allowing access to the content and format of all MaNGA deliverables from a single location. Additionally, individual data models are attached to every relevant Marvin Tool, providing an internal lookup that all Tools use for self-consistency, making them robust against any changes to the underlying data files. As the format of the FITS files changes periodically between data releases, the structure of the Tools remains the same as the data model provides that intermediate go-between. Finally, the documentation for the data model is automatically generated (see Section 7.3) for reference.
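The idea of resolving incorrect but unambiguous input can be sketched as follows. This example deliberately does not use FuzzyWuzzy; it swaps in a much simpler token-containment rule so that it runs with the standard library alone. The function `resolve_name` is hypothetical, and apart from “emline_gflux_ha_6564”, which appears in the text, the property names in the list are illustrative assumptions.

```python
# Illustrative property names; only the first is taken from the text.
PROPERTY_NAMES = [
    "emline_gflux_ha_6564",
    "emline_gflux_hb_4862",
    "emline_gvel_ha_6564",
    "stellar_vel",
]


def resolve_name(query, names=PROPERTY_NAMES):
    """Return the single name whose tokens include every query token."""
    tokens = set(query.lower().replace("_", " ").split())
    matches = [name for name in names
               if tokens <= set(name.lower().split("_"))]
    if len(matches) == 1:
        return matches[0]
    if not matches:
        raise KeyError(f"no property matches {query!r}")
    # More than one candidate: refuse to guess.
    raise ValueError(f"{query!r} is ambiguous: {matches}")
```

Here `resolve_name("gflux ha")` selects the H-alpha flux property, while `resolve_name("ha")` is rejected as ambiguous because two names contain that token. A real fuzzy matcher would also tolerate misspellings, which this stand-in does not.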

3 Programmatic Tools

Marvin provides a programmatic interaction with the MaNGA data to enable rigorous and repeatable science-grade analyses beyond simply visualizing the data. These tools come in the form of a Python package that provides convenience classes and functions that simplify the processes of searching, accessing, downloading, and interacting with MaNGA data, selecting a sample, running user-defined analysis code, and producing publication quality figures. Marvin Tools are separated into two main categories: Data Product Tools and Query Tools. The Data Product Tools are object-based and are constructed around classes that correspond to different levels of MaNGA data organization. The Query Tools are search-based and are designed to provide the user the ability to remotely query the MaNGA galaxy data set and retrieve only the data they want. Marvin also provides a built-in data model, which describes the science deliverables for every data release of Marvin. Overall, these tools allow for easier access to the data without knowing much about the data model, by seamlessly connecting all the MaNGA data products, eliminating the need to micromanage a multitude of files. Figure 5 shows a visual guide to all our tools, and highlights the interconnectivity between them.

3.1 Galaxy Tools

These tools cover four main classes, Cube, RSS, Maps, and ModelCube, that are associated with the analogous Data Reduction Pipeline (DRP; Law et al. 2016) and Data Analysis Pipeline (DAP; Westfall et al. in prep.) data products, namely multi-dimensional data cubes, row-stacked spectra, derived analysis maps, and model cubes. The four main tools all inherit from a common core object, thus sharing much of their functionality and logic, such as the MMA. These tools are designed to do more than simply wrap and serve the underlying data and metadata contained in FITS files. Their goal is to streamline the users’ interaction with that data and simplify common but often non-trivial tasks associated with handling the data. Via these tools, all data are delivered as Astropy Quantity objects, with attached variance, mask, and any other available associated properties. This variance and mask tracking enables robust and consistent arithmetic between any of the DAP Maps. Each tool has a built-in data model describing the format and content of the data it delivers. This data model also provides convenient top-level access to all available properties, with autocomplete navigation. Any given tool has convenient access to associated data products, as well as easy download capability for any data accessed remotely.
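To make the variance and mask tracking concrete, the sketch below divides one map by another while propagating uncertainties and combining masks. It is an assumption-laden illustration and not Marvin's or Astropy's code: `MapSketch` is a hypothetical class holding plain Python lists, errors are propagated to first order assuming the two maps are uncorrelated, and the bit value `BAD` is an arbitrary choice.

```python
import math
from dataclasses import dataclass
from typing import List

BAD = 1  # hypothetical mask bit flagging an unusable value


@dataclass
class MapSketch:
    value: List[float]
    variance: List[float]
    mask: List[int]

    def __truediv__(self, other: "MapSketch") -> "MapSketch":
        value, variance, mask = [], [], []
        for a, va, ma, b, vb, mb in zip(self.value, self.variance, self.mask,
                                        other.value, other.variance,
                                        other.mask):
            flag = ma | mb  # a flag set in either input stays set
            if b == 0:
                # Division by zero: mark the element instead of failing.
                value.append(math.nan)
                variance.append(math.inf)
                mask.append(flag | BAD)
                continue
            value.append(a / b)
            # First-order propagation for r = a / b, uncorrelated inputs:
            # var(r) = var(a) / b^2 + a^2 * var(b) / b^4
            variance.append(va / b**2 + a**2 * vb / b**4)
            mask.append(flag)
        return MapSketch(value, variance, mask)
```

Because the operator returns another `MapSketch`, the result of a ratio (for example, of two emission-line flux maps) carries its own uncertainty and mask into any further arithmetic.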

Features or functionalities that are common to multiple tools are designed as Python Mixin objects. These objects are designed as isolated pieces of code that can be “mixed in” with any other tool, giving that tool access to its parameters. Access to the NASA-Sloan Atlas (NSA) catalog (Blanton et al., 2011; https://www.sdss.org/dr15/manga/manga-target-selection/nsa/) and the DAP summary file, for instance, are implemented in this manner. Extracting spaxels within a specified aperture is a common functionality delivered to all tools as a Mixin.
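The Mixin pattern itself is standard Python and can be shown with a toy example. The class names, the in-memory catalog, its identifier, and its redshift value below are all hypothetical stand-ins invented for illustration; they are not Marvin classes or real catalog data.

```python
class CatalogMixinSketch:
    """Adds a catalog lookup to any class that defines `self.mangaid`."""

    # Made-up catalog entry, standing in for a real catalog query.
    _catalog = {"demo-1": {"z": 0.03}}

    @property
    def nsa(self):
        return self._catalog.get(self.mangaid, {})


class ApertureMixinSketch:
    """Adds aperture extraction to any class that defines `self.shape`."""

    def spaxels_in_aperture(self, x0, y0, radius):
        # Return (x, y) indices within a circular aperture.
        nx, ny = self.shape
        return [(x, y) for x in range(nx) for y in range(ny)
                if (x - x0) ** 2 + (y - y0) ** 2 <= radius ** 2]


class CubeSketch(CatalogMixinSketch, ApertureMixinSketch):
    """A tool that gains both features simply by inheriting the mixins."""

    def __init__(self, mangaid, shape):
        self.mangaid = mangaid
        self.shape = shape
```

A `CubeSketch("demo-1", (5, 5))` instance exposes both `nsa` and `spaxels_in_aperture` without implementing either, and the same mixins could be attached to a maps-like class in the same way.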

There are additional tools that are not associated with a particular MaNGA data file but instead map to objects related to the MaNGA data. These tools behave in much the same way as the core tools. They utilize the MMA, allow for remote file downloading, and are seamlessly integrated with each other. The

Bibliography

Astropy Collaboration, Robitaille, T. P., Tollerud, E. J., et al. 2013, A&A, 558, A33
Baldwin, J. A., Phillips, M. M., & Terlevich, R. 1981, PASP, 93, 5
Belfiore, F., Maiolino, R., Maraston, C., et al. 2016, MNRAS, 461, 3111
Bershady, M. A., Verheijen, M. A. W., Swaters, R. A., et al. 2010, ApJ, 716, 198
Blanton, M. R., Bershady, M. A., Abolfathi, B., et al. 2017, AJ, 154, 28
Blanton, M. R., Kazin, E., Muna, D., Weaver, B. A., & Price-Whelan, A. 2011, AJ, 142, 31
Braun, R., Bourke, T., Green, J. A., Keane, E., & Wagg, J. 2015, Advancing Astrophysics with the Square Kilometre Array (AASKA14), 174
Bundy, K., Bershady, M. A., Law, D. R., et al. 2015, ApJ, 798, 7