Enabling catalog simulations of transient and variable sources based on LSST cadence strategies

Rahul Biswas; Scott F. Daniel; R. Hložek; A. G. Kim; Peter Yoachim; (for the LSST Dark Energy Science Collaboration)

arXiv:1905.02887·astro-ph.IM·April 8, 2020


TL;DR

This paper introduces OpSimSummary, an open-source Python package that facilitates the analysis and reordering of LSST survey simulation data, enabling efficient generation of light curves for astrophysical sources.

Contribution

The paper presents OpSimSummary, a new tool that simplifies access and processing of LSST simulation outputs for light curve analysis, bridging LSST data and existing simulation frameworks.

Findings

01

OpSimSummary enables efficient reordering of LSST simulation data.

02

It provides an API for accessing and summarizing observations.

03

The package supports integration with third-party software like SNANA.

Abstract

The Large Synoptic Survey Telescope (LSST) project will conduct a ten year multi-band survey starting in 2022. Observing strategies for this survey are being actively investigated, and the science capabilities can be best forecasted on the basis of simulated strategies from the LSST Operations Simulator (OpSim). OpSim simulates a stochastic realization of the sequence of LSST pointings over the survey duration, and is based on a model of the observatory (including telescope) and historical data of observational conditions. OpSim outputs contain a record of each simulated pointing of the survey along with a complete characterization of the pointing in terms of observing conditions, and some useful quantities derived from the characteristics of the pointing. Thus, each record can be efficiently used to derive the properties of observations of all astrophysical sources found in that…

Equations

$$c_{source}=\kappa\,10^{-0.4m},\qquad c_{sky}=\alpha\,10^{-0.4m_{sky}}$$

$$\kappa=\frac{25\times 10^{0.4m_{5}}}{2}\left(1+\sqrt{1+\frac{4\alpha}{25}10^{-0.4m_{sky}}}\right)$$

$$\kappa=\frac{25\alpha}{\kappa}10^{0.4(2m_{5}-m_{sky})}\left(1+\frac{\kappa}{\alpha}10^{-0.4(m_{5}-m_{sky})}\right)$$

$$c_{source}=\frac{AT}{h}\int_{0}^{\infty}F_{\nu}(\lambda)\,S^{tot}(\lambda)\,\lambda^{-1}\,d\lambda$$

$$\frac{c_{sky}}{n_{eff}}=\frac{AT}{h}\int_{0}^{\infty}b_{\nu}(\lambda)\,S^{sys}(\lambda)\,\lambda^{-1}\,d\lambda\;pa$$

$$n_{eff}=2.27\left(\frac{FWHM}{pixelScale}\right)^{2}$$

$$T_{b}\equiv\int_{0}^{\infty}S^{tot}(\lambda)\,\lambda^{-1}\,d\lambda$$

$$\Sigma_{b}\equiv\int_{0}^{\infty}S^{sys}(\lambda)\,\lambda^{-1}\,d\lambda$$

$$c_{source}=\kappa\,10^{-0.4m_{source}},\qquad c_{sky}=\alpha\,10^{-0.4m_{sky}}$$

$$\frac{\alpha}{\kappa}=\frac{pa\times n_{eff}}{0.04}\left(\frac{\Sigma_{b}}{T_{b}}\right)$$

$$SNR=\frac{c_{source}}{(c_{source}+c_{sky})^{1/2}}$$

$$SNR=\frac{\kappa\,10^{-0.4m_{SNR}}}{(\kappa\,10^{-0.4m_{SNR}}+\alpha\,10^{-0.4m_{sky}})^{1/2}}$$


Code & Models

Repositories

LSSTDESC/OpSimSummary (official repository)


Full text

Enabling Catalog Simulations of Transient and Variable Sources based on LSST Cadence Strategies

Rahul Biswas

The Oskar Klein Centre for CosmoParticle Physics, Department of Physics, Stockholm University, AlbaNova, Stockholm SE-10691

The eScience Institute, University of Washington, Seattle, WA 98195, USA

Department of Astronomy, University of Washington, Seattle, WA 98195, USA

Scott F. Daniel

Department of Astronomy, University of Washington, Seattle, WA 98195, USA

R. Hložek

Department of Astronomy and Astrophysics, University of Toronto, ON M5S 3H4, Canada

Dunlap Institute of Astronomy and Astrophysics, University of Toronto, ON M5S 3H4, Canada

A. G. Kim

Physics Division, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, CA 94720 USA

Peter Yoachim

Department of Astronomy, University of Washington, Seattle, WA 98195, USA

Abstract

The Large Synoptic Survey Telescope (LSST) project will conduct a ten-year multi-band survey starting in 2022. Observing strategies for this survey are being actively investigated, and the science capabilities can be best forecasted on the basis of simulated strategies from the LSST Operations Simulator (OpSim). OpSim simulates a stochastic realization of the sequence of LSST pointings over the survey duration, and is based on a model of the observatory (including telescope) and historical data of observational conditions. OpSim outputs contain a record of each simulated pointing of the survey along with a complete characterization of the pointing in terms of observing conditions, and some useful quantities derived from the characteristics of the pointing. Thus, each record can be efficiently used to derive the properties of observations of all astrophysical sources found in that pointing. However, in order to obtain the time series of observations (light curves) of a set of sources, it is often more convenient to compute all observations of an astrophysical source, and iterate over sources. In this document, we describe the open source Python package OpSimSummary, which allows for a convenient reordering. The objectives of this package are to provide users with an Application Programming Interface (API) for accessing all such observations and summarizing this information in the form of intermediate data products usable by third party software such as SNANA, thereby also bridging the gap between official LSST products and pre-existing simulation codes.

Software: Aside from the standard python package, this work used the following software packages: numpy (van der Walt et al., 2011), healpy (Zonca et al., 2019) and HEALPix packages (Górski et al., 2005), pandas (McKinney, 2010), sqlalchemy, scikit-learn (Pedregosa et al., 2011; Buitinck et al., 2013), while the examples use Jupyter Notebooks (Kluyver et al., 2016).

1 Introduction

The Large Synoptic Survey Telescope (LSST) project will conduct a multi-band imaging survey (LSST Science Collaboration, 2009) of the sky with a 3.2 gigapixel camera on an 8 m class ground based telescope at Cerro Pachon, Chile, with a field of view of about 10 square degrees. The survey is scheduled to start taking data for science operations in 2022, and cover most of the Southern sky to median single visit depths of $\rm{r}\sim 24.3,$ revisiting each location frequently. The combination of large sky coverage, high depth and repeated visits enables several major scientific goals such as studying the Solar System, astrophysical transients and variables, the Milky Way, and the physics of dark matter and dark energy (Ivezić et al., 2019). The efficacy of such investigations, particularly the Time Domain Astronomy programs involving observations of Time Dependent Astronomical Sources (TDAS) such as transients, variable stars, AGN, as well as solar system objects, depends critically on the observing strategy used to determine the sequence of pointings of the telescope.

Forecasting the performance of a science program with LSST survey strategies through the analysis of mock catalogs of observations of sources relevant to the science program is important and timely. Such forecasts are essential for the study of the impact of survey design and strategy. They are also instrumental in developing and testing appropriate analysis methods. Simulation of such mock catalogs requires models of the astrophysical sources, models of the observing instrument and analysis methods used to reduce the real data to such catalogs, and a model of the survey strategy along with a model of the observing conditions.

During the survey, the LSST project will make observations of the sky, by pointing in different directions, recording the image for a certain amount of time and then processing the image. This procedure of procuring an image of a sky location for processing is referred to as a ‘visit’ in the LSST literature, and the visit itself may involve two ‘snaps’ separated by the shutter closing (current baseline strategies have two snaps of 15 seconds each resulting in a visit of exposure of 30 seconds). A visit will be followed by a possible slew of the telescope to a different location, after which a new visit starts again to repeat the cycle. As each visit is short, the observing conditions determined by the atmospheric and sky conditions can be approximated as constant during a visit. Currently, the LSST project simulates observations during its survey period using the Operations Simulator (OpSim) (Delgado et al., 2014; Delgado & Reuter, 2016; Reuter et al., 2016). This is done with a prototype scheduler queuing visits according to a strategy designed to optimize science using a high fidelity model of the telescope to calculate times required for telescope slews, and real time observing conditions simulated using an empirical model of the sky and atmosphere. The output of such an OpSim simulation is a sequence of all the visits during the survey, and includes quantities required to describe the state of the telescope after each visit, and the observing conditions during the pointing. Such OpSim outputs may be considered realized forecasts of LSST.

Such forecasts of science performance can be done in several ways representing different trade-offs between computational/storage costs and the level of accuracy. On the low resource end, the Metric Analysis Framework (MAF; Jones et al., 2014) uses ‘metrics’ which are proxies of the scientific performance of the survey. Such proxies are built as functions of quantities related to observational conditions, and are usually designed by scientists on the basis of past experiences and intuition. Such metrics are extremely useful for studying the impact of survey strategy. On the resource intensive end, there are image simulation codes (PhoSim (Peterson et al., 2015) and ImSim (https://github.com/LSSTDESC/imSim)) capable of using the OpSim outputs and producing detailed realistic simulations of LSST images, but these are computationally expensive in terms of generation and storage. Further, analysis of these images follows the expected LSST image processing using the LSST software stack (Jurić et al., 2015) and therefore best represents the scientific performance of LSST. However, this analysis is also resource intensive, leading to the conclusion that such end-to-end explorations are hard, and therefore can only be used in a limited number of cases. An interesting middle ground is provided by catalog simulations which utilize the OpSim outputs to obtain the properties of visits, models of the astrophysical sources obtained from previous data or theoretical calculations, and models of aspects of the image processing procedure in the LSST analyses. These simulated catalogs are mock realizations of the information contained in LSST data releases (DRP) containing forced photometry of all time dependent objects detected by the LSST, expected to be released at a (nearly) annual frequency (Jurić et al., 2013), replacing the step of image analysis and reduction to catalog by an assumed model (which can in turn be improved through studies involving reprocessing older data and image simulations).

For the more abundant categories of time dependent sources such as Type Ia Supernovae (SNIa), it is critical for catalog simulations to use distributed computing to speed up the simulations. There are at least two natural paradigms of organizing the distribution of compute resources. The first alternative (a) is to calculate the observed quantities corresponding to each telescope visit at a particular instance of time, which may be further split into smaller spatial regions. Indeed, this is almost essential for any image simulations, and is an approach utilized in generating ‘Instance Catalogs’ by the LSST Catalog Simulations (CatSim) (Connolly et al., 2010, 2014) that are used as intermediate data products by Image Simulation software like PhoSim and ImSim. These Instance Catalogs are catalogs of astrophysical objects in the simulated universe whose light is expected to impinge on the LSST CCDs on that particular visit, along with a complete description of their astrophysical properties at that instance of time. In this method, obtaining the visit information is simple, however the state of the transient objects needs to be persisted from one visit to another, and the output of several visits have to be serialized before the light curves of the transients can be built. In the second approach (b) popular in the transient world, the paradigm involves distributing each astrophysical source (or groups thereof) to different resources, and simulating all of the observations of the source over a sequence of times. While this automatically leads to outputs with light curves for different objects in exactly the format useful for analysis, this calls for collecting the correct sequence of visits at a particular location, which is the only non-trivial step remaining.

Our objective in this work is to provide a solution to the collection of the correct sequence of visits for a transient or variable source to make alternative (b) simple. As described in the rest of the document, we do this by providing an open source package with a simple public Application Programming Interface (API) that users can use to obtain such sequences of visits. We also recognize that there are useful and often used codes like SNANA (Kessler et al., 2009, 2018) which are used to produce catalog simulations of time dependent sources, that demand specific forms of inputs aggregating this information. To enable the use of this code, we also provide a script which produces an intermediate data product (an observation library file in the SNANA terminology) in exactly the input form desired, so that this can work out of the box with SNANA simulations.

2 Methods

While we will not discuss the simulations of time dependent astronomical sources here, we start this section by noting the information about observations necessary for such simulations that is available from OpSim outputs; a separate code (not provided in this work) is necessary to model the population of astrophysical objects themselves to get simulated observations. In order to simulate catalogs of TDAS, one needs to simulate the observed ‘flux’ or photon counts of a source of known apparent brightness, as parameterized by the specific flux $F_{\nu}(\lambda)$ at the top of the earth’s atmosphere, and the uncertainty in the measured flux. The measured flux, or rather the counts of photons received from an astrophysical point source or the sky, are modelled as random variables that follow a Poisson distribution, where the expected counts from the source and the sky can be calculated from the physical parameters of the telescope and instruments, a knowledge of the effective point spread function (PSF), and the specific flux per unit area of the sky (see Appendix A or Ivezić et al. (2010) for a more comprehensive discussion). The expected counts of photons from astrophysical sources and the sky may be written (see Appendix A for a derivation; here we only use a summary of the results) in terms of the source magnitude $m$ and the sky brightness $m_{sky}$ as

$$c_{source}=\kappa\,10^{-0.4m},\qquad c_{sky}=\alpha\,10^{-0.4m_{sky}},\tag{1}$$

where $\kappa,\alpha$ are quantities that can be written in terms of physical constants, physical parameters of the optical system, and the noise equivalent area of the effective PSF ($\mathbf{FWHMeff}$ as listed in OpSim outputs) of the visit, all of which are known or measured quantities. Additionally, $\kappa$ depends on the total transmission function (optical system and atmosphere) through the throughput integral $T_{b}$ (which changes from observation to observation, mostly driven by airmass and clouds), while $\alpha$ depends on the system transmission function through the system throughput integral $\Sigma_{b},$ which is constant except for tiny differences caused by flexure of the system, or slow changes over the years through the evolution of the system. The signal to noise ratio of the flux measurement is driven by the Poisson error due to both the source and sky counts. Since OpSim outputs do not contain $\kappa$ or $\alpha,$ but an equivalent set of variables, it is convenient to eliminate some of them in terms of quantities that are measured in a survey or available as simulated quantities in the OpSim outputs, like the five sigma depth $m_{5},$ the sky brightness $m_{sky},$ and the PSF width provided in OpSim in terms of $\mathbf{FWHMeff}$. The general expression is

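$$\kappa=\frac{25\times 10^{0.4m_{5}}}{2}\left(1+\sqrt{1+\frac{4\alpha}{25}10^{-0.4m_{sky}}}\right),\tag{2}$$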

which reduces to the familiar background dominated limit of $5\sqrt{\alpha}\times 10^{0.2(2m_{5}-m_{sky})}$ in the limit where $\sqrt{c_{sky}}\gg 1.$ This is similar in spirit to the way in which $\sigma_{rand}$ is calculated in Ivezić et al. (2019). These expressions relate $\kappa$ to physical constants, physical parameters of the optical system that are constant in time through $\alpha,$ and the quantities $m_{sky},m_{5},\mathbf{FWHMeff}$ available in OpSim. It should be remembered that the quantities $\alpha,m_{5},m_{sky}$ are not all independent, and therefore Eqn. 2 does not imply that changing $\alpha$ by changing the PSF would change $\kappa$. On the other hand, if the small difference between $T_{b}$ and $\Sigma_{b}$ is ignored so that $\frac{\alpha}{\kappa}$ is considered to be a measured quantity from the measured PSF, one can find an expression for $\kappa$ in terms of the OpSim quantities $m_{sky},m_{5},\mathbf{FWHMeff}$ and the pixel size

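$$\kappa=\frac{25\alpha}{\kappa}10^{0.4(2m_{5}-m_{sky})}\left(1+\frac{\kappa}{\alpha}10^{-0.4(m_{5}-m_{sky})}\right)\tag{3}$$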

without worrying about the physical characteristics of the optical system.
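As a minimal numerical sketch of these relations (not the OpSimSummary implementation), $\kappa$ can be obtained by requiring that a source of magnitude $m_{5}$ has a signal to noise ratio of 5 when the noise is the Poisson error of the source plus sky counts. The function names and the numerical values of $\alpha$, $m_{5}$ and $m_{sky}$ below are illustrative assumptions only.

```python
import math


def kappa_from_depth(m5, m_sky, alpha):
    """Solve SNR(m5) = 5 for kappa, with sky counts c_sky = alpha * 10**(-0.4 * m_sky)."""
    c_sky = alpha * 10.0 ** (-0.4 * m_sky)
    # Positive root of x**2 = 25 * (x + c_sky), where x = kappa * 10**(-0.4 * m5)
    return 12.5 * 10.0 ** (0.4 * m5) * (1.0 + math.sqrt(1.0 + 4.0 * c_sky / 25.0))


def snr(m, m_sky, kappa, alpha):
    """Signal to noise ratio of a source of magnitude m with Poisson source and sky noise."""
    c_source = kappa * 10.0 ** (-0.4 * m)
    c_sky = alpha * 10.0 ** (-0.4 * m_sky)
    return c_source / math.sqrt(c_source + c_sky)


# Hypothetical values chosen only to exercise the formulas
alpha = 1.0e12
m5, m_sky = 24.3, 21.0

kappa = kappa_from_depth(m5, m_sky, alpha)
snr_at_depth = snr(m5, m_sky, kappa, alpha)   # 5 by construction
snr_bright = snr(22.0, m_sky, kappa, alpha)   # a brighter source has a higher SNR

# Background dominated approximation quoted in the text
kappa_bkg = 5.0 * math.sqrt(alpha) * 10.0 ** (0.2 * (2.0 * m5 - m_sky))
```

For these assumed values the background dominated approximation agrees with the full expression to within a few percent, as expected when the sky counts are large.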

Thus, our goal is to obtain these terms for each visit in a transient light curve from the OpSim output. This is explained in a step by step procedure in Subsection 2.2.

2.1 Input Data: Operation Simulator Outputs

To summarize the methodology used, we start by describing the input data product, namely the outputs from OpSim. The LSST project simulates observing strategies using the Operations Simulator (OpSim), and the resulting sequence of pointings with properties of observations is disseminated in the form of a sqlite database. The database contains multiple tables, and the most important ones for our purpose are the ‘summaryAllProps’ and ‘proposal’ tables (in version 3, the ‘summaryAllProps’ table was called the ‘summary’ table). The ‘proposal’ table is a table of scientific surveys or proposals, each of which has its own requirements in terms of desired visits and survey properties, along with a unique integer identifier ‘proposalId’. Currently, LSST has the Wide Fast Deep survey, a Deep Drilling Field survey, a Southern Galactic Cap Survey, a Milky Way Survey, and a Northern Ecliptic Spur survey in different geographical regions, with different survey strategies applied to each of them.

The ‘summaryAllProps’ table is the sequence of simulated observations based on the simulated conditions throughout the ten year period. Each row of the table is an observation or a telescope pointing, which we will refer to as a ‘visit’. The row for a visit is identified by an integer ‘observationId’ with important properties characterizing the observation as well as the ‘proposalId’ whose criteria it satisfies. The characteristics of the observations include the pointing location, the time of observation, the bandpass in which the observation is made, the seeing and the PSF, the sky brightness, and the five sigma depth. The seeing is based on historical data, while the sky brightnesses are computed using a data-driven model (Yoachim et al., 2016). Together, these two tables tell us about all of the simulated observations, and the scientific proposal or survey that they were taken to satisfy. These represent the sum-total of information available about the simulated strategies and are sufficient to generate catalog simulations. Complete details on such quantities are available from the schema of the output in the relevant version (https://www.lsst.org/scientists/simulations/opsim/summary-table-column-descriptions-v335, https://lsst-sims.github.io/sims_ocs/tables/summaryallprops.html). In the current versions, the pointings are located on a discrete grid with an integer (fieldID) identifying each point on the discrete grid. There is no fundamental requirement that an observing strategy uses such a grid, and it is likely (and already true in some alternative simulators) that this grid does not exist; consequently the methodology we will describe below does not use this feature. To give an idea of the sizes involved, a typical operations simulator output contains about 2.5 million visits, while typical OpSim databases have a size of about 4.6 GB. There are some very specific details of OpSim outputs that are not obvious on first encounter. We list them here:

• Most of the proposals in the current baselines are non-overlapping: a spatial location observed by the WFD survey is not observed by a survey like the Southern Celestial Pole or the Northern Ecliptic Spur. However, this is not true for WFD and DDF, and DDF fields can be observed by WFD as well. There is no reason that future mini-surveys will not have such overlapping properties.

• For a small fraction of cases, there can be multiple (actually two) rows of the summary table which point to the same visit. This happens whenever a particular visit satisfies the requirements of two different proposals or surveys. Currently, this is seen in the overlapping area of the Wide Fast Deep / Deep Drilling Field surveys due to the previous point.

• While some outputs of the Operations Simulator come with ditheredRA and ditheredDec columns, these are added post-facto to the operation simulator output. Discussion of what the dithers should be is still ongoing, but it is useful to have the capability to replace these dithered observations with other dithers obtained from external sources.
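The duplicate-row behaviour described above matters when counting visits. The following is a minimal sketch, using only the Python standard library and a toy in-memory table, of collapsing rows that refer to the same visit. The table name ‘summaryAllProps’ and the columns ‘observationId’ and ‘proposalId’ come from the text; the column ‘observationStartMJD’, the numerical values, and the query itself are assumptions for illustration and are not taken from OpSimSummary.

```python
import sqlite3

# Toy stand-in for an OpSim sqlite output (values are made up)
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE summaryAllProps "
    "(observationId INTEGER, proposalId INTEGER, observationStartMJD REAL)"
)
rows = [
    (1, 3, 59853.01),
    (2, 3, 59853.02),  # visit 2 satisfies proposal 3 ...
    (2, 5, 59853.02),  # ... and proposal 5, so it appears twice
    (3, 5, 59853.03),
]
con.executemany("INSERT INTO summaryAllProps VALUES (?, ?, ?)", rows)

# One row per visit, keeping the list of proposals each visit satisfies
unique_visits = con.execute(
    "SELECT observationId, MIN(observationStartMJD), GROUP_CONCAT(proposalId) "
    "FROM summaryAllProps GROUP BY observationId ORDER BY observationId"
).fetchall()
con.close()

n_rows = len(rows)             # number of table rows
n_visits = len(unique_visits)  # number of distinct visits
```

Grouping on the visit identifier keeps the proposal information instead of discarding it, which is useful if a later selection is made per survey.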

2.2 Objectives

To further detail our objectives, we first define some terms that we will use in this paper. For any particular visit in LSST, a sky location within an angular radius of 1.75 degrees (the radius of the LSST focal plane) will be said to be ‘observed by LSST during this visit’. In reality, this is an approximation: LSST chips do not completely fill out the focal plane. There are parts of the circular disk that are not covered by the rectangular geometry of the chips, as well as chip gaps between the chips. Thus, the set of points observed by LSST during a visit according to the above definition is a superset of the points actually observed by the visit. We will ignore this distinction, except to note that the fill factor of chips is about $90\%$ (https://www.lsst.org/about/camera/features). Given a sequence of visits (or rows of LSST OpSim output) and a sky location, one can find the sequence of visits that will observe the sky location according to this definition. As this quantity will be used repeatedly in this paper, we will, for brevity, refer to such a subset of all of the visits in an OpSim output as the ‘visit set’ associated with a point on the sky.
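The definition of a visit set can be written down directly as a brute-force search over all visits. The sketch below is only an illustration of the definition, not the method used by OpSimSummary; the function names, the tuple layout of a visit, and the coordinates are hypothetical.

```python
import math

FOV_RADIUS_DEG = 1.75  # angular radius of the focal plane quoted in the text


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation (haversine formula); inputs and output in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (
        math.sin((dec2 - dec1) / 2.0) ** 2
        + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2.0) ** 2
    )
    return math.degrees(2.0 * math.asin(min(1.0, math.sqrt(h))))


def visit_set(ra, dec, visits, radius_deg=FOV_RADIUS_DEG):
    """Return the ids of visits whose pointing centers lie within radius_deg of (ra, dec)."""
    return [
        visit_id
        for visit_id, visit_ra, visit_dec in visits
        if angular_separation_deg(ra, dec, visit_ra, visit_dec) <= radius_deg
    ]


# Hypothetical visits as (id, RA, Dec) in degrees
visits = [
    (101, 30.0, -20.0),
    (102, 31.0, -20.5),
    (103, 35.0, -20.0),
    (104, 30.5, -21.5),
]
observed = visit_set(30.2, -20.3, visits)
```

Each call scans every visit, so the cost for many sources grows as the product of the number of sources and the number of visits.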

In terms of the terminology defined above, our objectives are quite simple:

1. Given an OpSim output, and a sky location in terms of Right Ascension (RA) and Declination (Dec), we need a simple API to obtain the visit set of this location, i.e., the sequence of visits in the OpSim output that observe this location.

2. Since the OpSim outputs are large ($\sim 2.5$ million visits) and the number of transients in LSST simulation volumes can be large ($\sim$ millions) for abundant and bright transients like SNIa, this could lead to ${\mathcal{O}}(10^{12})$ simple computations if done by brute-force in a naive way. We would like the process to be reasonably fast and not be a huge load on the memory requirements. Note, while the number of cosmologically useful SN in LSST will be smaller than the number of supernovae exploding in the observable volume, simulations have to simulate all of the supernovae before applying selection cuts to identify cosmologically useful supernovae.

3. Pre-compute this information on a dense grid and serialize to SNANA observation library formats to enable fast computations.

4. Since the Operations Simulation schema changes from version to version in terms of names, even though the conceptual setup remains the same, we would like to account for these changes and provide a stable interface for a catalog simulator.
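One simple way to picture the last objective is a per-version alias table that maps version-specific names onto a single set of names used by the simulator. The sketch below is purely illustrative: the table names ‘summary’ and ‘summaryAllProps’ and the column ‘observationId’ come from the text, while every other column name, the canonical names, and the function are hypothetical and do not describe the OpSimSummary interface.

```python
# Hypothetical alias tables; only the table names and 'observationId' come from the text
SCHEMA_ALIASES = {
    "v3": {
        "table": "summary",
        "columns": {"obsHistID": "visit_id", "expMJD": "mjd"},
    },
    "v4": {
        "table": "summaryAllProps",
        "columns": {"observationId": "visit_id", "observationStartMJD": "mjd"},
    },
}


def normalize_row(row, version):
    """Rename version-specific column names to canonical ones; unknown names pass through."""
    aliases = SCHEMA_ALIASES[version]["columns"]
    return {aliases.get(name, name): value for name, value in row.items()}


row_v3 = normalize_row({"obsHistID": 7, "expMJD": 59580.1, "filter": "r"}, "v3")
row_v4 = normalize_row({"observationId": 7, "observationStartMJD": 59580.1, "filter": "r"}, "v4")
```

With such a mapping, downstream code refers only to the canonical names and is insulated from renames between versions.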

3 Results

We present a simple, open source, modular Python package OpSimSummary based on other open source libraries, particularly the package scikit-learn (Pedregosa et al., 2011), to meet each of our objectives. The code (Zenodo record 2671955) is available online (https://www.github.com/lsstdesc/OpSimSummary), while the particular release described in this paper will be linked at the end. While the actual implementations are somewhat different in terms of packages used, some of the key ideas are inspired by those used in MAF. We first explain how this code meets each of our objectives.

3.1 Objective 1: API to collect visits observing a transient

This package achieves our objective of collecting visits observing a transient. It takes the publicly available LSST project provided OpSim outputs (in OpSim version 3 and 4, as well as the two other schedulers that were used: the Feature Based Scheduler (arXiv:1810.04815) and AltSched (arXiv:1903.00531)) as input, and provides an API for obtaining the visits for a point source at a sequence of arbitrary locations (defined by RA and Dec values). The code structure and examples for doing this are in the appendix of this paper, and available with the source code itself. It also allows for the usage of an additional set of dithers input as the filename of a file in Comma Separated Values (CSV) format. If the sources to be simulated can be simulated independently, distribution is trivial to achieve by splitting their locations into arrays and using these arrays independently.

3.2 Objective 2: Computational Efficiency

While the problem of enumerating all the transients, and the visits that observe each one of them, is naively an ${\mathcal{O}}(N_{\rm visits})\times{\mathcal{O}}(N_{\rm transient})$ computation, it is intuitively clear that an easier computation should be possible.
Since one does not require the computation of distances to visit centers that are too far away, the computation could take advantage of this. There are different ways of implementing this intuitive idea of locality of visits. For example, a simple approach is choosing a convenient set of sky locations $pl$ at which the visit sets are actually computed and approximating the visit set of an arbitrary point (for example the set of point source locations $tl$) by the visit set of a deterministically selected grid point. Thus such schemes are defined by two components:

1. A selection of points $pl,$ at which the visit sets $v$ will be computed with no approximation. For the approximation to make a computing time difference, the size of $pl$ should be significantly smaller than the number of transients.

2. A mapping from the visit sets $v(x)$ for any point $x$ in $tl$ to the visit sets $v(y)$ of points $y$ in $pl$:

$$v(x)=v(\{v(y)\}),\qquad x\in tl,\ y\in pl.$$

A very simple algorithm along these lines would be nearest-neighbor-interpolation, where the component (2) would be defined by assigning to an arbitrary point $x\in tl$ the visit set of the point in $pl$ closest to $x$. Interpolation techniques exploit the smoothness of the function being interpolated. Here the ‘function’ under consideration is a map which returns the visit set of a point. While observing conditions in the sky vary reasonably smoothly with location and time, the set of points being observed by a visit is determined by a hard boundary (edge of the focal plane). Any time such an edge falls between two points, one of the two points will be observed and the other will not. As the distance between two points decreases, the probability of such a visit also decreases, but for a large number of true visits in a visit set (in the WFD survey of LSST, this is $\sim 1000$), this would still be expected to happen.
This implies that despite the smoothness of observing conditions with spatial locations, the visit set associated with points would not be ‘interpolated’ as well as quantities like sky conditions. For a dense enough set of points, such a strategy could still provide an excellent approximation to the true visit sets. Of course, pre-computation of the quantities in a dense set and their storage could itself be challenging, particularly if several versions of survey strategies are analyzed.

An elegant way to exploit the locality of visits without using the smoothness of the visit set is the use of a Tree data structure to partition the data based on spatial positions, so that we should expect a scaling of ${\mathcal{O}}(N_{\rm transient})\times{\mathcal{O}}(\log(N_{\rm visits})).$ As far as the distance computations are concerned, i.e., if we ignore the position of the chips etc., this calculation does not involve any additional approximation, and the speed attained is simply due to an organization of the calculation. Here, we use a Tree implementation to exploit the locality of visits and provide a simple API to compute the visit set associated with individual sky locations. This should be easy to use for a simulator in the sense described above. This is done by using an implementation within the package scikit-learn (Pedregosa et al., 2011) called ‘BallTree’ (Buitinck et al., 2013). We also use the API to pre-compute visit sets for a particular set of points to obtain approximate visit sets for each point, through an interpolation scheme for the well known SNANA code as described in the next subsection.
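The tree-based lookup can be sketched with the scikit-learn BallTree class and its haversine metric. This is a minimal illustration under stated assumptions, not the OpSimSummary code: the pointing coordinates and source positions are made up, the variable names are hypothetical, and the 1.75 degree radius is the focal plane radius used in the definition of a visit set.

```python
import numpy as np
from sklearn.neighbors import BallTree

FOV_RADIUS_RAD = np.radians(1.75)  # focal plane radius from the visit set definition

# Hypothetical visit pointing centers as (RA, Dec) in degrees
visit_ra_dec = np.array(
    [[30.0, -20.0], [31.0, -20.5], [35.0, -20.0], [30.5, -21.5]]
)

# The haversine metric expects (latitude, longitude) in radians, i.e. (Dec, RA)
tree = BallTree(np.radians(visit_ra_dec[:, ::-1]), metric="haversine")

# Hypothetical source positions as (RA, Dec) in degrees
source_ra_dec = np.array([[30.2, -20.3], [35.1, -19.9]])

# For each source, indices of the visits whose centers lie within the radius
neighbors = tree.query_radius(np.radians(source_ra_dec[:, ::-1]), r=FOV_RADIUS_RAD)
visit_sets = [sorted(indices.tolist()) for indices in neighbors]
```

The tree is built once per set of visits and then queried for many sources at a time, so the cost per source no longer requires a scan over every visit.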

4 Summary and Discussions

In this paper, we discuss the importance of catalog simulations of Time Domain Sources (TDS) for the study of analysis methods, and survey strategy of LSST. Survey strategies of LSST are currently simulated by the LSST project using OpSim; such simulated survey strategies are made public in the form of sqlite databases that are outputs of OpSim. We discuss the transformations of the set of quantities in OpSim that are required for catalog simulations. We also discuss the usefulness of re-ordering the outputs in terms of OpSim visits observing a particular sky location, delineating the necessity of such an API. While the problem is conceptually simple, we discuss why a naive solution is inefficient, particularly during the simulation of abundant sources. As strategies to address this issue, we discuss exploiting the locality of visits using a Tree data structure, and approximating the problem by serializing pre-computed results for use with a simulator. The latter strategy makes the step during simulations essentially instantaneous, but inevitably results in errors which can be minimized by choosing a very dense set of pre-determined points at the cost of large file sizes. We present an open source, modular Python software package for such operations, which contains an API for reading in OpSim outputs and re-ordering them to obtain the visits for each point. Thus, a simulation code can directly use this API to obtain the important quantities. A Tree is used to speed up the calculations. We also use the obtained visits, along with simple transformations of OpSim quantities, to serialize the results for a set of points in the form of an SNANA simlib. The script to perform this is also made available as part of the OpSimSummary package. Currently OpSimSummary works with OpSim outputs of version 3 and 4, along with outputs of the Feature Based Scheduler and AltSched.
We study the accuracy of the approximate pre-computed visit sets as a function of the density (or average separation) of the points at which the visit sets are actually computed, and show that at large average separations between these points, the visit sets of sky locations have several visits missing, while several new visits not originally in the visit set are inserted. According to the numbers calculated for the current strategies, we would expect the current method to miss $\sim 10\%$ of the visits in the true visit set, while adding a similar number ($\sim 10\%$) of visits that were not in the true visit set. This code has been used through the direct use of the API in the study of serendipitous discoveries of Kilonovae using the LSST (Setzer et al., 2018), which also formed part of an LSST DESC survey strategy white paper for Wide Fast Deep Fields in LSST (Lochner et al., 2018). SNANA observation library files (Biswas et al., 2017) generated through previous versions of OpSimSummary (and distributed publicly with the SNANA code) have been used in the study of serendipitous detection of Kilonovae (Scolnic et al., 2018a) and the LSST DESC Science Requirement Document (The LSST Dark Energy Science Collaboration et al., 2018). This paper describes the improved versions of SNANA observation library files (simlibs) currently available, developed primarily for the data generation of PLAsTiCC (The PLAsTiCC team et al., 2018), as described in the PLAsTiCC models and simulations paper (Kessler et al., 2019). These observation library files have also been used in the SNANA supernova simulations underlying the supernova cosmology analyses in the LSST DESC Survey Strategy white papers (Lochner et al., 2018; Scolnic et al., 2018b).
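The two kinds of error in an approximate visit set mentioned above, missing visits and spuriously added visits, can be made concrete with a short sketch. The function name `visit_set_errors` and the choice to normalize both fractions by the size of the true visit set are assumptions made here for illustration; they are not necessarily the definitions used for the numbers quoted in this paper.

```python
def visit_set_errors(true_visits, approx_visits):
    """Compare an approximate visit set with the true one (hypothetical helper).

    Returns (missing_fraction, spurious_fraction), both normalized by the
    size of the true visit set (an assumed convention).
    """
    true_set = set(true_visits)
    approx_set = set(approx_visits)
    if not true_set:
        raise ValueError("the true visit set must not be empty")
    # Visits in the true set that the approximation failed to include.
    missing = len(true_set - approx_set) / len(true_set)
    # Visits in the approximation that do not belong to the true set.
    spurious = len(approx_set - true_set) / len(true_set)
    return missing, spurious


# Toy example: ten true visits; the approximation drops visit 0 and adds visit 10.
missing_frac, spurious_frac = visit_set_errors(range(10), range(1, 11))
```

In this toy case one of ten true visits is lost and one foreign visit is gained, so both fractions are 0.1, the same order as the $\sim 10\%$ figures discussed above.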

Acknowledgments

RB would like to thank David Cinabro for sharing his DACG project at the beginning of this work, Rick Kessler for stimulating discussions, particularly on its use with respect to SNANA, and Lynne Jones for help in understanding MAF. This paper has undergone internal review in the LSST Dark Energy Science Collaboration. The internal reviewers were Philippe Gris and Isobel Hook, and the authors would like to thank them for their comments. Author contributions are listed as follows. RB: Initiated and led project, wrote the oss package, drafted paper, derived results. SFD: Provided support for middleware connecting OpSim with simulations of astrophysical sources. RH: Code beta testing, discussed results. AGK: Motivated the project and consulted with RB particularly in its initial phases. PY: Supported OpSim and MAF usage.

During this work, RB was partially supported by the Washington Research Foundation Fund for Innovation in Data-Intensive Discovery and the Moore/Sloan Data Science Environments Project at the University of Washington and the Swedish Research Council (VR) through the Oskar Klein Centre. RB was further supported by the research environment grant “Gravitational Radiation and Electromagnetic Astrophysical Transients (GREAT)” funded by the Swedish Research Council (VR) under Dnr 2016-06012. The DESC acknowledges ongoing support from the Institut National de Physique Nucléaire et de Physique des Particules in France; the Science & Technology Facilities Council in the United Kingdom; and the Department of Energy, the National Science Foundation, and the LSST Corporation in the United States. DESC uses resources of the IN2P3 Computing Center (CC-IN2P3–Lyon/Villeurbanne - France) funded by the Centre National de la Recherche Scientifique; the National Energy Research Scientific Computing Center, a DOE Office of Science User Facility supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231; STFC DiRAC HPC Facilities, funded by UK BIS National E-infrastructure capital grants; and the UK particle physics grid, supported by the GridPP Collaboration. This work was performed in part under DOE Contract DE-AC02-76SF00515.

Appendix A Point Sources and SNR

Given the physical parameters describing a telescope, and a description of the sky and astrophysical sources, one can calculate quantities like the expected number of photons collected from a point source in the sky (i.e. no background galaxy), or from the sky itself. Combining this with observing conditions based on seeing, airmass etc., one can calculate a good estimate of the expected signal to noise ratio of an observation. We follow the discussion in Ivezić et al. (2010), keeping the gain $g=1$ in our calculation (for an extensive discussion including the latest updates to LSST values, see Jones (2016)). For a point source with intensity $\epsilon(\lambda)$ as a function of its wavelength $\lambda,$ the number of photons collected with an exposure time $T$ in a telescope with collecting area $A$ is given by:

[TABLE]

where the flux density $F_{\nu}(\lambda)$ is the frequency derivative of the intensity, $F_{\nu}(\lambda)\equiv\frac{d\epsilon(\lambda)}{d\nu},$ while $S^{tot}(\lambda)$ is the total transmission probability due to the atmosphere and the telescope system, and $h$ is the Planck constant. We note that $S^{tot}(\lambda)$ is also a function of time through the dependence of the atmospheric transmission functions on airmass and atmospheric conditions. Similarly, using the intensity per unit area of the sky $b_{\nu}(\lambda),$ one can calculate the time averaged number of photons collected in $n_{eff}$ pixels as

[TABLE]

where $pa$ is the area of a pixel. In order to estimate the number of photons collected from the source and sky during a particular exposure from the observed pixel counts, one uses estimators such as ‘aperture photometry’ and ‘PSF photometry’. In each of these, one can use a value of $n_{eff}$ pixels based on the observing conditions. For the estimator used in PSF photometry, this is given by

[TABLE]

if the PSF profile is assumed to be a single radial Gaussian. These counts depend on the flux densities in exactly the same way as magnitudes in the bands, and so can be calculated just by knowing the source magnitudes and the sky brightness in ${\rm mag}/{\rm arcsec}^{2},$ without requiring complete information on the flux densities.

[TABLE]

where the numerical values in the last line assume that the magnitude is in the AB system (i.e. $F^{std}_{\nu}=3631\,{\rm Jy}$), and that the area is a circular disk of diameter $D,$ and the throughput integral $T_{b}$ is

[TABLE]

One can do a similar calculation for the counts of sky photons:

[TABLE]

where the system throughput integral $\Sigma_{b}$ is

[TABLE]

Hence, we see that we can write the photon counts as

[TABLE]

where we can write $\kappa,\alpha$ in terms of physical quantities emphasizing the fact that $T_{b}(t)$ changes with time, as does $n_{eff},$ but is related directly to quantities that are supplied by most surveys (and in OpSim).

[TABLE]

So that we get

[TABLE]

The Signal to Noise Ratio (SNR) of a measured source can be found from Poisson statistics:

[TABLE]

In practice, there may be other small sources of uncertainty such as read noise or other systematic errors that could in principle be grouped together with the Poisson Noise in the denominator of Eqn. A16. Plugging Eqn. A12 into Eqn. A16, we can get

[TABLE]

If values of $m_{sky}$ and $m_{5}$ are supplied for an observation in a survey (as they often are), one can solve this to obtain Eqn. 2.
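As a numerical illustration of this last step, the sketch below assumes that the Poisson SNR takes the standard form ${\rm SNR}=C/\sqrt{C+B}$ for source counts $C$ and sky counts $B$ in $n_{eff}$ pixels, and that counts scale with magnitude as $10^{-0.4m}$. It is not the closed-form expression of Eqn. 2: the function names are hypothetical, and the sky counts are passed in directly as an assumed input rather than derived from $m_{sky}$.

```python
import math


def poisson_snr(source_counts, sky_counts):
    """SNR for Poisson noise from source and sky counts (assumed standard form)."""
    return source_counts / math.sqrt(source_counts + sky_counts)


def counts_at_snr(sky_counts, snr=5.0):
    """Source counts giving the requested SNR for the given sky counts.

    Solves C**2 - snr**2 * C - snr**2 * B = 0 for its positive root.
    """
    s2 = snr * snr
    return 0.5 * (s2 + math.sqrt(s2 * s2 + 4.0 * s2 * sky_counts))


def snr_from_m5(mag, m5, sky_counts):
    """SNR of a source of magnitude `mag`, given the 5-sigma depth `m5`."""
    counts_5sigma = counts_at_snr(sky_counts, snr=5.0)
    # Counts scale with flux, i.e. as 10**(-0.4 * magnitude difference).
    counts = counts_5sigma * 10.0 ** (-0.4 * (mag - m5))
    return poisson_snr(counts, sky_counts)


# Illustrative numbers only: 5-sigma depth of 24 mag and 1000 sky counts.
snr_at_depth = snr_from_m5(24.0, 24.0, 1000.0)
snr_brighter = snr_from_m5(22.0, 24.0, 1000.0)
```

By construction a source at the 5-sigma depth recovers an SNR of 5, and brighter sources yield larger values; the sky counts control how quickly the SNR moves from the sky-dominated to the source-dominated regime.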

Bibliography


Biswas, R., Cinabro, D., & Kessler, R. 2017, doi:10.5281/zenodo.1006719
Biswas, R., Setzer, C., & Azfar, F. 2019, LSSTDESC/OpSimSummary: 2.0.0, doi:10.5281/zenodo.2671955
Buitinck, L., Louppe, G., Blondel, M., et al. 2013, in ECML PKDD Workshop: Languages for Data Mining and Machine Learning, 108–122
Connolly, A. J., Peterson, J., Jernigan, J. G., et al. 2010, in Proc. SPIE, Vol. 7738, Modeling, Systems Engineering, and Project Management for Astronomy IV, 77381O
Connolly, A. J., Angeli, G. Z., Chandrasekharan, S., et al. 2014, in Society of Photo-Optical Instrumentation Engineers (SPIE) Conference Series, Vol. 9150, Modeling, Systems Engineering, and Project Management for Astronomy VI, ed. G. Z. Angeli & P. Dierickx, 14
Delgado, F., & Reuter, M. A. 2016, in Proc. SPIE, Vol. 9910, Observatory Operations: Strategies, Processes, and Systems VI, 991013
Delgado, F., Saha, A., Chandrasekharan, S., et al. 2014, in Society of Photo-Optical Instrumentation Engineers (SPIE) Conference Series, Vol. 9150, Modeling, Systems Engineering, and Project Management for Astronomy VI, ed. G. Z. Angeli & P. Dierickx, 15
Górski, K. M., Hivon, E., Banday, A. J., et al. 2005, ApJ, 622, 759