Towards a Self-calibrating, Empirical, Light-Weight Model for Tellurics in High-Resolution Spectra

Christopher Leet; Debra A. Fischer; Jeff A. Valenti

arXiv:1903.08350·astro-ph.IM·May 1, 2019


TL;DR

This paper introduces SELENITE, a lightweight, self-calibrating empirical model for telluric correction in high-resolution spectra, eliminating the need for standard star observations or imprecise line data, thus enabling precise radial velocity measurements.

Contribution

The paper presents SELENITE, a novel empirical model that accurately corrects telluric contamination without requiring line data or additional observations, streamlining high-precision spectroscopic analysis.

Findings

01. Achieves residuals of about 1% of continuum in spectra.
02. Fits data with a reduced chi squared of 1.17 on B stars.
03. Fitting process takes seconds on standard laptops.



Tables

Table 1: An excerpt from the telluric database generated for our training spectra.

$λ_{i}$	$σ_{λ} / σ_{5901.6 Å}$	PCC	Species Flag
5898.12061	0.49523	0.992	W
5898.14209	0.89206	0.994	W
5898.18457	0.66062	0.991	W
5898.20556	0.34039	0.937	W
5898.99121	0.47828	0.977	W

Equations

$$I_{\lambda} = I_{\lambda,0}\, e^{-\sigma_{\lambda} \cdot n_{j} \cdot z}$$

$$\ln I_{\lambda} = -\sigma_{\lambda} \cdot n_{j} \cdot z$$

$$\frac{\ln I_{\lambda_{i},t_{2}} - \ln I_{\lambda_{i},t_{1}}}{\ln I_{\lambda_{cal},t_{2}} - \ln I_{\lambda_{cal},t_{1}}} = \frac{\sigma_{\lambda_{i}} \left[ n_{t_{2}} \cdot z_{t_{2}} - n_{t_{1}} \cdot z_{t_{1}} \right]}{\sigma_{\lambda_{cal}} \left[ n_{t_{2}} \cdot z_{t_{2}} - n_{t_{1}} \cdot z_{t_{1}} \right]} = \frac{\sigma_{\lambda_{i}}}{\sigma_{\lambda_{cal}}} \equiv m^{\lambda_{i}}_{\text{cal}}$$

$$\ln I_{\lambda_{i}} = m^{\lambda_{i}}_{\text{cal}} \ln I_{\lambda_{cal}}$$

$$\ln I_{\lambda_{i}} = \begin{cases} m^{\lambda_{i}}_{\text{cal}} \cdot \ln I_{\text{cal}} & \text{when } \lambda_{i} \in \text{valid peak} \land PCC_{\lambda} > k \\ 0 & \text{otherwise} \end{cases}$$

$$\ln I_{\lambda_{i}} = \frac{m_{A}^{\lambda_{i}}}{m_{A}^{B}} \cdot \ln I_{B}$$


Full text


Christopher Leet

Yale University, 52 Hillhouse, New Haven, CT 06511, USA

Debra A. Fischer

Yale University, 52 Hillhouse, New Haven, CT 06511, USA

Jeff A. Valenti

Space Telescope Science Institute, 3700 San Martin Dr., Baltimore, MD 21218, USA

Abstract

To discover Earth analogs around other stars, next generation spectrographs must measure radial velocity (RV) with 10 cm/s precision. To achieve 10 cm/s precision, however, the effects of telluric contamination must be accounted for. The standard approaches to telluric removal are: (a) observing a standard star and (b) using a radiative transfer code. Observing standard stars, however, takes valuable observing time away from science targets. Radiative transfer codes, meanwhile, rely on imprecise line data in the HITRAN database (typical line position uncertainties range from a few to several hundred m/s) and require difficult-to-obtain measurements of water vapor column density for best performance. To address these issues, we present SELENITE: a SELf-calibrating, Empirical, Light-Weight liNear regressIon TElluric model for high-resolution spectra. The model exploits two simple observations: (a) water tellurics grow proportionally to precipitable water vapor (PWV) and therefore proportionally to each other and (b) non-water tellurics grow proportionally to airmass. Water tellurics can be identified by looking for pixels whose growth correlates with a known calibration water telluric and modelled by regression against it; non-water tellurics are identified and modelled in the same way using airmass. The model does not require line data, water vapor measurements, or additional observations (beyond one-time calibration observations), achieves fits with a $\chi^{2}_{red}$ of 1.17 on B stars and 2.95 on K dwarfs, and leaves residuals of $1\%$ (B stars) and $1.1\%$ (K dwarfs) of continuum. Fitting takes seconds on laptop PCs: SELENITE is light-weight enough to guide observing runs.

techniques: radial velocities – methods: data analysis

Journal: ApJ. Facilities: CTIO (CHIRON).

1 Introduction

To expand the success of exoplanet searches, next generation spectrographs are aiming for sub-meter-per-second precision in radial velocity measurements. If the 10 cm s$^{-1}$ instrumental precision goal of the Echelle SPectrograph for Rocky Exoplanets Search and Stable Spectroscopic Observations (ESPRESSO; Pepe et al., 2013) and the EXtreme PREcision Spectrograph (EXPRES; Jurgenson et al., 2016) is reached, we will be able to detect small rocky planets orbiting in the habitable zones of their host stars. Such high precision requires extraordinary new fidelity in spectroscopic data: high resolution, high $\rm S/N$ and greater instrumental stability. In addition to controlling instrumental errors, success requires accounting for any systematic temporal changes in the spectral line profiles, which can arise from photospheric velocities or telluric contamination (Fischer et al., 2016).

Most work on modeling telluric contamination has been tested at near infrared wavelengths where the telluric line depths are comparable to stellar absorption lines. However, the next generation optical spectrographs aiming for 10 cm s$^{-1}$ radial velocity precision will be affected by time variable microtellurics that raster across the stellar spectrum because of barycentric velocity shifts. If we do not identify pixels that are producing the small perturbations to spectral line profiles, then microtellurics may dominate the error budget for extreme precision radial velocity programs.

2 Telluric Spectra

Atomic and molecular species in the Earth’s atmosphere interact with solar radiation and produce absorption and emission lines that are imprinted in stellar spectra obtained with ground-based spectrographs. The non-water constituents (e.g., $\rm N_{2}$, $\rm O_{2}$, Ar, Ne, He) are well-mixed, and maintain a nearly fixed element ratio throughout the troposphere, stratosphere, and mesosphere. The concentrations of some non-water species ($\rm CO_{2}$, $\rm CH_{4}$, $\rm NO_{X}$) exhibit seasonal changes or modulation from post-industrial human activities. However, these gases have stable concentrations on timescales of (at least) several days. In contrast, 99% of atmospheric water vapor is confined to the troposphere and exhibits both temporal and spatial variability that can change by more than 10% on a timescale of an hour (Blake & Shaw, 2011).

Figure 1 shows the telluric spectrum over the wavelength range $4500-6800$ Å obtained with the Fourier Transform Spectrograph (FTS) from the National Optical Astronomy Observatory (NOAO; Wallace et al., 1993). The strongest telluric lines are found redward of about 6800Å and present a particular challenge for radial velocity measurements in the near infrared. However, the high $\rm S/N$ and resolution of the FTS telluric spectrum show that the optical spectrum is peppered with microtelluric lines with depths that are only a few percent of the continuum. Many of the lines shallower than 1% in Figure 1 will disappear when convolved with the instrumental line spread function (LSF) of high-resolution ($R\sim 100{,}000$) echelle spectrographs. The surviving microtelluric lines are, however, very difficult to discern when superimposed on stellar spectra. Even for stars with constant radial velocity, the barycentric velocity of the Earth causes the telluric lines to raster across stellar absorption lines with annual amplitudes up to 30 km s$^{-1}$, producing small, but systematic, time-variable line profile variations. Optical RV programs aiming for 10 - 20 cm s$^{-1}$ precision will need to account for microtelluric lines because they introduce errors that exceed the target RV precision (Cunha et al., 2014).

3 Current Best Practices

Since telluric contamination is a serious error source for high precision spectroscopy, there is a rich literature of practices for telluric modelling. These practices fall into three categories: (a) modelling using telluric standard stars (Section 3.1), (b) modelling using radiative transfer codes (Section 3.2) and (c) modelling using principal component analysis (Section 3.3). Finally, we discuss the literature surrounding a new challenge in telluric modelling: microtelluric modelling (Section 3.4).

3.1 Telluric Standard Stars

The classical approach to removing telluric absorption features is to observe a telluric standard star close in time and airmass to the science object (Vacca et al., 2003; Vidal-Madjar et al., 1986). The science target’s spectrum is then divided by the spectrum of the standard star. Typically, early type stars from early A to late B are chosen as standard stars because they exhibit few and weak metal lines, and their rapid rotation helps smear out the lines that remain. The high $\rm S/N$ afforded by bright stars means that with high spectral resolution, even shallow telluric lines are discernible. These stars have the drawback that their strong hydrogen absorption features at the Brackett and Paschen lines blend with their tellurics (Rudolf, N. et al., 2016). As an alternative, a solar type star can be used as a telluric standard together with a high-resolution solar spectrum (Maiolino et al., 1996).

Using any standard star as a telluric reference model has several well known drawbacks. First, it takes away precious observing time from an observation’s science targets, especially when high $\rm S/N$ requirements are to be met (Seifahrt, A. et al., 2010). Second, its accuracy is limited by how well the standard star’s spectrum is known. Early type stars often display spectral features such as oxygen or carbon lines in the near-infrared. Similarly, absorption line depths of solar-type stars may deviate from the solar FTS atlas due to metal abundance or surface temperature deviations, leaving residuals from the star’s intrinsic features in the telluric model (Rudolf, N. et al., 2016). Compounding this problem, the need to pick a star close to the science target often forces the observation of less well known stars. Finally, for telescopes with an adaptive optics system (e.g. CRIRES), the change in source brightness between the science target and the standard star will affect the instrumental profile (Seifahrt, A. et al., 2010). In practice, Ulmer-Moll, S. et al. (2019) find that standard stars consistently underperform other telluric removal approaches.

3.2 Radiative Transfer Codes

Another approach is to use line-by-line radiative transfer model (LBLRTM) codes to model telluric lines. This technique requires accurate atmospheric temperature and pressure profiles, an excellent model for the spectrograph line spread function, and a complete and accurate atomic line data base. The atmospheric inputs to these codes have benefited from the commercial interest and investments in making more accurate weather predictions. Most radiative transfer codes, including TERRASPEC (Bender et al., 2012), Transmissions Atmosphériques Personnalisées Pour l’AStronomie (TAPAS Bertaux et al., 2014), Telfit (Gullikson et al., 2014) and Molecfit (Smette et al., 2015) use the HIgh Resolution TRANsmission line database (HITRAN Rothman et al., 2013) and are able to model non-water telluric lines with an accuracy of around 2%.

Unfortunately, radiative transfer codes also suffer from documented drawbacks. First, radiative transfer codes are limited by imprecise line data in the HITRAN database. The uncertainty in each HITRAN line’s position is typically a few to several hundred m/s, but can be up to multiple km/s. HITRAN line strengths are rarely accurate to the 1% level (Seifahrt, A. et al., 2010). Rudolf, N. et al. (2016) also remark on this problem when modelling tellurics in the near IR.

Second, radiative transfer codes often struggle to model water lines. Bertaux et al. (2014) identify some cases in TAPAS where two adjacent water lines required different amounts of water for an adequate model. This is clearly non-physical (there is only one column density of water), but the authors are uncertain why this discrepancy appears. Rudolf, N. et al. (2016) note that HITRAN has imperfect water line information, which induces substantial residuals in their radiative transfer code.

3.3 Principal Component Analysis

Artigau et al. (2014) investigated the use of principal component analysis (PCA) for empirically modeling telluric lines at near infrared wavelengths. They used observations of hot, rapidly rotating stars to build a library of telluric standards with a range of water column density and air mass. The first five principal components of the telluric absorption features were used to fit telluric lines in spectra of program stars using least squares fitting. This empirical approach self-calibrates spectra and avoids the need for atomic line data or estimates of water column density. We believe that PCA’s empirical approach is on the right track. However, PCA is a very generic model, and could benefit by incorporating the well-studied physics of telluric line formation. By introducing principled physical priors, we aim to improve the sophistication of this approach.

3.4 The Challenge of Microtellurics

Most methods for modeling telluric lines have been applied to lines that are redward of 6800Å. The telluric features at these red wavelengths are easier to identify, both because the telluric lines are deeper and the density of stellar lines is decreasing. Currently there is not a robust method for modeling microtellurics. Unfortunately, simulations by Cunha et al. (2014) show that if ignored, microtelluric contamination in the optical spectrum will introduce RV errors between 0.2 - 1.0 m s$^{-1}$, swamping the error budget of next generation RV surveys. Cunha et al. (2014) modeled microtelluric lines in HARPS optical spectra using TAPAS, an online service that simulates atmospheric transmission with input from the ETHER Atmospheric Chemistry Data Centre, atomic line data from HITRAN, and an LBLRTM code (Bertaux et al., 2014). The atmospheric temperature and pressure model for the geographic region near La Silla is updated every six hours, and the model with the closest match in time to observations is adopted with small empirical adjustments to water vapor column density. Based on simulations with synthetic spectra, Cunha et al. (2014) expected that the improvement in RV precision for most stars would be in the range of 10 - 20 cm s$^{-1}$. Achieving RV accuracies of 10 cm s$^{-1}$ necessitates accurate modelling of microtellurics.

4 SELENITE: A Self-Calibrating Linear Regression Model

We now describe SELENITE’s telluric model. Since water and non-water tellurics exhibit different behavior (Hadrava, P., 2006), SELENITE treats their lines separately, and so we develop the model as follows. First, we describe the training data used to illustrate and evaluate SELENITE (Section 4.1). We proceed to describe the model for water tellurics (Section 4.2) and evaluate its performance on the B star HR3982 (Section 4.3). We then describe the model for non-water tellurics (Section 4.4) and evaluate its performance (Section 4.5), before finally combining the two halves and applying them to Alpha Centauri B, a K dwarf with significant stellar features (Section 4.6).

4.1 Training Data

The training data included 51 spectra of rapidly rotating B stars observed with the fiber-fed CHIRON spectrograph (Tokovinin et al., 2013), which is located at the 1.5-m telescope at the Cerro Tololo Inter-American Observatory (CTIO). The B-type stars are ideal for this calibration because they are bright and have few spectral lines, providing high $\rm S/N$ spectra that are relatively easy to continuum normalize. The iodine cell that is used for Doppler measurements with CHIRON was not in the light path for any of these observations. These spectra were obtained with the narrow slit mask, which yields a spectral resolution, $\lambda/\delta\lambda$, of $R=140{,}000$, and exposure times were set to reach a typical $\rm S/N$ of 100. The air mass for each observation was recorded in the FITS header; however, no information was available regarding the precipitable water vapor (PWV) or other atmospheric conditions.

Figueira et al. (2010) demonstrate long-term stability of telluric lines at the level of 10 m s$^{-1}$ (corresponding to 0.01 of a pixel) at the La Silla Observatory using the environmentally stabilized and fiber-fed HARPS spectrograph. The CHIRON spectrograph does not have the stability of HARPS, and the spectral format can drift by a fraction of a pixel from night-to-night. To correct for these small drifts, the spectral orders were cross-correlated to align the telluric absorption lines.

4.2 Water Tellurics

4.2.1 The Theory of Water Tellurics

Each water vapor line has a specific absorption coefficient, $\sigma$ , which depends on fundamental atomic and molecular line data, including the $\log(gf)$ value, excitation potential, and the partition function. The line strength of water tellurics also changes with the number of absorbers along the line of sight, or the column density. The radiative transfer equation for the intensity of light with wavelength $\lambda$ passing through a plane-parallel atmosphere with a single species of absorber is:

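$$I_{\lambda}=I_{\lambda,0}\,e^{-\sigma_{\lambda}\cdot n_{j}\cdot z} \tag{1}$$

$$\ln I_{\lambda}=-\sigma_{\lambda}\cdot n_{j}\cdot z \tag{2}$$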

where $I_{\lambda,0}$ and $I_{\lambda}$ are the initial and final intensity, $\sigma_{\lambda}$ is the effective cross-section for absorption, and $n_{j}$ is the average number density of water vapor absorbers. Path length, $z$, is measured in units of airmass at zenith. The column density of water vapor, PWV, is $n_{j}\cdot z$. If a spectrum is normalized ($I_{\lambda,0}=1.0$), the natural logarithm of its line intensity is proportional to the average absorption cross-section and the number of absorbers. While each water line will have a unique absorption cross-section, all water lines in an observation will share the same PWV ($n_{j}\cdot z$). The log intensities of any two water lines are therefore linearly related: by measuring the depth of an arbitrary water line (or set of lines), we can predict the depth of every other water line in the spectrum. We refer to the water telluric used to construct the telluric spectrum as the calibration telluric, and the pixel at the core of the calibration line as the calibration pixel.
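As a minimal numerical sketch of this argument (not the authors' code), the snippet below applies the exponential attenuation law $I_{\lambda}=I_{\lambda,0}e^{-\sigma_{\lambda} n_{j} z}$ to two hypothetical water lines and checks that the ratio of their log intensities does not depend on the column density. The cross-sections `sigma_cal` and `sigma_i` and the column densities are made-up illustrative values in arbitrary units.

```python
import numpy as np

# Hypothetical effective cross-sections (arbitrary units, not measured values).
sigma_cal = 0.50   # calibration line
sigma_i = 0.20     # some other water line

# Hypothetical water column densities n_j * z for five observations.
column_density = np.array([0.2, 0.5, 0.9, 1.4, 2.0])

# Attenuation law with a normalized continuum (I_0 = 1).
I_cal = np.exp(-sigma_cal * column_density)
I_i = np.exp(-sigma_i * column_density)

# The log-intensity ratio equals sigma_i / sigma_cal in every observation,
# regardless of how much water is along the line of sight.
ratio = np.log(I_i) / np.log(I_cal)
print(ratio)
```

Every entry of `ratio` equals `sigma_i / sigma_cal`, which is why one measured line is enough to scale all the others.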

As an example, Figure 2 shows two water telluric lines from the set of training spectra. Both sets of spectra (Figure 2 right, top and bottom) have been color-coded by the intensity of the pixel at $\lambda=5898.16$ Å, emphasizing the correlated line growth. In the left panel of Figure 2, the correlation between the logarithm of the pixel intensity for these two water telluric features is shown to be linear, with a Pearson Correlation Coefficient (PCC) of 0.99, and the fitted regression line has residuals of 0.0085, comparable to the average deviation of the continuum from unity (0.01).

We now derive the precise relationship between the depths of any two water tellurics. From the radiative transfer equation, the intensities of any pair of water lines, $(I_{\lambda_{i}},I_{\lambda_{cal}})$, grow proportionally to each other in log space. Since the average number density of water absorbers and the airmass are the same for both lines at any time $t_{i}$, the constant of proportionality between the growth of two lines, as shown in Equation 3, can be physically interpreted as the ratio between the absorption cross-sections at the two wavelengths: $\sigma_{\lambda_{i}}/\sigma_{\lambda_{cal}}$. We denote this constant of proportionality as $m^{\lambda_{i}}_{\text{cal}}$.

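$$\frac{\ln I_{\lambda_{i},t_{2}}-\ln I_{\lambda_{i},t_{1}}}{\ln I_{\lambda_{cal},t_{2}}-\ln I_{\lambda_{cal},t_{1}}}=\frac{\sigma_{\lambda_{i}}\left[n_{t_{2}}\cdot z_{t_{2}}-n_{t_{1}}\cdot z_{t_{1}}\right]}{\sigma_{\lambda_{cal}}\left[n_{t_{2}}\cdot z_{t_{2}}-n_{t_{1}}\cdot z_{t_{1}}\right]}=\frac{\sigma_{\lambda_{i}}}{\sigma_{\lambda_{cal}}}\equiv m^{\lambda_{i}}_{\text{cal}} \tag{3}$$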

A similar linear regression is carried out to empirically relate every other pixel in the spectrum to the calibration pixel, implying an equation of the form $\ln{I_{\lambda_{i}}}=m^{\lambda_{i}}_{\lambda_{cal}}\ln{I_{\lambda_{cal}}}+b$ . During this process, the y-intercept was always found to be zero, simplifying the regression model to:

$$\ln I_{\lambda_{i}}=m^{\lambda_{i}}_{\text{cal}}\ln I_{\lambda_{cal}} \tag{4}$$
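The regression step can be sketched on synthetic data as follows. This is an illustration under stated assumptions rather than the pipeline used in the paper: `true_slope` is a hypothetical cross-section ratio, the calibration log intensities are spread evenly over $[-1,0]$, and Gaussian noise of 0.01 is added in log space to mimic $\rm S/N\sim 100$. The fit is done twice, once with a free intercept and once forced through the origin.

```python
import numpy as np

rng = np.random.default_rng(0)

n_spectra = 51       # size of the training set described in the text
true_slope = 0.66    # hypothetical cross-section ratio (assumption)
noise_sigma = 0.01   # roughly 1 / (S/N) for S/N ~ 100

# Hypothetical calibration-pixel log intensities spanning a range of PWV.
ln_I_cal = np.linspace(-1.0, 0.0, n_spectra)
ln_I_i = true_slope * ln_I_cal + rng.normal(0.0, noise_sigma, n_spectra)

# Fit with a free intercept to check that b is consistent with zero.
slope_free, intercept = np.polyfit(ln_I_cal, ln_I_i, 1)

# Fit through the origin: m = sum(x * y) / sum(x * x).
slope_origin = np.dot(ln_I_cal, ln_I_i) / np.dot(ln_I_cal, ln_I_cal)

print(f"free fit:   slope={slope_free:.3f}, intercept={intercept:.4f}")
print(f"origin fit: slope={slope_origin:.3f}")
```

Forcing the fit through the origin encodes the physical expectation that a pixel with no calibration absorption ($\ln I_{\lambda_{cal}}=0$) also shows no absorption at $\lambda_{i}$.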

One exception to the above are saturated tellurics, which have left the linear regime of growth and do not obey Equation 1. In both our water and non-water analysis, however, we find no telluric deeper than 50% of continuum between 4500Å-6800Å and so no saturated telluric. Saturated tellurics are therefore considered outside of this paper’s scope. Another exception to the above is variations in the instrumental line spread function (LSF) over time changing a telluric’s profile. SELENITE does not model instrumental errors, and these variations can only be handled by observing new training data under the new LSF. Fortunately, at CHIRON’s resolution tellurics are marginally resolved, attenuating LSF changes. In practice, CHIRON’s LSF is relatively stable over years, allowing 2012 K-dwarf observations to be fit by a model built on 2014 B star observations (Section 4.6).

The correlated growth of water tellurics can also be exploited to identify water tellurics. The PCC of each pixel’s growth with the calibration pixel can be measured, and each pixel whose PCC exceeds a threshold, $k$ , can be flagged as containing a water telluric. Usefully, SELENITE can discover new water tellurics not contained in HITRAN and correct the position of HITRAN’s water tellurics.

Three additional tests are applied to pixels with PCC $>k$ to eliminate false positives: First, the line spread function for CHIRON has a full width half maximum of 3 pixels. Therefore, we require a minimum of three consecutive pixels with PCC values that exceed $k$ . Single or double pixels are assumed to be spurious. Second, because telluric lines have Gaussian profiles, the cluster of flagged pixels must pass a peak detection algorithm. Finally, the high resolution FTS solar spectrum (Figure 1) indicates that telluric lines appear in clusters rather than as single isolated lines. Any isolated telluric without another telluric within 10Å is therefore rejected.
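The flagging logic can be sketched as below. This is a simplified illustration, not the authors' implementation: the function names are hypothetical, the peak detection test is omitted, and "within 10Å" is interpreted here as the distance between cluster centres, which is an assumption.

```python
import numpy as np

def pcc_per_pixel(ln_I, ln_I_cal):
    """Pearson correlation of each pixel column with the calibration pixel.

    ln_I has shape (n_spectra, n_pixels); ln_I_cal has shape (n_spectra,).
    Pixels with zero variance are assigned a PCC of 0.
    """
    x = ln_I_cal - ln_I_cal.mean()
    y = ln_I - ln_I.mean(axis=0)
    denom = np.sqrt((x @ x) * (y * y).sum(axis=0))
    out = np.zeros(ln_I.shape[1])
    np.divide(x @ y, denom, out=out, where=denom > 0)
    return out

def runs_above(pcc, k, min_len=3):
    """(start, stop) index pairs of runs of >= min_len pixels with PCC > k."""
    flagged = np.concatenate(([False], pcc > k, [False]))
    edges = np.flatnonzero(flagged[1:] != flagged[:-1])
    starts, stops = edges[::2], edges[1::2]   # stop is exclusive
    return [(int(s), int(e)) for s, e in zip(starts, stops) if e - s >= min_len]

def drop_isolated(runs, wavelength, max_gap=10.0):
    """Keep only clusters that have another cluster within max_gap (in Å)."""
    centers = np.array([wavelength[s:e].mean() for s, e in runs])
    keep = []
    for idx, center in enumerate(centers):
        others = np.delete(centers, idx)
        if others.size and np.min(np.abs(others - center)) <= max_gap:
            keep.append(runs[idx])
    return keep

# Toy example: one triple-pixel run survives, single and double pixels do not.
pcc_demo = np.array([0.1, 0.9, 0.9, 0.9, 0.1, 0.9, 0.1, 0.9, 0.9, 0.1])
print(runs_above(pcc_demo, k=0.425))
```

The run-length test is applied before the isolation test so that spurious single pixels cannot act as "neighbours" that rescue an otherwise isolated cluster.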

4.2.2 Establishing a PCC Threshold

The threshold PCC ($k$) for flagging pixels with a telluric signal must be chosen to minimize both the number of spurious detections (false positives) and the number of missed telluric lines (false negatives). This critical step ensures that the model telluric spectrum will have the highest possible fidelity. If spurious features are included in a model, they will be used to assign zero weight to pixels, resulting in lost data for the radial velocity cross-correlation. If telluric features are missed in a model, they will remain in the stellar spectrum and increase the radial velocity errors.

The selection process begins by profiling the false positive rates of different values of $k$. The correlation between a calibration pixel and a noise pixel in the data set is simulated by generating $n=51$ points of the form [$\ln(I_{cal})$, $\ln(I_{\lambda})$]. The values of $\ln(I_{cal})$ evenly fill the range $[-1,0]$ and represent a range of possible calibration line depths, while values of $\ln(I_{\lambda})$ are drawn at random from a Gaussian distribution with $\sigma=0.01$, representing shot noise typical of the CHIRON spectra ($\rm S/N\sim 100$). The PCC for each set is recorded, and the process repeated for 100,000 trials. The results are summarized in Figure 3. For the level of simulated noise, roughly 1% of pixels yield a PCC above $0.323$; 0.1% of pixels have a PCC above $0.425$ and fewer than $0.01$% of pixels generate a PCC $>0.506$. Since single and double pixel clusters with PCC above the threshold are rejected, assuming that each pixel’s noise is independent, a threshold of $k=0.425$ has just a $0.1\%^{3}=10^{-7}\%$ chance of generating a false positive at any given pixel. Since the CHIRON spectrum has about 200,000 pixels, this threshold has just a 0.02% chance of generating a false positive anywhere in a spectrum.
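A Monte Carlo of this kind can be sketched in a few vectorised lines. The code below is an illustrative reconstruction under the assumptions stated in the text (51 evenly spaced calibration values, pure Gaussian noise in the test pixel); the random seed is arbitrary, and the printed thresholds should be close to, but not identical to, the quoted ones. The last lines repeat the false positive arithmetic.

```python
import numpy as np

rng = np.random.default_rng(42)

n_spectra = 51
n_trials = 100_000
noise_sigma = 0.01   # shot noise for S/N ~ 100

# Calibration log intensities evenly filling [-1, 0].
x = np.linspace(-1.0, 0.0, n_spectra)
xc = x - x.mean()

# Each row is one simulated noise pixel observed in all 51 spectra.
y = rng.normal(0.0, noise_sigma, size=(n_trials, n_spectra))
yc = y - y.mean(axis=1, keepdims=True)

# Pearson correlation of every noise pixel with the calibration pixel.
pcc = (yc @ xc) / np.sqrt((xc @ xc) * (yc * yc).sum(axis=1))

for frac in (1e-2, 1e-3, 1e-4):
    threshold = np.quantile(pcc, 1.0 - frac)
    print(f"PCC exceeded by {frac:.2%} of noise pixels: {threshold:.3f}")

# False positive arithmetic for k = 0.425, assuming independent pixels.
p_single = 1e-3                 # chance one noise pixel exceeds k
p_triplet = p_single ** 3       # three consecutive pixels: 1e-9 = 1e-7 %
n_pixels = 200_000              # approximate size of a CHIRON spectrum
expected_false_positives = n_pixels * p_triplet
print(f"expected false positives per spectrum: {expected_false_positives:.1e}")
```

Because the Pearson coefficient is invariant under rescaling, the null distribution depends only on the number of spectra, not on the value chosen for `noise_sigma`.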

Once a threshold PCC is established, the minimum line depth detectable under the threshold in spectra with $\rm S/N\sim 100$ is evaluated. A PCC threshold that is too high will fail to detect shallow lines (generating false negatives), reducing the sensitivity of the model. We again generated points representing pixels from 51 spectra with the form [$\ln(I_{cal}),\ln(I_{\lambda})$]. The calibration line depth, $\ln(I_{cal})$, was again evenly distributed across the range $[-1,0]$, while the pixels representing $\ln(I_{\lambda})$ were scaled according to $\ln(I_{\lambda})=c\cdot\ln(I_{cal})$. By randomly selecting values of $c\in[0,0.07]$, these points represent telluric line depths $\leq 7$%. Gaussian noise consistent with $\rm S/N\sim 100$ was then added to $\ln({I_{\lambda}})$, and the percentage of time that the PCC was greater than $k$ for pixel pairs was recorded. This simulation was repeated for 100,000 trials, and the results show that 90% of lines deeper than 2.3% and 99.9% of lines deeper than 3% of the continuum will be identified with the linear regression method described here (Figure 3, right). However, there is a precipitous drop in our ability to model tellurics with line depths shallower than 2%. This result is, of course, dependent on the $\rm S/N$ of the training population and should improve if the training set had higher $\rm S/N$ and better continuum normalization.
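The sensitivity test can be sketched in the same way. This is a simplified variant, not the simulation used in the paper: instead of drawing $c$ at random from $[0,0.07]$, it evaluates the detection fraction at a few fixed depths, uses fewer trials, and scores single pixels against $k$ only. The detection fractions it produces are therefore indicative and need not reproduce the quoted percentages exactly.

```python
import numpy as np

rng = np.random.default_rng(7)

n_spectra = 51
n_trials = 20_000     # fewer trials than the text, for speed (assumption)
noise_sigma = 0.01    # S/N ~ 100
k = 0.425             # PCC threshold from the false positive analysis

x = np.linspace(-1.0, 0.0, n_spectra)   # ln(I_cal)
xc = x - x.mean()

def detection_fraction(c):
    """Fraction of simulated pixels with scale factor c whose PCC exceeds k."""
    y = c * x + rng.normal(0.0, noise_sigma, size=(n_trials, n_spectra))
    yc = y - y.mean(axis=1, keepdims=True)
    pcc = (yc @ xc) / np.sqrt((xc @ xc) * (yc * yc).sum(axis=1))
    return float(np.mean(pcc > k))

depths = [0.01, 0.02, 0.023, 0.03, 0.05]
fractions = [detection_fraction(c) for c in depths]

for c, f in zip(depths, fractions):
    print(f"c = {c:.3f}: detected in {f:.1%} of trials")
```

The steep rise of the detection fraction between $c=0.01$ and $c=0.03$ is the "precipitous drop" seen from the other side: below roughly 2% the telluric signal is smaller than the per-pixel noise and the correlation is no longer reliably above threshold.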

4.2.3 SELENITE’s Water Telluric Model

The steps taken to identify and model water tellurics in Section 4.2.1 are summarized below.

1. The PCC of each pixel’s growth with a calibration pixel is calculated. A threshold PCC, $k$, is established, and pixels with PCC $>k$ are flagged as significant.

2. Single or double pixels with PCC $>k$ are rejected as spurious.

3. The training data set is coadded and a peak detection algorithm is applied to each cluster of three or more pixels. Clusters which do not contain a peak are rejected.

4. Any cluster of flagged pixels with no other cluster within 10Å is rejected.

5. Linear regression is carried out on pixels that are flagged as tellurics to measure $m^{\lambda}_{\text{cal}}$ relative to a pre-identified calibration pixel. The wavelength, regression coefficient, PCC and water/non-water classification of each flagged pixel are then stored in a database.

For each pixel that passes the selection criteria for water tellurics, the database stores the wavelength, linear coefficient, PCC, and a flag identifying the pixel as water. Table 1 lists an excerpt of the database generated from the training data using the 5901.6Å telluric as a calibrator. To generate a model of telluric water lines, the intensity of the central pixel in a calibration line is measured and information in the database is used to generate water tellurics for every pixel in the spectrum:

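$$\ln I_{\lambda_{i}}=\begin{cases}m^{\lambda_{i}}_{\text{cal}}\cdot\ln I_{\text{cal}} & \text{when }\lambda_{i}\in\text{valid peak}\ \land\ PCC_{\lambda}>k\\ 0 & \text{otherwise}\end{cases} \tag{5}$$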

where $m^{\lambda_{i}}_{\text{cal}}$ is the ratio of the effective absorption cross-section at $\lambda_{i}$ to the effective cross-section at the calibration line wavelength $\lambda_{\text{cal}}$ (or the weighted average for an ensemble of calibration lines), $I_{\text{cal}}$ is the intensity at the calibration line wavelength, and $k$ is the threshold correlation coefficient indicating telluric presence. Generation of the telluric water model takes less than 3 minutes on a 2015 Macbook Air with a 2.2 GHz Intel Core i7 processor and 8GB of 1600 MHz DDR3 RAM. The model allows for identification of variable numbers of telluric-contaminated pixels, depending on the PWV.
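A minimal sketch of this model-generation step is given below, assuming the database has already been matched to the pixel grid of the spectrum. The first five rows are copied from Table 1; the sixth row, the `valid_peak` flags, the function name and the measured calibration intensity `I_cal` are hypothetical values added for illustration.

```python
import numpy as np

# Table 1 excerpt (coefficients relative to the 5901.6 Å calibrator), plus one
# hypothetical non-telluric pixel at the end to exercise the "otherwise" branch.
wavelength = np.array([5898.12061, 5898.14209, 5898.18457,
                       5898.20556, 5898.99121, 5899.50000])
coeff = np.array([0.49523, 0.89206, 0.66062, 0.34039, 0.47828, 0.0])
pcc = np.array([0.992, 0.994, 0.991, 0.937, 0.977, 0.10])

# Assumed outcome of the cluster and peak tests for each pixel.
valid_peak = np.array([True, True, True, True, True, False])

def water_telluric_model(coeff, pcc, valid_peak, I_cal, k=0.425):
    """Normalized telluric transmission for each pixel.

    ln I = m * ln I_cal for flagged pixels, and 0 (I = 1) everywhere else.
    """
    use = valid_peak & (pcc > k)
    ln_model = np.where(use, coeff * np.log(I_cal), 0.0)
    return np.exp(ln_model)

I_cal = 0.80   # hypothetical measured intensity at the calibration pixel
model = water_telluric_model(coeff, pcc, valid_peak, I_cal)

for lam, value in zip(wavelength, model):
    print(f"{lam:.5f} Å  ->  {value:.4f}")
```

Working in log space means each flagged pixel is simply `I_cal` raised to the power of its coefficient, so a single measured intensity rescales the whole telluric spectrum.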

This is valuable since, as Figure 2 shows, water telluric size can vary by an order of magnitude. On nights with high PWV, at a threshold $k$ of 0.425 (see Section 4.2.2), up to $\sim 4150$ pixels in our training spectra were contaminated, $3.1\%$ of pixels under 6800Å. On dry nights, as few as $\sim 1700$ pixels were contaminated, $1.2\%$ of pixels under 6800Å. This saves $\sim 75\%$ of an order.

4.2.4 Identifying and Modelling Water Microtellurics

SELENITE is successful at identifying relatively shallow telluric features. Figure 4 shows the training set spectra for the wavelength range between 5075Å and 5120Å. From the NSO atlas (Figure 1) it is clear that this wavelength range should only contain weak microtelluric lines. Spectra in Figure 4 (left) are color-coded by the intensity of the calibrating water telluric line at 5898.16Å and it is difficult to see correlated growth for any microtelluric lines. However, when the pixels in each spectrum are color-coded by the strength of the PCC (regressed against a pixel in the core of the 5898.16Å line), even telluric lines with a depth close to the photon noise in the continuum emerge with high confidence (Figure 4, middle). A close-up view (see right panel of Figure 4) highlights a detected microtelluric line with a depth only slightly greater than the photon noise.

Moreover, SELENITE is accurate for microtellurics whose depth is close to the shot noise of the spectra. As an example, the pixel intensity at the center of a shallow microtelluric line is plotted against the pixel intensity of the 5898.16Å calibration line in Figure 5. Following the format for Figure 2, the telluric spectra in the wavelength region around 5898.16Å and the spectra near 5086.3Å (Figure 5 right) are color-coded according to the depth of the 5898.16Å line. The linear regression between the calibration line and the underscored microtelluric line at 5086.3Å is shown in the left panel of Figure 5 and models the intensity of the microtelluric line with a mean SSE of 0.009, comparable to the noise level implied by the $\rm S/N$ of the spectrum ($\sim 0.01$).

4.2.5 Using an Arbitrary Pixel as a Calibrator

A powerful feature of SELENITE is that any arbitrary pixel or ensemble of pixels in the database can be substituted for the calibration pixel without requiring additional analysis: each linear coefficient is simply divided by the scale factor from the original calibration pixel to the new calibration pixel. As an example, Equation 6 shows how a model based on calibration line $A$ can be transformed to a model based on calibration line $B$.

$$\ln I_{\lambda_{i}}=\frac{m_{A}^{\lambda_{i}}}{m_{A}^{B}}\cdot\ln I_{B} \tag{6}$$

The linear coefficients in the regression model were derived with B-stars (telluric stars) because these spectra have both high $\rm S/N$ and few spectral lines. However, once the linear coefficients have been derived, the coefficients can be used to model telluric contamination in spectra of later type stars as long as the selected telluric calibration line is isolated from the stellar absorption lines or the stellar absorption feature is well enough known (for example, by spectral synthesis modeling) that it can be divided out. The ability to use the database to switch between different calibrating pixels (described above) offers critical flexibility for modeling tellurics in spectra of late type stars.
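The change of calibrator in Equation 6 amounts to a single division, as the sketch below illustrates. The coefficients are the Table 1 values (relative to calibrator $A$, the 5901.6Å line); the choice of the second row as the new calibrator $B$, the function name and the intensity `I_A` are hypothetical.

```python
import numpy as np

# Coefficients m_A^{lambda_i} from Table 1, relative to calibrator A.
coeff_A = np.array([0.49523, 0.89206, 0.66062, 0.34039, 0.47828])
index_B = 1   # hypothetical choice: use the 5898.14209 Å pixel as calibrator B

def recalibrate(coeff_A, index_B):
    """Coefficients relative to B: m_B^{lambda_i} = m_A^{lambda_i} / m_A^{B}."""
    return coeff_A / coeff_A[index_B]

coeff_B = recalibrate(coeff_A, index_B)

# Consistency check with a hypothetical intensity at calibrator A.
I_A = 0.70
ln_I_from_A = coeff_A * np.log(I_A)       # prediction using A
I_B = np.exp(ln_I_from_A[index_B])        # what B would measure
ln_I_from_B = coeff_B * np.log(I_B)       # prediction using B instead

print(np.allclose(ln_I_from_A, ln_I_from_B))
```

Both calibrators give the same prediction for every pixel, so the choice can be driven purely by which telluric line is least blended with stellar features in the science spectrum.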

4.3 Results for Water Tellurics

4.3.1 Model Goodness of Fit

We evaluate SELENITE’s goodness of fit using the telluric spectrum of the B star HR3982. The HR3982 spectrum was generated by averaging 3 unique observations taken over 40 min to drive up its S/N. Goodness of fit was measured using the reduced chi squared ( $\chi_{red}^{2}$ ) test statistic. HR3982’s observed flux was treated as the true model, $F_{obs,i}$ , and SELENITE’s model of the flux as the "data", $F_{model,i}$ . The statistical errors were taken to be the error calculated by the data reduction pipeline (0.75% of continuum), scaled by (a) the root of the number of spectra coadded ( $\sqrt{3}$ ) and (b) the root of the model’s flux ( $\sqrt{F_{model,i}}$ ), giving $\sigma_{model,i}=0.0075/\sqrt{3F_{model,i}}$ .
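The statistic defined above can be written out as a short sketch. The error model follows the expression for $\sigma_{model,i}$ in the text; the function name, the optional pixel mask, and the choice to divide by the number of pixels (rather than subtracting fitted parameters) are assumptions.

```python
import numpy as np

def reduced_chi_squared(f_obs, f_model, mask=None, pipeline_error=0.0075, n_coadded=3):
    """Reduced chi squared with sigma_i = pipeline_error / sqrt(n_coadded * F_model_i)."""
    f_obs = np.asarray(f_obs, dtype=float)
    f_model = np.asarray(f_model, dtype=float)
    if mask is None:
        mask = np.ones(f_obs.shape, dtype=bool)
    mask = np.asarray(mask, dtype=bool)
    sigma = pipeline_error / np.sqrt(n_coadded * f_model[mask])
    chi2 = np.sum(((f_obs[mask] - f_model[mask]) / sigma) ** 2)
    # Assumption: degrees of freedom are approximated by the pixel count.
    return float(chi2 / mask.sum())

# Toy check: observations that sit exactly one sigma from the model.
f_model = np.array([1.0, 0.9, 0.8, 1.0])
sigma = 0.0075 / np.sqrt(3 * f_model)
f_obs = f_model + sigma * np.array([1.0, -1.0, 1.0, -1.0])

chi2_all = reduced_chi_squared(f_obs, f_model)
chi2_masked = reduced_chi_squared(f_obs, f_model, mask=[True, True, False, False])
```

The `mask` argument stands in for the restriction to pixels where a telluric was detected.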

First, to estimate the data quality independent of telluric removal, we measured the $\chi_{red}^{2}$ of a 3200px wavelength range unaffected by telluric lines, 4892Å-4952Å, against a model of unity. We found a $\chi_{red}^{2}$ of 1.03, suggesting that our errors were well-calibrated. Next, we measured the $\chi_{red}^{2}$ of our model’s fit in a 3200px wavelength range with heavy water tellurics, 6472Å-6545Å. This range was chosen because (a) it contains the most intense water tellurics bluewards of 6800Å and (b) it is free from stellar features. Only pixels where a telluric was detected were included in the $\chi_{red}^{2}$ calculation. A 25px range from 6521.5Å-6522.5Å was found to have errors $20\times$ higher than anywhere else; this region was flagged as an outlier and excluded. The $\chi_{red}^{2}$ of the telluric model was found to be 1.25. In particular, the line cores were fit well, with a $\chi_{red}^{2}$ of 1.11. To reach a similar $\chi_{red}^{2}$ in the affected and unaffected regions, errors in the affected region need to be increased by $\sim 10.5\%$ .

Figure 6 (top) plots a 5Å excerpt from the affected region, with HR3982’s spectrum shown in purple and our model shown in blue. The fit’s residuals deviate from unity by $1.0\%$ on average, comparable to the unaffected regions of the spectrum and to the performance of radiative transfer codes (Ulmer-Moll, S. et al., 2019). One potential flaw in our model is that modelling all points without significant telluric signal as unity creates discontinuities in the telluric wings. However, Figure 6 (bottom) indicates that these discontinuities are small, and most users will prefer to mask affected pixels rather than divide them out.

4.3.2 Relative Contribution of PWV and Airmass to Water Line Depth

A further result is that the contribution of PWV to water line depth generally dominates over that of airmass. As an example, Figure 7 shows that a low airmass (z=1.144) observation of the 5900Å water lines can exhibit significantly greater line depth than a subsequent higher airmass (z=1.454) observation because of changes in PWV. While the water column density for an observation depends on both the average number density of absorbers along the line of sight (PWV) and the path length (airmass), PWV can vary by as much as an order of magnitude while airmass generally ranges between 1 and 2. In general, water line depth only weakly correlates with airmass. This lack of correlation can be exploited to distinguish water and non-water lines.

4.4 Non-water Tellurics

In this section, telluric absorption lines from molecules other than water are considered. Like water tellurics, each non-water telluric can be modeled by the radiative transfer equation for a plane parallel atmosphere, and thus its signal intensity is given by $\sigma_{\lambda_{i}}\cdot n_{j}\cdot z$ , where $n_{j}$ is the number density of the molecular species, $j$ .

Unlike water tellurics, however, non-water tellurics have no equivalent of PWV. Ignoring small seasonal variations in gases such as ${\rm CO_{2}}$ , $n_{j}$ is spatially and temporally fixed. Each non-water species in the atmosphere is evenly distributed with a constant number density. Therefore the column density of non-water lines only varies with airmass: by measuring airmass, we can predict the depth of every non-water line in the spectrum. As an example, Figure 8 (right) shows that over our observed range of airmass ( $z$ between $1.1-1.8$ ) the signal intensity of the oxygen telluric feature at 6277.7Å (Figure 8, left) is well fitted by the linear regression model $\ln(I_{6277.7\textup{\AA}{}})=m\cdot z+b$ . The slope of the regression model, $m$ , measures $\sigma_{\lambda_{i}}\cdot n_{j}$ . Another difference from the model for water lines is that the y-intercept (a fictitious extrapolation to zero airmass) is small, but non-zero.
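The airmass regression $\ln(I)=m\cdot z+b$ can be sketched with an ordinary least-squares fit. The intensities below are synthetic and the slope and intercept are invented for illustration; only the functional form comes from the text.

```python
import numpy as np

def predict_intensity(airmass, m, b):
    """Intensity of a non-water telluric pixel at a given airmass."""
    return np.exp(m * airmass + b)

# Synthetic training points spanning roughly the observed airmass range.
z = np.array([1.1, 1.3, 1.5, 1.8])
true_m, true_b = -0.05, -0.002          # assumed values, not from the paper
intensity = np.exp(true_m * z + true_b)

# np.polyfit returns the highest-order coefficient first: slope, then intercept.
m, b = np.polyfit(z, np.log(intensity), deg=1)

predicted_at_1p4 = predict_intensity(1.4, m, b)
```

Once `m` and `b` are stored for every non-water pixel, the only input needed at observing time is the airmass of the exposure.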

Like water lines, non-water lines can be identified by measuring the correlation of their growth with airmass. Each pixel whose growth’s PCC with airmass is above a threshold, $k$ , is assumed to contain a non-water telluric and undergoes the same procedure as water telluric pixels. Again, this potentially allows for the detection of tellurics not listed in the HITRAN database.

Non-water lines can be readily distinguished from water lines because non-water lines have a low correlation with the water calibration pixels but a high correlation with airmass, and vice versa for water lines (see Section 4.3.2). Separating components that vary with airmass from those that don’t is a benefit of SELENITE that might well be useful outside the scope of this paper, such as in the near IR, where $\text{H}_{2}\text{O}$ , $\text{CO}_{2}$ and $\text{CH}_{4}$ lines mix. When a water line and a non-water line blend, the composite line can have a significant correlation with both the water calibrator and airmass. A regression model is not fit to composite lines, but they are flagged in the database.
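The separation into water, non-water, and composite pixels can be sketched as a pair of correlation tests. The example below uses a noise-free synthetic grid in which PWV and airmass are uncorrelated by construction; the threshold, the line strengths, the label names, and the simplification that water depth depends on PWV alone are all assumptions.

```python
import numpy as np

def pcc_with(depths, reference):
    """Pearson correlation of each pixel's depth with a reference series."""
    d = depths - depths.mean(axis=0)
    r = reference - reference.mean()
    norm = np.sqrt((d ** 2).sum(axis=0) * (r ** 2).sum())
    return np.divide(d.T @ r, norm, out=np.zeros(d.shape[1]), where=norm > 0)

def classify_pixels(depths, calib_depth, airmass, k=0.6):
    """Label each pixel by which driver its depth correlates with."""
    water = pcc_with(depths, calib_depth) > k
    air = pcc_with(depths, airmass) > k
    labels = np.full(depths.shape[1], "none", dtype=object)
    labels[water & ~air] = "water"
    labels[~water & air] = "non-water"
    labels[water & air] = "composite"   # flagged, not fitted
    return labels

# 4 x 4 grid of observing conditions, so PWV and airmass are uncorrelated.
pwv, z = [a.ravel() for a in np.meshgrid([1.0, 2.0, 3.0, 4.0], [1.1, 1.3, 1.5, 1.8])]
calib_depth = 0.08 * pwv
depths = np.column_stack([
    0.05 * pwv,               # water line
    0.03 * z,                 # oxygen-like line
    0.012 * pwv + 0.05 * z,   # blend of the two
    np.zeros_like(pwv),       # continuum
])
labels = classify_pixels(depths, calib_depth, z)
```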

4.5 Results for Non-Water Tellurics

We evaluate SELENITE’s goodness of fit using the B star HR3982’s telluric spectrum, following the procedure described in Section 4.3.1. This time, however, we measured the $\chi_{red}^{2}$ of the model’s fit from 6257Å-6328Å, a 3200px wavelength range which encompasses the heart of the 6280Å $\text{O}_{2}$ $\gamma$ atmospheric band. Only pixels where a non-water telluric was detected were measured. The $\chi_{red}^{2}$ of the telluric model was found to be 1.17. To reach a similar $\chi_{red}^{2}$ in the affected and unaffected regions, errors in the affected region need to be increased by $2.0\%$ . Figure 9 plots the model’s fit to two oxygen doublets in HR3982’s $\text{O}_{2}\ \gamma$ atmospheric band. The fit’s residuals deviate from unity by $\sim 0.75\%$ on average, comparable to unaffected regions of the spectrum.

Unfortunately, there are no non-water species with telluric lines other than oxygen bluewards of 6800Å, so we cannot evaluate our model on other species. Fundamentally, however, any well-mixed non-water species should in theory behave as oxygen does.

4.6 Modelling Tellurics in a K Dwarf Spectrum

Late-type stars display complex absorption features. These absorption features do not complicate SELENITE’s non-water modelling, which only requires a measurement of airmass, but they do complicate water modelling, since they may blend with a calibration pixel’s line. To compensate for the loss of any given calibration pixel, a large ( $50+$ ) ensemble of potential calibration pixels is given in the database.

Calibration pixels which are blended with stellar lines are identified and removed as follows. Initially, a telluric model is built by regression against the average of all calibration pixel depths. If any calibration line is blended with a stellar line, the regression model will overestimate PWV and the depth of every non-blended water line, but will underestimate the depth of the blended calibration pixel’s line. This calibration pixel can then be removed from the calibration set, and the process repeated until the calibration set stabilizes. Empirically, we find that as long as just 25% of calibration pixels remain, SELENITE generates a good fit.
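One way to picture this iterative pruning is the sketch below. It is an interpretation rather than the authors' code: it assumes each calibration line's depth is a known coefficient times a common PWV-like scale, drops the single most underestimated line on each pass, and stops when no line exceeds a tolerance or when the 25% floor is reached. The function name, `tol`, and the example depths are hypothetical.

```python
import numpy as np

def prune_blended_calibrators(observed_depth, coeff, tol=0.01, min_fraction=0.25):
    """Iteratively drop calibration pixels whose depth the model underestimates."""
    observed_depth = np.asarray(observed_depth, dtype=float)
    coeff = np.asarray(coeff, dtype=float)
    active = np.ones(observed_depth.size, dtype=bool)
    min_active = int(np.ceil(min_fraction * observed_depth.size))
    while active.sum() > min_active:
        # PWV-like scale from the average of the currently trusted calibrators.
        scale = np.mean(observed_depth[active] / coeff[active])
        # Positive excess: the line is deeper than the model predicts (likely blended).
        excess = observed_depth - coeff * scale
        excess[~active] = -np.inf
        worst = int(np.argmax(excess))
        if excess[worst] <= tol:
            break
        active[worst] = False
    scale = float(np.mean(observed_depth[active] / coeff[active]))
    return active, scale

# Five calibration lines at a true scale of 0.10; line 2 is blended with a
# stellar feature that adds 0.06 to its depth (assumed numbers).
coeff = np.array([1.0, 0.8, 0.5, 1.2, 0.6])
observed_depth = coeff * 0.10
observed_depth[2] += 0.06

active, scale = prune_blended_calibrators(observed_depth, coeff)
```

On the first pass the blended line drags the scale up to about 0.124, so every clean line is overestimated and only the blended line is underestimated, matching the behaviour described above.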

We evaluate SELENITE’s fit on late-type stars with the K-dwarf $\alpha$ Centauri B. We measured the $\chi_{red}^{2}$ of the model’s fit at the 6450Å water band described in Section 4.3.1. This measurement, however, was complicated by $\alpha$ Centauri B’s stellar lines: if a telluric line is blended with a stellar line, the model’s fit will appear incorrect. This problem was overcome by noticing that changes in the Earth’s barycentric velocity will substantially shift the stellar lines in two observations of $\alpha$ Centauri B taken months apart while leaving the telluric lines in the same position. Tellurics that are blended in the first observation will often be unblended in the second observation, and vice versa.

To illustrate, Figure 10 (top) shows SELENITE’s fit to two observations of $\alpha$ Centauri B, at barycentric velocities of 1860 m/s and 20500 m/s, for the same 5Å wavelength range shown in Section 4.3.1. In the 20500 m/s observation, the deep line at 6475Å seems ill fit by the model’s pair of water lines (underlined), but in the 1860 m/s observation the deep line has shifted, revealing that it was a stellar line blended with a pair of water lines which the model now fits well. The fit’s residuals, shown in Figure 10 (bottom), show that when tellurics are removed the two spectra are indeed the same. Where the residuals do not contain a stellar line, they deviate from unity by an average of 1.1%, comparable to the results of a radiative transfer code.
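As a rough check on the size of this shift, the non-relativistic Doppler formula can be applied to the two barycentric velocities quoted above. The calculation below is an illustrative estimate, not a value taken from the paper:

$$\Delta\lambda\approx\lambda\,\frac{\Delta v}{c}=6475\textup{\AA}{}\times\frac{(20500-1860)\ \mathrm{m\,s^{-1}}}{2.998\times 10^{8}\ \mathrm{m\,s^{-1}}}\approx 0.40\textup{\AA}{}$$

Taking the sampling implied by the 3200px, 6472Å-6545Å range of Section 4.3.1 (about 0.023Å per pixel), this corresponds to a displacement of roughly 18 pixels, enough to move a stellar line well clear of a narrow telluric feature.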

When we compute $\chi_{red}^{2}$ , if the spectrum grossly deviates from the fit at a pixel (by $3.0\%$ or more of the continuum) we assume that the pixel is blended with a stellar line and reject it. Following this procedure, we found a $\chi_{red}^{2}$ of 2.95 and 3.17 for the 1840 m/s and 20500 m/s $\alpha$ Centauri B observations. This fit, while acceptable, is somewhat poorer than HR3982’s fit, in large part because telluric lines often blend with the wings of stellar lines, which slightly distorts the telluric profiles. For example, the wings of the small telluric at 6472.5Å (at the far left of Figure 10) are blended with a small stellar line, inflating the measurement of $\chi_{red}^{2}$ .

5 Discussion

Because of the barycentric velocity of the Earth, telluric lines raster across the stellar line profiles in time-series Doppler measurements. Even shallow microtelluric features will degrade the fidelity of high-resolution spectra and may contribute up to 0.5 m s$^{-1}$ to the RV error budget. Since the Earth induces a radial velocity of 10 cm s$^{-1}$ in the Sun, telluric contamination is a significant challenge in the search for analogs of our world. In this paper, we present SELENITE, an empirical technique for identifying and modelling telluric features in the optical (4500Å-6800Å), using the observations: (a) water tellurics grow proportionally to PWV and therefore proportionally to each other and (b) non-water tellurics grow proportionally to airmass. Water tellurics are identified by looking for pixels whose growth correlates with a known calibration water telluric and modelled by regression against it. Non-water tellurics are identified by looking for pixels whose growth correlates with airmass and modelled by regression against it. SELENITE has several advantages over the alternatives:

• Runtime: Once the database is built ( $<3$ min on a standard PC), fitting a spectrum takes several seconds, permitting SELENITE to be used at the telescope to help guide observing runs.

• Observing time: Unlike standard stars, after a one-time observation of a few dozen B stars to build the database, SELENITE requires no further observations, saving observing time.

• Requires no atomic/molecular line data: Unlike radiative transfer codes, SELENITE does not require atomic/molecular line data. This is useful because the literature suggests HITRAN is not always accurate. In particular, Seifahrt, A. et al. (2010) notes: "Line data in HITRAN have strongly varying accuracy levels. Typical uncertainties of line positions range from a few to several hundred m/s, but can be as high as several km/s in extreme cases. Line strengths are rarely precise to the 1% level." Further, Rudolf, N. et al. (2016) find that inaccuracies in the HITRAN database frustrate their ability to model water lines accurately.

• Distinguishes tellurics that vary primarily with airmass from those that don’t: Although outside the paper’s scope, this feature could be very useful in the near IR, where $\text{H}_{2}\text{O}$ , $\text{CO}_{2}$ and $\text{CH}_{4}$ lines mix.

We acknowledge, however, that SELENITE has certain limitations. First, stellar features in the set of training B stars (e.g., the Paschen and Brackett lines) will distort its model. This problem can be solved by interpolating over each absorption feature, at the cost of introducing additional uncertainty to regions of scientific interest. Second, SELENITE only varies with airmass and PWV. Other atmospheric phenomena which may affect line profiles (e.g., wind speed (Caccin et al., 1985)) are not taken into account. Instrumental changes, such as a varying LSF, are also not considered, and can only be handled by rebuilding the database for each instrumental profile change. Third, SELENITE’s PCC cutoff threshold produces discontinuities. While these discontinuities are small for CHIRON’s high $\rm S/N$ data, at lower $\rm S/N$ a line’s wings may not clear the PCC threshold, truncating them.

Despite these limitations, evaluations show that SELENITE provides excellent fits. The model’s fit to regions of intense water tellurics and non-water tellurics in the B star HR3982 had $\chi^{2}_{red}$ of 1.25 and 1.17, and thus errors just 10.5% and 2.0% bigger than the continuum’s fit to unity. Further, SELENITE’s fits to the K-dwarf $\alpha$ Centauri B observations had $\chi^{2}_{red}$ of 2.95 and 3.17, despite the $\chi^{2}_{red}$ test statistic being inflated by stellar line blending, confirming that it provides a good fit to late-type stars. SELENITE’s average residual is $1.0\%$ and $0.75\%$ for HR3982 and $1.1\%$ for $\alpha$ Centauri B, comparable to the residuals of radiative transfer codes (Ulmer-Moll, S. et al., 2019).

6 Acknowledgements

Acknowledgements: The authors gratefully acknowledge enabling support from the following grants NSF-1616086, NSF-MRI0923441, NASA-NNH17ZDA001N-XRP, NASA-NNH11ZDA001N-OSS. NSO/Kitt Peak FTS data used here were produced by NSF/NOAO.

Bibliography

Artigau, É., Astudillo-Defru, N., Delfosse, X., et al. 2014, in Proc. SPIE, Vol. 9149, Observatory Operations: Strategies, Processes, and Systems V, 914905
Bender, C. F., Mahadevan, S., Deshpande, R., et al. 2012, ApJ, 751, L31
Bertaux, J. L., Lallement, R., Ferron, S., Boonne, C., & Bodichon, R. 2014, A&A, 564, A46
Blake, C. H., & Shaw, M. M. 2011, PASP, 123, 1302
Caccin, B., Cavallini, F., Ceppatelli, G., Righini, A., & Sambuco, A. M. 1985, A&A, 149, 357
Cunha, D., Santos, N. C., Figueira, P., et al. 2014, A&A, 568, A35
Figueira, P., Pepe, F., Lovis, C., & Mayor, M. 2010, A&A, 515, A106
Fischer, D. A., Anglada-Escude, G., Arriagada, P., et al. 2016, PASP, 128, 066001