Estimating dayside effective temperatures of hot Jupiters and associated uncertainties through Gaussian process regression
Emily K. Pass, Nicolas B. Cowan, Patricio E. Cubillos, Jack G., Sklar

TL;DR
This paper introduces a Gaussian process regression method to estimate the dayside effective temperatures of hot Jupiters from limited broadband observations, providing more robust uncertainty estimates than previous approaches.
Contribution
The authors develop and validate a new Gaussian process regression technique for estimating exoplanet temperatures, demonstrating its effectiveness with simulated and real observational data.
Findings
GP method yields more robust uncertainty estimates.
Unbiased temperature estimates possible with three broadband measurements.
Application to 12 hot Jupiters provides temperature estimates with uncertainties of 66-136 K.
Abstract
In this work, we outline a new method for estimating dayside effective temperatures of exoplanets and associated uncertainties using Gaussian process (GP) regression. By applying our method to simulated observations, we show that the GP method estimates uncertainty more robustly than other model-independent approaches. We find that unbiased estimates of effective temperatures can be made using as few as three broad-band measurements (white-light HST WFC3 and the two warm Spitzer IRAC channels), although we caution that estimates made using only IRAC can be significantly biased. We then apply our GP method to the twelve hot Jupiters in the literature whose secondary eclipse depths have been measured by WFC3 and IRAC channels 1 and 2: CoRoT-2 b; HAT-P-7 b; HD 189733 b; HD 209458 b; Kepler-13A b; TrES-3 b; WASP-4 b; WASP-12 b; WASP-18 b; WASP-33 b; WASP-43 b; and WASP-103 b. We present…
| Planet | (%) | Teff (K) | ||
|---|---|---|---|---|
| WFC3 | IRAC CH1 | IRAC CH2 | ||
| CoRoT-2 b | 0.0400 0.007 1 | 0.355 0.020 2 | 0.500 0.020 2 | 1846 076 |
| HAT-P-7 b | 0.0520 0.001 3 | 0.156 0.009 4 | 0.190 0.006 4 | 2775 096 |
| HD 189733 b | 0.0096 0.0039 5 | 0.256 0.014 6 | 0.214 0.020 6 | 1490 068 |
| HD 209458 b | 0.0082 0.0015 7 | 0.119 0.007 8 | 0.123 0.006 8 | 1523 066 |
| Kepler-13A b | 0.0734 0.0028 9 | 0.156 0.031 10 | 0.222 0.023 10 | 2918 109 |
| TrES-3 b | 0.0480 0.007 11 | 0.356 0.035 12 | 0.372 0.054 12 | 1830 083 |
| WASP-4 b | 0.0630 0.005 11 | 0.319 0.031 13 | 0.343 0.027 13 | 1900 079 |
| WASP-12 b | 0.1590 0.004 14,15 | 0.421 0.011 15 | 0.428 0.012 15 | 2899 100 |
| WASP-18 b | 0.1045 0.0006 16 | 0.300 0.02 17 | 0.390 0.02 17 | 3067 104 |
| WASP-33 b | 0.1190 0.006 18 | 0.260 0.05 19 | 0.410 0.02 19 | 3108 113 |
| WASP-43 b | 0.0456 0.0010 20 | 0.190 0.01 21 | 0.224 0.018 21 | 1464 067 |
| WASP-103 b | 0.1510 0.015 22 | 0.446 0.38 22 | 0.569 0.014 22 | 3205 136 |
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Estimating dayside effective temperatures of hot Jupiters and associated uncertainties through Gaussian process regression
Emily K. Pass,1,2,3 Nicolas B. Cowan,1,2,4,5 Patricio E. Cubillos6 and Jack G. Sklar4,7
1McGill Space Institute, 3550 rue University, Montreal, QC, H3A 2A7, Canada
2Institut de Recherche sur les Exoplanètes, Université de Montreal, C.P. 6128, Succ. Centre-ville, Montreal, QC H3C 3J7, Canada
3Department of Physics & Astronomy, University of Waterloo, 200 University Ave W, Waterloo, ON, N2L 3G1, Canada
4Department of Physics, McGill University, 3600 rue University, Montreal, QC, H3A 2T8, Canada
5Department of Earth & Planetary Sciences, McGill University, 3450 rue University, Montreal, QC, H3A 0E8, Canada
6Space Research Institute, Austrian Academy of Sciences, Schmiedlstrasse 6, A-8042, Graz, Austria
7School of Computer Science, McGill University, 3480 Rue University, Montreal, QC H3A 2A7, Canada E-mail: [email protected]
(Accepted XXX. Received YYY; in original form ZZZ)
Abstract
In this work, we outline a new method for estimating dayside effective temperatures of exoplanets and associated uncertainties using Gaussian process (GP) regression. By applying our method to simulated observations, we show that the GP method estimates uncertainty more robustly than other model-independent approaches. We find that unbiased estimates of effective temperatures can be made using as few as three broad-band measurements (white-light HST WFC3 and the two warm Spitzer IRAC channels), although we caution that estimates made using only IRAC can be significantly biased. We then apply our GP method to the twelve hot Jupiters in the literature whose secondary eclipse depths have been measured by WFC3 and IRAC channels 1 and 2: CoRoT-2 b; HAT-P-7 b; HD 189733 b; HD 209458 b; Kepler-13A b; TrES-3 b; WASP-4 b; WASP-12 b; WASP-18 b; WASP-33 b; WASP-43 b; and WASP-103 b. We present model-independent dayside effective temperatures for these planets, with uncertainty estimates that range from 66 K to 136 K.
keywords:
methods: numerical – planets and satellites: atmospheres
††pubyear: 2019††pagerange: Estimating dayside effective temperatures of hot Jupiters and associated uncertainties through Gaussian process regression–Estimating dayside effective temperatures of hot Jupiters and associated uncertainties through Gaussian process regression
1 Introduction
1.1 Effective temperature
The effective temperature of an exoplanet—i.e., the temperature of a perfect blackbody emitting the same flux and with the same radius as the real planet—is an important quantity. Together with surface gravity, it is one of the most basic parameters for modelling planetary atmospheres (for review, see Fortney, 2018). In particular, effective temperature estimates allow for analyses of Bond albedo and day-to-night heat transport (e.g., Cowan & Agol, 2011; Crossfield et al., 2012; Schwartz & Cowan, 2015).
Information on the planet’s dayside can be obtained through measurements of the secondary eclipse, where the planet is observed to pass behind its host star. For an edge-on system, the secondary eclipse depth is related to the planet’s spectral radiance by (Cowan et al., 2007):
[TABLE]
wherein is the secondary eclipse depth, is the ratio of planetary to stellar radii, and is the ratio of spectral radiances for a planet with brightness temperature around a host star with brightness temperature at wavelength . The second term, which depends on the wavelength-dependent geometric albedo and the planet’s semi-major axis , is the contribution from reflected starlight. For sufficiently hot planets, reflected light can be neglected in the infrared regime, as the planet’s thermal emission dominates. For this work, we therefore ignore the contribution of reflected light. However, we note that this is an approximation; for example, see Keating & Cowan (2017), who find reflected light is non-negligible for the hot Jupiter WASP-43 b at near-infrared wavelengths.
If secondary eclipse depths were measured at all wavelengths, the planet’s dayside effective temperature could be calculated directly. By rearranging Equation 1, secondary eclipse depth can be converted to planetary spectral radiance. Integrating this spectral radiance over all wavelengths yields the total radiance, , with effective temperature then determined using the Stefan-Boltzmann law:
[TABLE]
In reality, secondary eclipse depth is not known at all wavelengths. At best, a planet may have broad-band photometry from the Infrared Array Camera (IRAC; Fazio et al., 2004) on Spitzer (Werner et al. 2004; e.g., Deming et al., 2007; Nymeyer et al., 2011; Todorov et al., 2014; Kammer et al., 2015; Wong et al., 2015) and some near-IR observations from ground-based telescopes (e.g., Zhao et al., 2012; Croll et al., 2015; Zhou et al., 2015; Martioli et al., 2018), as well as 1.1– spectral data from the Wide Field Camera 3 (WFC3; Cheng et al., 2000) on the Hubble Space Telescope (Bahcall 1986; e.g., Crouzet et al., 2014; Ranjan et al., 2014; Line et al., 2016; Cartier et al., 2017; Mansfield et al., 2018). More realistically, most planets are observed in only a small subset of these bands. In particular, planets with only 3.6 and measurements are common (Figure 1 of Schwartz & Cowan, 2015), as these are the two IRAC channels which remain active on post-cryogenic Spitzer (Ingalls et al., 2016).
To determine the effective temperature, one must therefore interpolate/extrapolate the spectral radiance across all wavelengths, often from as few as two data points, or use spectral retrieval codes to generate spectra that are consistent with the observations. It is most insightful to convert spectral radiance to brightness temperature before interpolation, as—to first order—brightness temperature is constant as a function of wavelength. The inverse Planck function supplies the appropriate transformation:
[TABLE]
1.2 Estimation methods
Generally, a planet’s brightness temperature varies with wavelength as a result of the atmosphere’s pressure-temperature profile and wavelength-dependent opacity. However, blackbody-like behaviour is expected for isothermal, cloud-free atmospheres (Seager et al., 2005; Fortney et al., 2006) or planets with high-altitude cloud decks (Marley et al., 1999). Bayesian Information Criterion (BIC; Schwarz, 1978) analysis of exoplanets with secondary eclipse measurements in multiple bands has suggested that blackbody fits are often statistically favoured over detailed spectral fitting (Hansen et al., 2014; Garhart et al., 2019). Such results motivate the Error-Weighted Mean (EWM) method of effective temperature estimation. By assuming a Planck-like spectrum, the effective temperature can be calculated as simply the average of brightness temperature measurements, with measurements weighted by their respective uncertainties (Schwartz & Cowan, 2015).
Other model-independent estimates do not assume the planet behaves as a blackbody. With the Linear Interpolation (LI) method, the brightness temperature is assumed to be constant at shorter wavelengths than the shortest wavelength observed and at longer wavelengths than the longest wavelength observed. Between observations, brightness temperature is assumed to vary linearly with wavelength. This method implicitly gives more weight to measurements near the planet’s blackbody peak and preserves any spectral detail in the observations (Cowan & Agol, 2011), although it is not as robust against outliers as the EWM.
Model fitting with radiative transfer codes is another method for estimating dayside effective temperatures (e.g., Fortney et al., 2005; Stevenson et al., 2010; Mancini et al., 2013; Morley et al., 2017), with a variety of spectral retrieval frameworks currently in use (Barman et al. 2005; Irwin et al. 2008; Line et al. 2013; Benneke 2015; Waldmann et al. 2015; Lavie et al. 2017; Henderson et al. 2017; Gandhi & Madhusudhan 2018; see review in Madhusudhan 2018). Such fitting algorithms are advantageous as they allow system parameters and physical constraints to further inform effective temperature estimates. Nevertheless, these estimates must be used with some caution. Most retrieval models have a dozen parameters and hence the fitting is underconstrained, even for the most observed planets. Outlier measurements can therefore have a dramatic impact on the resulting fit, leading to inaccurately-constrained parameters (Hansen et al., 2014). Model-dependent methods are also only as accurate as the physics and assumptions included in the model; for example, Feng et al. (2016) and Blecic et al. (2017) show how the standard assumption of a single 1D thermal profile can bias the interpretation of a hot Jupiter’s emission spectrum, while Line et al. (2016) discuss the problematic assumption of a cloud-free atmosphere, as clouds strongly influence the spectra of brown dwarfs at similar temperatures. Even when clouds are taken into account, rigorous modelling of their effects is difficult due to the complexity of the physical processes involved in cloud formation (e.g., Lee et al., 2016, 2017; Roman & Rauscher, 2019). Model fitting also requires significant computational power, although retrieval frameworks powered by machine-learning algorithms aim to mitigate this issue (Márquez-Neila et al., 2018; Zingales & Waldmann, 2018).
Model-independent effective temperature estimates can also be enhanced by machine learning. In this work, we benchmark the performance of one such technique: Gaussian process (GP) regression. GPs introduce a Bayesian approach to effective temperature estimation, robustly evaluating uncertainties and yielding blackbody-like results when there is insufficient data to support a more complex fit.
In Section 2.1, we describe our GP method and motivate the selection of hyperparameters. Our simulated observations are outlined in Section 2.2, with a description of our evaluation metric and alternate methods in Section 2.3. The performance of each method is benchmarked on simulated observations in Section 3.1 and applied to archival data in Section 3.2. Results are summarized in Section 4.
2 Methods
2.1 Gaussian process regression
2.1.1 Introduction to Gaussian process regression
In this section, we provide a brief, intuitive description of GP regression. For a detailed, mathematical introduction to the technique, see Rasmussen & Williams (2006).
GP regression provides a flexible approach to model fitting that offers distinct advantages over a traditional parameterized model. A parameterized model limits possible fits to a single class of functions; for example, you might assume the data follow a linear trend, and therefore search for the best-fitting parameters of the form . The EWM and LI methods operate in this way, with the former using a line of zero slope and the latter using piecewise-defined lines. In contrast, the GP method does not assume a specific functional form. A GP is therefore capable of fitting a linear trend, a quadratic trend, a sinusoidal trend, or other behaviours that are more complex and not known a priori.
To do this, GPs require the specification of a covariance function (also called a kernel) and a mean function; a GP is completely specified by these two functions. The covariance function, perhaps unsurprisingly, specifies the covariance between points, and the parameters of this function are called hyperparameters. One of the simplest covariance functions is the squared exponential kernel, which takes the form:
[TABLE]
The squared exponential kernel corresponds to an infinite set of basis functions, and it returns a covariance that depends only on the distance in input-space between points, . This kernel is governed by two hyperparameters: , the scale length, and , the signal variance. The scale length can be understood as the characteristic distance in input-space over which the output function varies, while the signal variance represents the characteristic amplitude of features. In this work, is the difference in frequency between two data points, is the characteristic value of over which features in the spectrum are correlated, and is the typical amplitude of features in normalized brightness temperature (at the data’s spectral resolution). Various methods may be used to set these hyperparameters, such as motivating them from physical constraints or optimizing them using the input data (Ambikasaran et al., 2015). We adopt a hybrid modelling approach that borrows elements from both methods; this approach is the focus of Section 2.1.2.
A zero mean function is commonly used for GPs, although more complex functions can be chosen. In this work, we adopt mean functions that are constant but non-zero (namely, the error-weighted mean brightness temperature). The chosen mean function is most important in sparsely-populated regions of input-space, as our GP will take the value of this function in the absence of other data (i.e., when the nearest data point is many length scales away).
In summary, a GP is a covariance function and a mean function. Once these properties have been specified, including the hyperparameters of the covariance function, the GP can accept input data (with uncertainties) and output a fit to the data (with uncertainties). This output represents the range of functions consistent with the input data and covariance kernel. The GP’s specificity results from an appropriate choice of hyperparameters for a given data set (which we will discuss in Section 2.1.2). However, it is important to note that even with specified hyperparameters, a GP still allows for much more flexible fits than could be achieved with a parameterized model (such as with the EWM or LI).
GPs have gained traction in the astrophysical community in recent years, being employed to fit stellar oscillations (Brewer & Stello, 2009), model the occurrence rate density of exoplanets (Foreman-Mackey et al., 2014), and estimate photometric redshifts (Almosallam et al., 2016), among other applications. In the field of exoplanetary atmospheres, GPs have been used to account for instrument systematics in primary transit and secondary eclipse light curves (Gibson et al., 2012; Evans et al., 2015), although further applications remain to be explored.
These GP regressions primarily involve intermediate-sized data sets on the order of a hundred to a thousand data points (). Larger data sets have historically been intractable due to computation time scaling as , although newer methods aim to improve performance for big data (Foreman-Mackey et al., 2017). However, our concern is with the opposite extreme: small data sets in which trends are sparsely sampled, often with as few as two or three measurements. As these observations do not critically sample the underlying function, optimizing the hyperparameters from the observations themselves would result in inaccurate fits with poorly-estimated uncertainties. However, this does not mean that GP regression is unusable in the sparse-data regime. By physically motivating the hyperparameters using related data sets, appropriate hyperparameters can be chosen and information content retrieved, even with few observations.
2.1.2 Implementation of Gaussian process regression
We use the George library (Ambikasaran et al., 2015) for Python-based Gaussian process regression. Our GP uses a squared exponential kernel with the mean function set to the error-weighted mean. It therefore mimics the EWM method at wavelengths where the nearest observation is many length scales away. However, the EWM method assumes Gaussian-distributed uncertainties and involves weighting by the inverse of the variance (). This makes the method non-robust against outliers. Given the sparsity of our data, outliers can have a dramatic effect on our results; in our mean function, we therefore elect to weigh the points by the inverse of the error (). This weighting assumes the uncertainties follow a doubled-sided exponential distribution, although it is used more generally in situations where the uncertainties are mostly Gaussian but the presence of outliers is suspected (Aster et al., 2013). In light of the above discussion, it is unsurprising that we find the weighting outperforms the standard EWM method.
Before processing with the GP, the brightness temperature spectrum is normalized and converted from wavelength to frequency, as a linear length scale is expected in the frequency domain (Rybicki & Lightman, 2004). We also convert these frequencies to petahertz to reduce the risk of overflow errors during data processing.
We consider two approaches to set the GP’s hyperparameters. The first involves optimizing the two hyperparameters using SciPy’s minimize routine (Jones et al., 2001) to minimize the negative likelihood of the GP model. This approach is problematic for the sparse data that is currently available, as our sampling rate is much smaller than the expected length scale. This results in fits biased towards slowly-varying functions, as we have insufficient information to identify the covariance of our data.
Our second approach resolves this sampling problem by optimizing the length scale and signal variance using external data sets, where we do have enough information to constrain covariance. We must therefore identify an appropriate data set for estimating the characteristic length scale of hot Jupiter emission spectra. While we expect structure over a variety of length scales, water vapour is the dominant opacity source at all relevant temperatures in the infrared (Figure 6 of Fortney, 2018). For this reason, we estimate the length scale hyperparameter using the HITEMP (Rothman et al., 2010) line list database for H2O.
Using the first approach, wherein we use SciPy’s minimization routine to estimate hyperparameters, we analyze the intensity (wavenumber per column density) spectrum of water up to 5 Hz (). By varying the binning and noise for this spectrum (Figure 1), we identify three significant length scales: the first at -8.55, the second between -10 and -9, and the third between -11.5 and -11. These length scales appear as plateaus or shallow downwards trends in Figure 1; outside of these regions, length scale varies rapidly as the resolution of the spectrum is increased. Note that in these calculations, has been divided by units of PHz, resulting in unitless values for the log-length scale hyperparameter, ln().
We therefore find that water possesses structure over a variety of length scales. However, not all these scales are relevant for our purposes due to the finite spectral resolution and signal-to-noise of current observations. To identify which length scales correspond to relevant spectral details, we examine the GP fits to the water spectrum. We consider log-length scale hyperparameters of -8.55 (low-resolution binning), -9.62 (mid-resolution binning), and -11.1 (high-resolution binning), and display the corresponding GP output in Figure 2. Low-resolution binning allows for detection of the smooth periodic trend. Mid-resolution binning allows double peaks to be distinguished, while high-resolution binning reveals further fine structure. Given our desired precision and the quality of our brightness temperature data, only the smooth trend is relevant for our Gaussian process. We therefore constrain the log-length scale hyperparameter to 0.05 (Figure 1, inset), equivalent to Hz, or at a wavelength of . As we find that varying this hyperparameter within its uncertainty range does not perceptibly influence the results of our Gaussian process regression, we fix the log-length scale hyperparameter as .
While this investigation also yields signal variance hyperparameters, such estimates are not physically relevant; the y-axis of the water spectrum represents intensity (a molecular cross-section), which cannot be converted to brightness temperature without additional information about the planet’s atmosphere (namely, the vertical temperature profile and the presence of clouds). A different data set is therefore required to inform our estimate of signal variance. This hyperparameter governs the amplitude of variations, thereby setting the uncertainty in the brightness temperature at frequencies far from observed data points. Model emission spectra could be used for this task, although this would require setting a parameter space to explore, determining the physics to include in the models, estimating the relative abundance of planets with each composition, and so on. In short, such training would compromise the model-independent nature of GP regression.
As an alternative, we train the signal variance hyperparameter on the 54 hot Jupiters that have been observed in secondary eclipse, as listed in the exoplanets.org database (Han et al., 2014, accessed July 2018). After excluding planets that have been observed in only one band, planets lacking stellar parameters, and two planets with potentially erroneous database entries,111This latter category contains XO-3 b and HAT-P-2 b, for which the database lists an unrealistic secondary eclipse depth and uncertainty in secondary eclipse depth, respectively. 37 hot Jupiters remain in the training set. Using Equations 1 and 3, we calculate brightness temperature spectra for these planets. For each star, stellar spectral radiance is determined using the Kurucz model (Kurucz, 1970; Castelli & Kurucz, 2004) that best matches the star’s effective temperature and surface gravity. Uncertainties in brightness temperature are propagated from uncertainties in known parameters using a 1000-iteration Monte Carlo. To consider the different fits that are possible as a result of these uncertainties, we use a random Gaussian distribution to generate 1000 versions of the observations and fit each set of observations using GP regression, setting the log-length scale to and optimizing the signal variance with SciPy.
This method yields an ensemble of 37,000 signal variance hyperparameters centred at ; as with the length scale, this is the natural logarithm of the of Section 2.1. As our brightness temperature spectra are normalized, this value can easily be converted to a percentage amplitude using = 0.14, or 14%. In other words, at the low spectral resolution of currently-available observations, we expect 14% changes in brightness temperature as a function of wavelength. This is broadly in agreement with our expectations from atmospheric physics: the skin layer of the atmosphere has temperature (Pierrehumbert, 2010), corresponding to 16% deviations from the planet’s effective temperature. As the skin temperature represents the atmosphere’s coolest layer, this suggests that 16% is a reasonable upper limit on the potential variability. The presence of clouds would decrease the amplitude of the variations.
In general, one should set the GP’s signal variance through a random sampling of the 37,000 estimated hyperparameters. However, the training data are not ideal. As the uncertainties in the training data are large, the upper tail of the hyperparameter distribution allows for highly inflated uncertainties in our GP-determined temperatures. For this reason, we fix the log-signal variance hyperparameter as , or 14%. This choice is consistent with theoretical expectations, as we have discussed. It also becomes strongly motivated following our analysis in Section 3.1: with this hyperparameter, we retrieve statistically-appropriate distributions of effective temperature estimates.
2.2 Simulated data set
2.2.1 Model spectra
To benchmark the performance of the Gaussian process method and compare its efficacy to other approaches, we require a suite of simulated emission spectra from which to generate sample observations. We create this suite with the open-source Python Radiative Transfer in a Bayesian framework (Pyrat Bay222http://pcubillos.github.io/pyratbay; Cubillos et al., in prep.). Six theoretical emission spectra are computed for each of the 54 hot Jupiters that have been observed in secondary eclipse, as listed in the exoplanets.org database as of July 2018 (Han et al., 2014). The resulting sample of 324 model emission spectra provides a diverse set of hot Jupiter atmospheres with known effective temperatures on which to test our measurement technique.
Pyrat Bay, which is based on the Bayesian Atmospheric Radiative Transfer package (Blecic, 2016; Cubillos, 2016), produces 1D atmospheric models and combines them with a line-by-line radiative transfer module to model the emission spectra over the planet’s dayside hemisphere. We assume cloud-free atmospheres in hydrostatic, thermochemical, local-thermodynamic, and radiative equilibrium. The atmospheric domain ranges from to bar, sampled with 120 layers equidistant in logarithmic space. For the atmospheric composition, we adopt models with scaled solar metallicities; i.e., we start with solar elemental mixing ratios (Asplund et al., 2009), scaling the abundances of all elements beyond H and He. Given the temperature, pressure, and elemental compositions, we compute the thermochemical equilibrium abundances using the open-source Thermochemical Equilibrium Abundances (TEA) code (Blecic et al., 2016).
To attain radiative equilibrium, we follow the iterative procedure described in Malik et al. (2017); i.e., we compute the upward and downward fluxes across the atmosphere under the two-stream approximation (see their Equation 10, with ). On each iteration, the code updates the temperature profile such that the divergence of the bolometric net fluxes converges to a negligible value at each layer. During this process, since the thermochemical equilibrium calculation is the most computationally demanding task, we update the chemistry only every eight iterations.
The domain of the spectrum ranges from 0.3 to , sufficient to contain the bulk of the stellar and planetary radiation. At the top of the atmosphere, we model the incident stellar irradiation as a blackbody function at the effective temperature of the star, attenuated by a Bond albedo , and considering good day–night energy redistribution over the planet. At the bottom of the atmosphere, we include an internal radiative heat term corresponding to a 100 K blackbody. This choice is commonly made elsewhere in the literature (e.g., Fortney et al., 2007), as it is approximately the value for Jupiter. However, this term has a negligible impact on the results.
We consider opacities from the main spectroscopically active species expected for exoplanets at these wavelengths: H2O and CO2 from Rothman et al. (2010); CH4, NH3, and HCN from Yurchenko & Tennyson (2014); CO from Li et al. (2015); Na and K from Burrows et al. (2000); Rayleigh opacities from H, He, and H2 (Kurucz, 1970; Lecavelier Des Etangs et al., 2008); and collision-induced absorption from H2–H2 (Borysow et al., 2001; Borysow, 2002) and H2–He (Borysow et al., 1988, 1989; Borysow & Frommhold, 1989). Since the HITEMP and ExoMol databases consist of billions of line transitions, we first apply the repack package (Cubillos, 2017) to extract only the strong line transitions that dominate the opacity spectrum between 300 and 3000 K. The final line list contains 63 million transitions.
We consider three metallicities for each planet in the sample: [M/H] = , 0, and 1; and two different Bond albedos: = 0.0 and 0.3. Overall, our radiative equilibrium temperature profiles are similar to those of Malik et al. (2017) and follow similar trends; for example, both display similar behaviours when we vary the atmospheric metallicity.
The Reproducible Research Compendium for the Pyrat Bay radiative equilibrium models of this article is available at github.com/pcubillos/PassEtal2018_EmissionSpectra.
2.2.2 Simulated observations
To simulate broad-band observations, each theoretical emission spectrum is converted from spectral radiance to a simulated eclipse depth using Equation 1, neglecting the reflected-light term and assuming / and K. A more appropriate treatment would use Kurucz models or vary / and , although consideration of such complexities is time intensive. As we find that the simplified and intensive methods yield comparable results for a test subset of our spectra, we elect to use the simplified version for efficiency. Eclipse depth is averaged over the band and Equations 1 and 3 are used to convert the band-integrated eclipse depth back to brightness temperature.
For benchmarking the performance of our GP regression, we consider the conservative case of 3.6 and broad-band measurements, corresponding to IRAC channels 1 and 2, as well as WFC3 spectral data. Following the precedent of Stevenson et al. (2017) and Keating & Cowan (2017), we elect not to include any simulated ground-based observations. As Croll et al. (2015) discuss, uncertainties claimed for these measurements have historically been unreliable due to the systematics involved with near-IR, ground-based photometry and a lack of demonstrated repeatability. While Croll et al. (2015) and Zhou et al. (2015) show that uncertainties for ground-based eclipse measurements can be robustly estimated through careful systematic analysis and multiple independent measurements, there is not yet a critical mass of such repeated measurements in the literature. While we neglect ground-based observations in this analysis, our GP regression method can trivially be extended to an arbitrary number of observations in any ensemble of bands.
We consider a variety of binning options for the simulated WFC3 observations, although we find that the resolution of the spectral data does not significantly impact effective temperature estimation for observations taken over one eclipse. For this reason, we elect to treat the WFC3 data as another broad-band measurement.
2.2.3 Simulated uncertainties
We can directly calculate uncertainties in our simulated measurements in the photon-limited regime, a good approximation for space-based observations. As the flux is dominated by the star, we can approximate the number of photons in a given band using only the stellar spectral radiance:
[TABLE]
wherein is the system throughput, is the telescope diameter, is the distance to the star, and is the integration time. The photon-limited precision is /, with the factor resulting from the differential measurement of eclipse depth, assuming one acquires as much out-of-eclipse baseline as in-eclipse observations (Cowan et al., 2015). We then propagate uncertainty in eclipse depth to uncertainty in spectral radiance. This uncertainty is simulated using a random Gaussian distribution before conversion to brightness temperature, as the uncertainty in brightness temperature is non-symmetric given symmetric uncertainties in flux.
We consider three physical scenarios to set uncertainty levels. In the first, we assume photon-limited precision and a distance of pc, corresponding to an HD 189733 b analogue. In the second, we place the planet at a distance of pc, the median value of the 43 planets in the exoplanet.org secondary eclipse sample with known distances. For the final case, we place the planet at pc and inflate the uncertainties in eclipse depth by a factor of two to represent pessimistic deviations from photon-limited precision (Hansen et al., 2014). We assume an integration time of hr, corresponding to the typical duration of one eclipse (Exoplanets Data Explorer, Han et al., 2014). and are set to the published values for HST WFC3 or IRAC, depending on the observation being simulated.
These three scenarios can be interpreted as best, good, and typical uncertainty levels; however, the signal-to-noise ratio (SNR) of the simulated eclipses can still vary significantly within a scenario as a result of hotter planets emitting more photons. For this reason, we present our results grouped by SNR of the simulated eclipse depth rather than by physical scenario. We consider eclipses with as low uncertainty, with as medium uncertainty and with as high uncertainty. Simulated eclipses with are rejected from our analysis. These categories contain 32%, 23%, 26%, and 19% of the simulated observations, respectively.
2.3 Evaluation metric and method comparison
We consider 100 sets of simulated brightness temperatures for each of 324 models and three physical scenarios, resulting in 97,200 data sets. First, we estimate effective temperature using the Error-Weighted Mean and Linear Interpolation methods. For the EWM method, we average the brightness temperatures, weighting each observation by the inverse square of its uncertainty (Schwartz & Cowan, 2015). For the LI method, we set a constant brightness temperature shortward of the shortest wavelength measurement and longward of the longest wavelength measurement, linearly interpolating between the other measurements using SciPy’s interp1d routine (Cowan & Agol, 2011). The resulting spectrum is converted to an effective temperature using Equations 2 and 3. For both methods, we calculate uncertainty using a 100-iteration Monte Carlo.
We then investigate the 97,200 data sets using GP regression. As before, we estimate uncertainties using a 100-iteration Monte Carlo (i.e., by performing GP regression on 100 instances of the data set). We then sample 100 output functions from each fit; while the Monte Carlo accounts for the uncertainties in the observations, the latter approach accounts for the uncertainty in the GP result. The 10,000 brightness temperature functions are then converted to effective temperatures using Equations 2 and 3. Uncertainty is calculated as the standard deviation of the 10,000 effective temperature estimates.
The three methods produce three effective temperature estimates () for each of the 97,200 mock spectra, each with a corresponding uncertainty. We consider two metrics to benchmark the performance of these three estimation methods. The first is the ratio of the estimated effective temperature and the actual effective temperature (), shown in Figure 3 of Cowan & Agol (2011); this metric allows us to directly compare the accuracy of the different methods. The second metric is the z-score:
[TABLE]
This metric evaluates the accuracy of the reported uncertainty, as well as the accuracy of the effective temperature estimate. A method that accurately estimates both the effective temperature and its uncertainty has a z-score distribution centered at zero with a standard deviation of one.
3 Results
3.1 Analysis of simulated data
Our results are presented in Figure 3, with a sample fit shown in Figure 4. Using the GP method, we find z-score distributions (mean standard deviation) of 0.85 (), 0.99 (), and 0.87 (), all close to the desired distribution of 0 1.333The intermediate SNR regime has a mean z-score that is somewhat further from 0 than the other two. This is an artifact of our SNR regime definitions: we consider three physical scenarios (based on planet distance and deviations from photon-limited precision), but we report our results grouped by SNR. This means that each grouping is a superposition of discrete distributions, not a single, smooth distribution. The effect of this is particularly noticeable in the intermediate SNR regime, as it contains many planets from each of the three scenarios. If we instead perform our z-score analysis on the original scenario-grouped data (as opposed to the SNR-grouped data), we find that the mid-data quality scenario ( = 245 pc) yields a z-score distribution that is more similar to the low SNR and high SNR regimes: -0.09 0.89. The EWM and LI methods result in z-score distributions that are biased and that have standard deviations larger than 1. This effect is most pronounced for simulated observations with high SNR, suggesting that these methods are capable of quantifying the error due to the uncertainty in individual data points, but not the error due to the spectrum’s undersampling. The GP method considers both sources of uncertainty and hence performs accurately in all three SNR regimes.
While the GP method outperforms the LI and EWM methods in uncertainty estimation, all three methods produce reasonable effective temperature estimates, as evidenced by the similar distributions in Figure 3. The GP and LI methods in particular produce almost identical distributions centred around 1, while the EWM is slightly biased in the positive direction. However, we find that switching the EWM method from to weighting (as motivated in Section 2.1.2 and implemented in our GP method) reduces this bias. If the EWM is used in future work, we therefore recommend that it be used with weighting. However, we stress that the LI and EWM methods significantly underestimate the uncertainty in the effective temperature and hence should not be the preferred estimator for future work.
We have shown that the GP, LI, and EWM methods are capable of estimating the effective temperature of a planet with three observed eclipse depths (WFC3, IRAC channel 1, and IRAC channel 2) in an unbiased way, with the GP method able to accurately evaluate the uncertainties in these estimates. As many planets have only been observed with warm Spitzer, it may be tempting to try to extend these methods to data sets with only 3.6 and eclipse measurements. However, such analyses should be undertaken with caution. In Figure 5, we show the z-score and distributions for an analysis of the same 324 model spectra, but for which observations have only been simulated in the warm Spitzer bands. Both the z-score and distributions are skewed left for all three estimation methods, indicating that the effective temperature is consistently underestimated when only the two IRAC bands are used. This result is unsurprising: the 3.6 and bands will always emanate from higher in the atmosphere and hence represent the cooler part of the spectrum, as we do not include any models with thermal inversions in our analysis (see Figure 4). An estimate made with only these two bands will therefore underestimate the planet’s effective temperature. To a lesser extent, the same effect can also be seen in Figure 3: since we do not consider atmospheres with thermal inversions, our most certain data point (the WFC3 measurement) always represents a deeper, warmer layer in the atmosphere. Where the WFC3 measurement has a very small uncertainty (SNR > 20), we therefore see the distribution slightly skewed towards overestimates of temperature, regardless of estimation method.
The results of Figure 5 can be seen as a worst-case scenario: in addition to only considering non-inverted atmospheres, the models evaluated here do not contain clouds, which would flatten the spectrum and hence reduce the amplitude of the bias. However, an estimate of uncertainty should not assume significantly greater performance than this worst-case. We therefore conclude that the effective temperature is not known to better than 20% at the 1 level if only 3.6 and eclipse depths have been observed. This is in contrast to the results of Cowan & Agol (2011), who claim a 1 systematic error of 5% for observations taken in two bands (their Figure 3). However, their numerical experiments considered random combinations of broad-bands. The discrepancy in these results suggests that systematic error is strongly influenced by the particular wavelengths observed and not merely the number of observations.
3.2 Analysis of archival data
Having verified the accuracy of our GP estimates, we apply our method to archival data. The last systematic study of effective temperatures for archival hot Jupiters was performed by Cowan & Agol (2011), which predates WFC3 observations. The uniform analysis we present here therefore provides a unique catalogue of effective temperatures, with no counterparts currently available elsewhere in the literature. Our catalogue can be interpreted as a hypothesis: motivated by our analysis of simulated data in Section 3.1, we assert that 68% of the time, the true effective temperatures of planets in our catalogue will fall within the 1 intervals we provide. This hypothesis will soon be testable, as upcoming missions such as the James Webb Space Telescope (JWST; Beichman et al., 2014) and the Atmospheric Remote-sensing Infrared Exoplanet Large-survey (ARIEL; Puig et al., 2016) will allow full secondary eclipse spectra to be measured for these hot Jupiters, unambiguously determining bolometric fluxes and dayside effective temperatures. However, our method will not be redundant in the age of JWST: since full spectral coverage will often require observations with multiple instruments and modes, there will always be targets that can benefit from a robust method for predicting the full bolometric flux from sparse observations.
A review of the literature yields thirteen hot Jupiters with WFC3 emission spectra, twelve of which have also been observed in the 3.6 and IRAC channels. We neglect the thirteenth, WASP-121 b (Evans et al., 2017), as it has not been observed at . The secondary eclipse depths for the twelve planets are listed in Table 1. Where available, we use the band-averaged WFC3 eclipse depth as published in the referenced paper. Where only the full spectral data have been published, we take the average over the spectrum, determining uncertainties using a 1000-iteration Monte Carlo. In the few cases where asymmetric uncertainties in the eclipse depth are given in the source paper, we list the largest of the two. For WASP-12 b, we adopt the WFC3 eclipse depth from the Stevenson et al. (2014b) reanalysis as opposed to that originally published in Swain et al. (2013).
We apply our GP regression method to the twelve planets. As in Section 2.1.2, we use Kurucz models for stellar spectral radiance and adopt the system parameters (stellar , stellar log(), and ) tabulated in the Exoplanet Data Explorer (Han et al., 2014, accessed July 2018), with a stellar log() of 4.71 0.20 cms*-2* for CoRoT-2 (Ammler-von Eiff et al., 2009). Uncertainties in are propagated from uncertainties in eclipse depth and system parameters using a 1000-iteration Monte Carlo. The results of our analysis are listed in the final column of Table 1.
In Figure 6, we show these GP-determined effective temperature estimates alongside each planet’s irradiance temperature, . Dashed lines are used to indicate equilibrium temperatures, , and maximal dayside temperatures, ; the former corresponds to a planet with zero Bond albedo and perfect heat circulation, while the latter corresponds to a planet with zero Bond albedo and negligible circulation. For the twelve archival planets, we find that our estimated dayside effective temperatures fall between these extremes, suggesting imperfect heat transport between the planets’ day and night sides, and/or non-zero Bond albedo. Our results are consistent with Schwartz & Cowan (2015), who found that hotter planets generally deviate more strongly from equilibrium expectations (their Figure 2).
4 Summary
We have developed a Gaussian process method for estimating the dayside effective temperatures of hot Jupiters and their associated uncertainties, based on sparse secondary eclipse broad-band spectra. Our GP is trained on the HITEMP and exoplanets.org databases, from which we find a log-length scale of ( Hz) and a log-signal variance of (14%). With these hyperparameters, we are able to retrieve accurate effective temperature estimates for a diverse sample of simulated planets. The code for our method is publicly available on GitHub444http://github.com/ekpass/gp-teff and can estimate a planet’s effective temperature in seconds.
Through analysis of 97,200 sets of simulated WFC3 and warm Spitzer observations, we show that the GP method is capable of producing unbiased effective temperature estimates with more accurate uncertainties than other model-independent methods (Figure 3, based on HST/WFC3/G141, Spitzer/IRAC/ch1, and Spitzer/IRAC/ch2). However, we find significant bias with all methods when only IRAC channel 1 and channel 2 observations are used. We caution that effective temperature can only be constrained with 1 confidence to within 20% of the true value in such instances (Figure 5).
The other methods considered in our analysis—the Linear Interpolation method of Cowan & Agol (2011) and the Error-Weighted Mean method of Schwartz & Cowan (2015)—estimate effective temperatures with similar accuracy, although they produce biased z-score distributions. As error from the undersampling of the spectrum dominates in the high-SNR regime and such error is neglected by the EWM and LI methods, the poorest uncertainty estimates are produced for high-SNR data. Additional uncertainty may be added in quadrature to account for this effect, as in Cowan & Agol (2011), although these methods cannot dynamically determine undersampling’s contribution. The GP method considers this uncertainty implicitly and hence appropriately estimates error in all SNR regimes. While we therefore recommend that the GP method replace the EWM method in general use, if the EWM is for some reason preferred, we advocate for the adoption of error-weighting. We find this produces more accurate, outlier-resistant results than the weighting that has been used historically.
Having validated the GP method on simulated observations, we conclude with an analysis of archival data. In Table 1, we provide effective temperature estimates with robustly determined uncertainties for the twelve hot Jupiters with published WFC3 and IRAC observations. These results may inform future statistical analyses of day-to-night heat transport and Bond albedo, in the vein of Cowan & Agol (2011) and Schwartz & Cowan (2015).
Acknowledgements
E.K.P. is funded by an Institut de Recherche sur les Exoplanètes (iREx) Trottier Excellence Grant and a Natural Sciences and Engineering Research Council (NSERC) Undergraduate Student Research Award. The authors thank Evelyn Macdonald and the anonymous referees for comments that improved this manuscript, and James Xu for preliminary comparisons of effective temperature estimators.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1Almosallam et al. (2016) Almosallam I. A., Lindsay S. N., Jarvis M. J., Roberts S. J., 2016, MNRAS , 455, 2387 · doi ↗
- 2Ambikasaran et al. (2015) Ambikasaran S., Foreman-Mackey D., Greengard L., Hogg D. W., O’Neil M., 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence , 38 · doi ↗
- 3Ammler-von Eiff et al. (2009) Ammler-von Eiff M., Santos N. C., Sousa S. G., Fernandes J., Guillot T., Israelian G., Mayor M., Melo C., 2009, A&A , 507, 523 · doi ↗
- 4Arcangeli et al. (2018) Arcangeli J., et al., 2018, Ap J , 855, L 30 · doi ↗
- 5Asplund et al. (2009) Asplund M., Grevesse N., Sauval A. J., Scott P., 2009, ARA&A , 47, 481 · doi ↗
- 6Aster et al. (2013) Aster R. C., Borchers B., Thurber C. H., 2013, Parameter Estimation and Inverse Problems. Elsevier
- 7Bahcall (1986) Bahcall N. A., 1986, Annals of the New York Academy of Sciences , 470, 331 · doi ↗
- 8Barman et al. (2005) Barman T. S., Hauschildt P. H., Allard F., 2005, Ap J , 632, 1132 · doi ↗
