Assimilation of semi-qualitative sea ice thickness data with the EnKF-SQ

Abhishek Shah; Laurent Bertino; Francois Counillon; Mohamad El; Gharamti; Jiping Xie

arXiv:1904.12590·stat.AP·January 8, 2020

Assimilation of semi-qualitative sea ice thickness data with the EnKF-SQ

Abhishek Shah, Laurent Bertino, Francois Counillon, Mohamad El, Gharamti, Jiping Xie

PDF

TL;DR

This paper introduces and tests the EnKF-SQ data assimilation method for integrating semi-qualitative sea ice thickness measurements into a coupled ice-ocean model, demonstrating its effectiveness over simpler approaches.

Contribution

The paper presents the EnKF-SQ method, a novel stochastic data assimilation technique specifically designed for semi-qualitative data, applied successfully to Arctic sea ice modeling.

Findings

01

EnKF-SQ improves assimilation of range-limited ice thickness data.

02

The method's performance is robust to tuning parameter variations.

03

EnKF-SQ is effective despite non-linearities and non-Gaussian variables.

Abstract

A newly introduced stochastic data assimilation method, the Ensemble Kalman Filter Semi-Qualitative (EnKF-SQ) is applied to a realistic coupled ice-ocean model of the Arctic, the TOPAZ4 configuration, in a twin experiment framework. The method is shown to add value to range-limited thin ice thickness measurements, as obtained from passive microwave remote sensing, with respect to more trivial solutions like neglecting the out-of-range values or assimilating climatology instead. Some known properties inherent to the EnKF-SQ are evaluated: the tendency to draw the solution closer to the thickness threshold, the skewness of the resulting analysis ensemble and the potential appearance of outliers. The experiments show that none of these properties prove deleterious in light of the other sub-optimal characters of the sea ice data assimilation system used here (non-linearities, non-Gaussian…

Equations16

x_{i}^{a} = x_{i}^{f} + K (y_{i} - H x_{i}^{f}),

x_{i}^{a} = x_{i}^{f} + K (y_{i} - H x_{i}^{f}),

K = P^{f} H^{T} (H P^{f} H^{T} + R)^{- 1},

K = P^{f} H^{T} (H P^{f} H^{T} + R)^{- 1},

P^{f} = \frac{1}{N - 1} i = 1 \sum N (x_{i}^{f} - \overset{ˉ}{x}^{f}) (x_{i}^{f} - \overset{ˉ}{x}^{f})^{T},

P^{f} = \frac{1}{N - 1} i = 1 \sum N (x_{i}^{f} - \overset{ˉ}{x}^{f}) (x_{i}^{f} - \overset{ˉ}{x}^{f})^{T},

f (x) = {w e^{- (x - μ)^{2} / 2 σ_{i r}^{2}}, x \leq μ, w e^{- (x - μ)^{2} / 2 σ_{or}^{2}}, x > μ,

f (x) = {w e^{- (x - μ)^{2} / 2 σ_{i r}^{2}}, x \leq μ, w e^{- (x - μ)^{2} / 2 σ_{or}^{2}}, x > μ,

σ_{or} = Climatological mean \int \displaylimits_{μ}^{+ \infty} y f_{c} (y) d y - μ .

σ_{or} = Climatological mean \int \displaylimits_{μ}^{+ \infty} y f_{c} (y) d y - μ .

σ_{o b s} = 0.06 t + 0.05,

σ_{o b s} = 0.06 t + 0.05,

RMSE_{t} = \frac{1}{n} i = 1 \sum n (\overset{ˉ}{x}_{i, t}^{f} - x_{i, t}^{r})^{2},

RMSE_{t} = \frac{1}{n} i = 1 \sum n (\overset{ˉ}{x}_{i, t}^{f} - x_{i, t}^{r})^{2},

AES_{t} = \frac{1}{n} i = 1 \sum n σ_{i, t}^{2},

AES_{t} = \frac{1}{n} i = 1 \sum n σ_{i, t}^{2},

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Full text

Assimilation of semi-qualitative sea ice thickness data with the EnKF-SQ

Abhishek Shah

Nansen Environmental and Remote Sensing Center, Bergen, Norway

Laurent Bertino

Nansen Environmental and Remote Sensing Center, Bergen, Norway

François Counillon

Nansen Environmental and Remote Sensing Center, Bergen, Norway

Mohamad El Gharamti

National Center for Atmospheric Research, Boulder, Colorado

Jiping Xie

Nansen Environmental and Remote Sensing Center, Bergen, Norway

Abstract

A newly introduced stochastic data assimilation method, the Ensemble Kalman Filter Semi-Qualitative (EnKF-SQ) is applied to a realistic coupled ice-ocean model of the Arctic, the TOPAZ4 configuration, in a twin experiment framework. The method is shown to add value to range-limited thin ice thickness measurements, as obtained from passive microwave remote sensing, with respect to more trivial solutions like neglecting the out-of-range values or assimilating climatology instead.

Some known properties inherent to the EnKF-SQ are evaluated: the tendency to draw the solution closer to the thickness threshold, the skewness of the resulting analysis ensemble and the potential appearance of outliers. The experiments show that none of these properties prove deleterious in light of the other sub-optimal characters of the sea ice data assimilation system used here (non-linearities, non-Gaussian variables, lack of strong coupling). The EnKF-SQ has a single tuning parameter that is adjusted for best performance of the system at hand. The sensitivity tests reveal that the results do not depend critically on the choice of this tuning parameter. The EnKF-SQ makes overall a valid approach for assimilating semi-qualitative observations into high-dimensional nonlinear systems.

Keywords— Semi-qualitative observations, range limitation, SMOS, ice thickness, TOPAZ4, EnKF-SQ.

1 Introduction

Sea ice plays a crucial role in the Arctic climate as it modulates the exchange of heat and moisture between the ocean and the atmosphere (Aagaard and Carmack, 1989; Screen and Simmonds, 2010). Different studies have shown that accurate knowledge of the Sea Ice Thickness (SIT) is beneficial for the Arctic sea ice predictability (Day et al., 2014; Collow et al., 2015; Guemas et al., 2014). The SIT observations from the European Space Agency (ESA) Soil Moisture and Ocean Salinity (SMOS) mission are available in near-real time, at daily frequency during the cold season (October-April). The retrieval method for SMOS SIT observations is based on measurements of the brightness temperature at a low frequency microwave (1.4 GHz, L-band: wavelength of $21$ cm) (Kaleschke et al., 2010). The representative depth for the L-Band microwave frequency into the sea ice is about $0.5$ m for first-year level ice (Kaleschke et al., 2010; Huntemann et al., 2014). Few studies have shown that assimilating thin SIT from SMOS into coupled ice-ocean model, using ensemble based Data Assimilation (DA) techniques, is able to improve the SIT forecast without being detrimental to other properties (e.g., Yang et al., 2014; Xie et al., 2016; Fritzner et al., 2019). All of these studies, however, ignore the saturated observations of thick ice.

Measurements of thick sea ice on basin-wide scales are also available from laser altimeters onboard ICESat (Forsberg and Skourup, 2005) or from radar altimeters on the European Remote Sensing (ERS), Envisat, CryoSat-2 and Sentinel-3 (Connor et al., 2009; Laxon et al., 2013; Ricker et al., 2014). CryoSat-2 SIT is provided in near-real time (Tilling et al., 2016) but still contains considerable large uncertainties caused by the lack of auxiliary data on snow depth. These uncertainties are proportionally larger for thin ice (i.e., $<1$ m) and hence CryoSat-2 practically measures thick sea ice only. A merged product of weekly SIT observations in the Arctic from the CryoSat-2 altimeter and SMOS radiometer, referred as CS2SMOS, has also been developed by combining the two complementary datasets (Kaleschke et al., 2015; Ricker et al., 2017) and made available during the winter months since October 2010. However, the combination of the two satellites is not perfect as biases have been revealed on overlapping areas (Wang et al., 2016; Ricker et al., 2017). Recently, Xie et al. (2018) successfully assimilated the merged SIT product CS2SMOS into the TOPAZ4 coupled ocean-sea ice reanalysis system (Sakov et al., 2012) for the Arctic.

While assimilating a merged SIT map, rather than two satellites data streams is practically convenient, the uncertainty of the merged data is more difficult to quantify and bad quantification of the uncertainty may affect the assimilation performance negatively (Mu et al., 2018)iiiIt should be noted that the comparison of assimilating merged versus separate data is not informative because their observation errors are not equivalent. The ability to use well-justified observation errors in data assimilation is sufficiently important to motivate the assimilation of the two separate SIT data streams rather than one merged product. This implies that their detection limits should be taken into account by the data assimilation method.

In DA, observations are used to reduce the error of the state variables so that the forecast skill can be enhanced. Many observations can only be retrieved within a limited interval of the values that the observed quantity would take in nature. In other words, observations may have a detection limit. One such example is the aforementioned observation of SIT from SMOS. Although, the SIT observations with detection limit do not provide quantifiable data (hard data) above its detection limit, they do give qualitative information (soft data). For instance, the ice could be thicker than a known threshold. Studies from Shah et al. (2018) and Borup et al. (2015) have shown that assimilating soft data with linear and non-linear toy models using ensemble-based DA methods have the potential to improve the accuracy of the forecast. Therefore, not considering soft data in the assimilation procedure is a potential loss of meaningful information.

Assimilating only thin ice observations, as in Xie et al. (2016, Figure 5 and 6), induces a low bias, which is caused by the partial nature of the observation of thin ice. With a new method intended for semi-qualitative data as the EnKF-SQ, the question arise whether this bias can be mitigated or not? The comparison of the EnKF-SQ to the perfect Bayesian solution (Shah et al., 2018) shows that the EnKF-SQ analysis does not coincide with the Bayesian posterior and bears inherent biases: in the case of hard data, the Bayesian and EnKF-SQ posteriors are nearly the same. However, for out-of-range observations and mode of a prior within the observable range, only the maximum likelihood of the EnKF-SQ analysis is preserved but its distribution is flatter than the Bayesian solution with a thicker tail in the unobservable range, so the expectation is too high. Based on this, the EnKF-SQ is expected to be unbiased for thin SIT observations. Nevertheless, it should show a positive bias for out-of-range observations. Further, the thicker tail of the EnKF-SQ analysis distribution in the unobservable range makes it relatively skewed, which is undesirable in a Kalman filtering context.

In this study, we implement and test the overall performance of the stochastic ensemble Kalman filter semi-qualitative (EnKF-SQ) (Shah et al., 2018) in a twin experiment where synthetic SMOS-like SIT observations, with an upper detection limit, are assimilated into a coupled ocean-sea ice forecasting system. The objective is to test the potential of the EnKF-SQ for assimilating soft data with a state of the art ocean and sea ice prediction system, namely TOPAZ4. In addition, a number of single-cycle assimilation experiments using the EnKF-SQ are performed to investigate the sensitivity to the ensemble size and out-of-range observation uncertainty.

This paper is organized as follows: Section 2 introduces the main components of the TOPAZ4 system including the model and the EnKF-SQ DA scheme used in the assimilation experiments. In Section 3, the synthetic ice thickness data are outlined together with the assimilation setup. Section 4 discusses the results of the various assimilation experiments. A general discussion of the study concludes the paper in Section 5.

2 The TOPAZ system

2.1 Model setup

The ocean general circulation model used in the TOPAZ4 system is the version 2.2 of the Hybrid Coordinate Ocean Model (HYCOM) developed at the University of Miami (Bleck, 2002; Chassignet et al., 2003). The TOPAZ4 implementation of HYCOM uses hybrid coordinates in the vertical, which smoothly shift from isopycnal layers in the stratified open ocean to z level coordinates in the unstratified surface mixed layer.

The HYCOM ocean model is coupled to a one-thickness category sea ice model. The single ice thickness category thermodynamics are described in Drange and Simonsen (1996) and the ice dynamics use the Elastic-Viscous-Plastic (EVP) rheology of Hunke and Dukowicz (1997) with a modification from Bouillon et al. (2013). The momentum exchange between the ice and the ocean is given by quadratic drag formulas. The model has a minimum thickness of $10$ cm for both new and melting ice.

The model domain covers the North Atlantic and Arctic basins as shown in Figure 1. The model grid is created with conformal mapping (Bentsen et al., 1999) and has a quasi-homogeneous horizontal resolution between $12-16$ km in the whole domain. The grid has $880\times 800$ horizontal grid points.

2.2 The Ensemble Kalman Filter Semi-Qualitative, EnKF-SQ

The EnKF-SQ (Shah et al., 2018) uses an ensemble of model states to estimate the error statistics closely following the stochastic EnKF algorithm (Burgers et al., 1998; Evensen, 2004). The stochastic EnKF is a two-step filtering method alternating forecast and analysis steps. In the forecast step, the ensemble of model states is integrated forward in time and when observations become available, an analysis of every forecast member, $\mathbf{x}_{i}^{f}$ for $i\in{1,2,...,N}$ , is computed as follows:

[TABLE]

where $\mathbf{K}$ is the Kalman gain matrix; $\mathbf{x}_{i}$ is the $i^{th}$ ensemble state member; $\mathbf{H}$ is the observation operator, mapping the state variable to the observation space (could be non-linear); $\mathbf{R}$ is the observation error covariance matrix; $\mathbf{y}_{i}$ is the $i^{th}$ perturbed observation vector generated from ${\cal N}(\mathbf{y},\mathbf{R})$ and $\mathbf{P}^{f}$ is the ensemble forecast error covariance matrix. The superscripts $a$ , $f$ , and $T$ stand for analysis, forecast, and matrix transpose, respectively. In practice, $\mathbf{P}^{f}$ is never computed explicitly and is instead decomposed as follows:

[TABLE]

where $\bar{\mathbf{x}}^{f}$ is the mean of the forecast ensemble.

The EnKF-SQ is intended to explicitly assimilate observations with a detection limit. These are divided into two categories depending on whether they are within or outside the observable range. If the observed quantity is within it, the quantitative (hard) data is assimilated as in the stochastic EnKF, otherwise it is considered a qualitative (soft) data and treated differently.

The specific value and error statistics of the out-of-range (OR) observations are unknown. In order to assimilate OR observations, an assumption needs to be made about its likelihood. Following Shah et al. (2018), a virtual observation is created at the detection limit and then a two-piece Gaussian observation likelihood is constructed around it. A two-piece Gaussian distribution is obtained by merging two opposite halves of two different Gaussian probability density functions (pdfs) at their common mode, given as follows:

[TABLE]

where $w=\sqrt{\frac{2}{\pi}}(\sigma_{ir}+\sigma_{or})^{-1}$ is a normalizing constant, $\mu$ is the detection limit and also the common mode of two different normal distribution; $\sigma_{ir}$ and $\sigma_{or}$ are in-range and OR observation error standard deviations (std), respectively.

Figure 2 is an illustration of a two-piece Gaussian observation likelihood for OR SIT observations. On the left hand side of the detection limit, it is assumed that $\sigma_{ir}$ , inside the observable range, is defined by the observation uncertainty of hard data at the detection limit. An observation could possibly fall outside the detection limit, due to observation errors, even though its true value is within the observable range. On the right hand side, it is assumed that the $\sigma_{or}$ (eq. 5) in the unobservable range is defined with the help of a climatological mean for SIT above the detection limit so that extremely high values, which are usually less realistic, receive a lower likelihood (Shah et al., 2018).

[TABLE]

$f_{c}(y)$ is the pdf of the climatological data of the observed quantity. The two-piece Gaussian observation likelihood for soft data is denoted, hereafter, by ${\cal{N}}_{2p}(\mu,\sigma_{ir}^{2},\sigma_{or}^{2})$ . The EnKF-SQ pre-processes the observations by sorting them as either hard $y^{h}$ or soft $y^{s}$ . The observation errors are assumed uncorrelated in space, i.e. $\mathbf{R}$ is diagonal.

Update step of the EnKF-SQ

For each forecast member $\mathbf{x}_{i}^{f}$ ( $i\in 1,2,\ldots,N$ ):

For each soft data $y^{s}_{j}$ , check whether the observed forecast ensemble member is within the observable range or not. 2. 2.

If $\mathbf{H}_{j}\mathbf{x}_{i}^{f}\leq\mu$ , set observation error variance $\mathbf{R}_{j,j}=\sigma_{ir}^{2}$ otherwise $\mathbf{R}_{j,j}=\sigma_{or}^{2}$ implying that members inside (outside) the observable range are updated with data parameterized using in-range $\sigma_{ir}^{2}$ (out-of-range $\sigma_{or}^{2}$ ). 3. 3.

After looping over all soft data, compute the Kalman gain $\mathbf{K}_{i}$ as in Eq. 2 with the updated observation error covariance matrix $\mathbf{R}$ . For each $\mathbf{x}_{i}^{f}$ , a different Kalman gain $\mathbf{K}_{i}$ is calculated. 4. 4.

Evaluate the $i^{\text{th}}$ analysis member $\mathbf{x}_{i}^{a}$ as in Eq. 1 using $\mathbf{K}_{i}$ . The perturbed observations are generated by sampling from ${\cal{N}}(y^{h}_{j},\sigma_{h}^{2})$ and ${\cal{N}}_{2p}(\mu,\sigma_{ir}^{2},\sigma_{or}^{2})$ iiiiii $\sigma_{ir}$ is a special case of $\sigma_{h}$ , for hard data at the detection limit. for $y^{h}_{j}$ and $y^{s}_{j}$ , respectively. $\sigma_{h}^{2}$ is the observation error variance for $y_{j}^{h}$ .

Loop to next member $i$ .

Repeating this process for all forecast members yields the analysis ensemble. For a detailed description of the EnKF-SQ the reader is referred to Section 2 of Shah et al. (2018).

3 Experimental Setup

3.1 The synthetic Sea Ice Thickness Data

The synthetic SIT data used in this study is intended to mimic the SIT data from the SMOS mission with an upper detection limit. In order to evaluate the EnKF-SQ method against a perfectly known truth, synthetic observations are generated using the coupled ocean and one-thickness category sea ice model described earlier in Section 2.1. A reference truth run (also called nature run) is produced by integrating the coupled ocean sea ice model from $1$ January, $2014$ to $31$ December, $2015$ using unperturbed atmospheric forcing from ERA-Interim (Dee et al., 2011). The run is initialized using member number $100$ from the $100$ -member ensemble reanalysis of Xie et al. (2017) on $31$ December, $2013$ .

Synthetic SIT data are then generated for the duration of the assimilation experiment from $11$ November $2014$ to $31$ March $2015$ by perturbing the truth with Gaussian noise of zero mean and standard deviation $\sigma_{obs}$ ; parameterized as:

[TABLE]

where $t$ is the truth for ice thickness in meters. The parameterization is chosen such that observation errors increase for thicker ice, which is a general behaviour of positive-valued variables like SIT. The relationship is obtained through regression of the absolute difference of the daily averaged SIT between the reanalysis product (Xie et al., 2017) and the aforementioned reference trajectory from the month of December $2014$ to January $2015$ . The resulting relationship (not shown here) is linear with a positive slope. SIT observation error represented in Eq. 6 is also qualitatively in line with those used by Xie et al. (2016) for SMOS data.

A single upper detection limit of $1$ m is imposed on the generated SIT observations, as an analogous for saturation of SMOS data in thick sea ice. The SIT observations are assumed available on every grid cell (except along the coastline) and assimilated on a weekly basis. This is a reasonable assumption as SMOS data comes with a resolution of ( $\sim 12.5$ km), which is also the resolution of the TOPAZ4 system. Model and observation grids are collocated, thus our experiments neglect potential errors due to interpolation, which is out of the scope of this study.

3.2 Out-of-range SIT Climatology

A trivial alternative to the EnKF-SQ in the presence of soft data would be to assimilate climatological data as hard data. It is, therefore, worth investigating how beneficial the assimilation of soft data with the EnKF-SQ is compared to assimilating climatology.

An out-of-range, location-dependent, SIT climatology is computed by taking a time average of the truth (described earlier) for SIT above the detection limit in each grid cell. Averaging is done from January $2014$ to December $2015$ , a period that includes two summers and two winters and encompasses the assimilation period. Even though the latter takes place in winter, the climatology has a high bias because by construction it only contains SIT above $1$ m. The observation error variance for the climatological value is also location-dependent, equal to the variance of all reference truth values above the detection limit in the same grid cell.

3.3 Assimilation setup

In contrast to earlier TOPAZ4 studies that updated the whole water column variables (Xie et al., 2018), here the state vector $\mathbf{x}$ consists of only two sea ice variables: SIT and sea ice concentration (SIC). This therefore constitutes a case of a weakly coupled assimilation where the ocean is only updated by dynamical re-adjustments from the sea ice updates. Kimmritz et al. (2018), have shown that while strongly coupled ocean and sea ice is clearly beneficial, weakly coupled DA can still achieve reasonable results.

In the analysis, sampling errors in the forecast error covariance can give rise to spurious correlations between remote grid points, a problem which may become more pronounced for smaller ensemble sizes (Houtekamer and Mitchell, 1998). A common practice to counteract sampling errors is to perform local analysis in which variables at each grid cell are updated using only the observations within a radius of influence $r_{o}$ around the grid cell (Houtekamer and Mitchell, 1998; Evensen, 2003). For simplicity, a single closest local observation within $r_{o}=300$ km is used here during the analysis.

In TOPAZ4, model error is introduced by increasing the model spread via perturbing few forcing fields. The perturbations are pseudo-random fields computed in a Fourier space with a decorrelation time-scale of 2 days and horizontal decorrelation length scale of $250$ km, as described in Evensen (2003). Perturbed variables include air temperature, wind speeds, cloud cover, sea level pressure (Sakov et al., 2012, Section 3.3) and yield curve eccentricity in the EVP rheology (Hunke and Dukowicz, 1997, Table 1). In addition, precipitation is also perturbed with log-normal noise and standard deviation of $100$ %. This affects the snowfall when temperatures are below zero. Snow is an important thermal insulator and therefore hampers sea ice growth/melt.

3.4 Target Benchmarks

The performance of the EnKF-SQ is compared against three different versions of the stochastic EnKF and a Free run, denoted as follows:

EnKF-ALL: No detection limit is applied on SIT observations thus even thick ice data from the reference run is assimilated. This run acts as an upper bound for performance because it is the only one that assimilates out-of-range observations as hard data with known statistics, which can be seen as cheating. 2. 2.

EnKF-CLIM: The SIT climatology with climatological variance is assimilated instead of hard data. 3. 3.

EnKF-IG: Only hard data is assimilated and soft data is ignored, similar to Xie et al. (2016). This run is meant to assess the added value of the EnKF-SQ. 4. 4.

Free-run: The Free-run is the average of the $99$ members without DA. It is run with perturbations, contrarily to the aforementioned single-member truth run.

To evaluate the performance of the different DA methods, we compute the root mean square error (RMSE) of the ensemble mean at time $t$ as:

[TABLE]

where $\mathbf{x}^{r}$ and $\bar{\mathbf{x}}^{f}$ is the n-dimensional reference (unperturbed truth) and mean of the prior state vector at time $t$ , respectively. We also monitor the average ensemble spread (AES) for each filter, which we calculate at every assimilation cycle as:

[TABLE]

where $\sigma_{i,t}^{2}$ can either be the prior or posterior ensemble variance at time $t$ , respectively.

3.5 Ensemble size

In order to select the ensemble size, single-cycle assimilation sensitivity experiments are conducted using EnKF-SQ by varying the ensemble size between $2$ and $99$ . The resulting time-averaged RMSE and AES of the posterior SIT estimates are displayed in Figure 3. The plot indicates that for $N\geq 10$ , there is no significant difference in the performance of the EnKF-SQ. This is mostly due to the small size of the local state vector; consisting only of two variables. An ensemble as small as $10$ members is however less likely to succeed on the long term especially if the number of state variable and observations increase. Results from the other three EnKF runs (not shown) showed the exact same behavior. Thus, the initial ensemble is set as the first $99$ members of the reanalysis ensemble of Xie et al. (2016) on $31$ December $2013$ . The initial ensemble is then spun up from January, $2014$ until the start of the assimilation experiment (i.e., November $11$ ) with perturbed forcing to increase the variability. As described earlier, member number $100$ of the reanalysis run was used to generate the truth in this study.

The assimilation framework is sub-optimal for few reasons, in particular because of the weakly coupled updates. Further, SIT errors are erroneously assumed Gaussian while they are not. These sub-optimalities are not uncommon in realistic applications. They do cause some limited loss of performance but generally do not prevent us from applying the EnKF.

In terms of computational resources, we used a single processor on supercomputer for each of the four DA methods. The total wall-clock time required by each analysis scheme, to update the SIT and SIC state variables along with the IO operations, is approximately $6$ minutes on a 1.4GHz Cray XE6. This is much less than the TOPAZ4 one-week forward model run, for which each member runs on $134$ parallel processors in approximately $5$ minutes.

4 Assimilation Results

4.1 Tuning the EnKF-SQ out-of-range likelihood

The out-of-range standard deviation $\sigma_{or}$ is the only new parameter introduced into the EnKF-SQ compared to the stochastic EnKF. Therefore, it is important to study how the uncertainty in the estimate of $\sigma_{or}$ affects the performance of the EnKF-SQ scheme. For this, we carried out a number of single-cycle assimilation experiments by introducing a scalar multiplier $\alpha$ to equation 5 such that $\sigma_{or}^{*}=\alpha\cdot\sigma_{or}$ .

RMSE and AES of the posterior SIT estimates are plotted in Figure 4 for a wide range of $\alpha$ , varying between $0.1$ and $3.0$ . Such a range is very broad for most realistic applications. $\alpha<0.4$ strongly degrades the accuracy of the EnKF-SQ along with significant decrease in the AES. The large difference between RMSE and AES values, indicate a possible filter divergence. This is because for small $\alpha$ values, the sampling of a two-piece Gaussian likelihood for observation perturbations is prone to generate samples concentrated around the detection limit, thus pulling the analysis close to the detection limit, subsequently reducing the ensemble spread and increasing the RMSE. As $\alpha$ approaches $1$ , the RMSE attains the minimum value and further becomes consistent with the AES. When $\alpha$ increases beyond $2$ , the sampling of OR likelihood starts producing large perturbations, which makes the analysis increment capricious and eventually deteriorates the performance of the EnKF-SQ. Accordingly, in what follows we set $\alpha=1$ .

To illustrate how the EnKF-SQ updates the SIT by assimilating range-limited SIT observations, we plot the prior mean (Figure 5a) and analysis increment (Figure 5b) on $11$ November $2014$ . The solid black line on both maps is the isoline for $1$ m of SIT. The forecast places the thick ice (up to $3$ m) north of Greenland and north-eastern part of Canada. The increments are not only visible outside of the $1$ m isoline but also inside the central Arctic region where only soft data are assimilated. It is important to notice that there is nearly zero increment in the central Arctic region and the Beaufort sea where the sea ice is thicker than $1.5$ m. This is because the EnKF-SQ analysis do not impose strong updates on the prior if it is above the detection limit and observations are out-of-range.

4.2 Performance Assessment

Figure 6 shows the time evolution of the RMSE and AES of the prior SIT estimates obtained using the EnKF-ALL, EnKF-SQ, EnKF-CLIM, EnKF-IG and the Free-run. The percentage of OR observations (to the total number of observations) available at every cycle is added to the plot. As expected, EnKF-ALL outperforms all other schemes while EnKF-IG is the least accurate. It should be noted that there is an increasing trend in the RMSE, which is seasonally driven; a similar behavior reported in Xie et al. (2016). Assimilating soft data with the EnKF-SQ clearly improves the prior RMSE compared to the EnKF-IG. This is consistent over the entire assimilation period. The number of OR observations gradually increases as the cold season intensifies leaving only a few hard data during the months of February and March 2015. Even with a very limited number of hard data, the EnKF-SQ outperforms EnKF-IG. The RMSE resulting from the EnKF-CLIM is marginally higher than that of the EnKF-SQ, except during the last three months of the assimilation experiment. The reason for this could be twofold: (i) In the early stages of the experiment, the climatology tends to overestimate SIT due to the large seasonal cycle compared to later months. This causes the climatology to pull the update towards large values and hence degrades the performance of the EnKF-CLIM. (ii) Fewer hard data leads to larger RMSE values in the EnKF-SQ as can be seen towards the end of winter and start of the spring. Overall, the RMSE and AES show consistent ensemble statistics such that sufficient variability is preserved in the system after cycling over time.

In order to visualize area-wise improvements, we plot the map of time-averaged RMSE of the SIT prior estimates in Figure 7. The EnKF-ALL yields the best RMSE throughout the entire region. Compared to the EnKF-IG, the EnKF-SQ performs better in the central Arctic region, Greenland’s north-eastern shelf, the Canadian Arctic Archipelago and in the Beaufort Sea. On average, the EnKF-SQ and EnKF-CLIM estimates are approximately $8$ % more accurate than those of the EnKF-IG.

The EnKF-CLIM, seems to produce larger improvements than the EnKF-SQ specifically along the Ellesmere island. However, it also increases the prediction error in the Beaufort sea more than that of the EnKF-IG. A number of reasons may explain this behavior. The climatology being too high compared to the seasonal mean yields an artificial increase of the model thickness, which happens to agree with the truth along the Ellesmere island. The recurrent update due to the assimilation of climatology is propagated dynamically by the Beaufort gyre into the Beaufort sea creating an anomaly compared to the truth, which is not thicker.

The analysis algorithm of the EnKF-SQ is designed such that improvements are expected mostly where SIT is close to the threshold. As a way to examine this, we computed the time-averaged RMSE of the prior SIT estimates for different ice thickness intervals of $25$ cm using all DA schemes (Figure 8). The values on the x-axis of Figure 8 represent the upper bounds of each $25$ cm SIT bin interval except for the first bin of size $10$ cm because of the model $10$ cm minimum thickness. The RMSE for all DA schemes within each SIT bin is computed by finding the location of grid cells for which the observations fall within the bin interval.

Figure 8 suggests that RMSE values for all schemes below $1$ m of SIT are approximately the same, as they all assimilate hard data. Once SIT increases beyond the detection limit, EnKF-ALL becomes the most accurate followed by the EnKF-SQ up to SIT of $2$ m. The EnKF-SQ performs as expected for observation values in the vicinity of the detection limit where the assimilation of soft data is clearly enhancing the accuracy compared to the EnKF-IG and EnKF-CLIM. The performance of the EnKF-SQ is not as good as the EnKF-CLIM for thicker ice, which can also be seen in Figure 7 around the northern coast of Greenland. It is worth noticing that even though there is no data to assimilate for SIT $>1$ m in the EnKF-IG scheme, it is performing better than the Free-run up to $2.25$ m of SIT. This advantage has been previously reported by Xie et al. (2016, see Figure 8) and can be either due to the reduction of the positive bias in the free run (shown in Figure 9) by assimilating thin ice only or due to dynamical model adjustments after assimilation. In other words, improvements to thin ice are propagated in time to the period where ice gets thicker.

4.3 Bias and Skewness Analysis

The EnKF-IG updates the prior members by only assimilating observations of thin ice with a maximum thickness of $1$ m. This causes the algorithm to introduce negative conditional bias for thick ice (knowing that the observation is thin ice, the assimilation reduces the ice thickness more that it can thicken it). Similarly, the EnKF-SQ update may introduce a bias towards the detection limit due to assimilation of soft data and the EnKF-CLIM towards the climatology. To investigate these likely biases in different DA schemes, we present a bar chart of time-averaged conditional bias for the posterior estimates of SIT in Figure 9. The conditional bias is calculated by finding the location of the grid cells for which the observations fall within the SIT bin interval. The positive values represent an overestimation of SIT after the assimilation and vice versa.

The four DA runs exhibit a small negligible positive bias of approximately $0.5$ to $1$ cm for thin ice. The Free-run bias, on the other hand, is larger than $\sim 6$ cm. Above the threshold limit, there is a clear positive bias of $5$ to $7$ cm in the EnKF-CLIM posterior estimates, up until $2$ m. As seen earlier, the climatology tends to overestimate the truth during the first few months of the experiment when the ice is thin (red dotted line in the Figure 9). EnKF-IG estimates, over the same interval, exhibit a small negative bias, possibly left over from the conditional assimilation of thin ice. It is important to note that there is almost zero bias in the EnKF-SQ estimates, matching that of the EnKF-ALL for $1\leq\text{SIT}\leq 2$ m.

There is a systematic increasing negative bias for SIT $>2$ m, which reaches almost $20$ cm for SIT $=3$ m in the Free-run, EnKF-IG and EnKF-SQ. A similar trend of negative bias is also observed in the EnKF-ALL and EnKF-CLIM runs but to a slightly lesser extent. The negative bias in the Free-Run is likely due to the perturbation of the forcing fields, specifically the wind perturbations, which can cause erratic movements of ice that export thicker sea ice into areas of thinner ice. Since all assimilation runs use perturbed winds, this effect is likely to impact the EnKF-IG and EnKF-SQ more than the EnKF-ALL and EnKF-CLIM. In addition, it is important to mention that there are fewer grid points (not shown here) in the bins for thicker ice compared to thin ice, which may also affect the estimation of the bias for these bins, making them statistically less significant.

As discussed in Shah et al. (2018), the two-piece Gaussian observation likelihood may influence the shape of the posterior distribution, making it skewed and thus less Gaussian. In order to examine this, we evaluate and plot the conditional skewness of the posterior estimates of SIT only at the last assimilation step in Figure 10. The conditional skewness of the posterior is calculated as the average value of the skewness for all grid cells where the truth falls within the interval of a bin in consideration. Note that contrary to the computation of the conditional BIAS at the location of the observations, the conditional skewness is computed at the truth locations.

As shown in the figure, thin ice (SIT $\leq 25$ cm) yields noticeable skewness in the posterior estimates for all schemes. In the first bin, the truth is close to zero meters (open water) and hence all instances where thin ice has melted in the assimilation run count as zero value. On the other hand, freezing instances lead to various thickness values above $25$ cm. Both effect together can make the distribution skewed. The bin between zero and $10$ cm shows even larger skewness and has been removed for a better visual presentation. Other than the first bin, a small negative skewness is observed for all the schemes. One possible explanation is the fast melting of ice, drifting over warm waters; a situation enhanced by the lack of coupling with the ocean in the assimilation. This result confirms that the EnKF-SQ, although it uses a skew 2-piece Gaussian likelihood, does not introduce any noticeable positive skewness in its posterior.

4.4 Physical Consistency

Ice-ocean models are essential tools for computing integrated quantities that are often difficult to estimate from observations only. Sea ice volume and water transport between ocean basins are such high interest quantities for climate studies. Therefore, it is important to evaluate these quantities to verify that the use of data assimilation does not cause physical inconsistencies.

The total sea ice volume is the integral of sea ice concentration times the sea ice thickness over the entire model area. Its evolution for the different assimilation runs is shown in Figure 11a. The difference between the assimilation runs compared to the true sea ice volume (Figure 11b) is relatively small. This is because none of the DA schemes has extensively added or removed ice during the assimilation run. In Figure 11b a classical seesaw Kalman update behavior is observed. The comparison also reveals that most methods tend to underestimate the ice volume except for EnKF-CLIM.

As described earlier, the EnKF-IG has a negative SIT bias, which translates to a nominal loss of between $300$ km3 to $500$ km3 of sea ice volume from the beginning to the end of the winter (less than $3\%$ of the total simulated ice volume). Seesaw of the time series curves confirm that the EnKF-IG update does remove some ice, which grows back during the subsequent TOPAZ4 model run. The EnKF-SQ does only partially mitigate this loss by $100$ to $200$ km3 of ice. Surprisingly, the EnKF-ALL is not bias-free either with a loss of up to $100$ km3 of ice, which can be caused by various sub-optimal aspects of the data assimilation system, in particular the aforementioned effect of wind perturbations on the areas of thickest ice and the weakly coupled DA. These effects also contribute to the low bias in the other two methods.

The EnKF-CLIM ice volume is closest to the truth run with a little overestimation in the beginning of the winter, then an underestimation in the spring. The construction of the climatological data can explain this trend: since SIT data above one meter only have been retained, the climatology overestimates the SIT in the beginning of the winter but then underestimates the SIT in the midst of the winter because it also accounts for summer SIT. A different construction of the SIT climatology data would have led to a different tendency in EnKF-CLIM.

5 Discussion and Conclusions

The purpose of this paper is to demonstrate the usefulness of assimilating range-limited observations with the new EnKF-SQ DA scheme under a realistic experimental setup. Compared to the stochastic EnKF, the main algorithmic difference is the need to compute a different Kalman gain for each ensemble member, depending on the location of the member to the threshold when the observation is out-of-range. This does not make the EnKF-SQ less efficient, but rather prevents the algorithm from being included as a simple extension of existing EnKF codes: it cannot be expressed with an ensemble transform matrix.

The assimilation of synthetic sea ice thickness data with a upper detection limit of $1$ m in a coupled ice-ocean model of TOPAZ4 is demonstrated using the EnKF-SQ and shown to have a useful impact on SIT estimates. The results obtained with SMOS-like observations can be generalized to CryoSat-2 like observations by reversing the upper limit into a lower limit. Thus, merging the two products may not be necessary because each satellite data can be assimilated in a separate EnKF-SQ step.

Different assimilation experiments are conducted to assess the performance of the EnKF-SQ against other EnKF configurations assimilating only thin ice; both thin and thick ice; and climatology during a winter period in the Arctic. The study shows that assimilating soft data improves the forecast accuracy compared to ignoring them by approximately $8$ %, particularly where sea ice approaches the detection limit. Such a difference can be important in the performance of an operational system.

The performance exhibited by assimilating a reasonably accurate climatology was similar to the EnKF-SQ. Also, our choice of climatology being annual rather than seasonal may explain some of the flaws in the EnKF-CLIM. Nonetheless, the context of twin experiments is very favorable to EnKF-CLIM because the climatological truth is perfectly known; a case which is not true in realistic situations. For instance, in summer there are very few ice thickness measurements and thus it is difficult to construct a meaningful climatology. To this end, it is essential to investigate and compare the performance of the EnKF-SQ and EnKF-CLIM in a context of a biased model twin experiment and with a range of toy models (from linear to non linear regimes).

Assessing the bias of the analysis showed that there is no introduction of any significant bias by the EnKF-SQ, other than the negative bias for thicker ice which is observed in all tested DA schemes. Likewise, the posterior distributions resulting from the application of the EnKF-SQ did not consist of any noticeable higher order moments that could result in undesirable non-Gaussian features because of the two-piece Gaussian likelihood. This is most likely the case for all realistic applications where one would expect relatively small assimilation updates coming regularly in time. Furthermore, the choice of out-of-range (OR) observation error variance was not found to be very critical. A wide range of values for this parameter were tested and lead to acceptable performance of the EnKF-SQ. Ways of estimating $\sigma_{or}^{2}$ adaptively in space and time is currently being investigated and will be reported in a follow-up study. Concerning the physical constraints of the model, the EnKF-SQ estimates were found to be physically consistent and comparable to other tested assimilation schemes.

The EnKF-SQ therefore makes a viable data assimilation strategy for range-limited observations in high-dimensional nonlinear systems. Future research will focus on assimilating real data, in which the EnKF-SQ is confronted with large observation biases unlike the presented twin experiments setup.

Acknowledgements

The research is funded by the Nordic Center of Excellence EmblA (Ensemble-based data assimilation for environmental monitoring and prediction) under NordForsk contract number 56801. The Copernicus Marine Services and the Nansen Scientific Society have also contributed to the funding. Norwegian grants of the computer time (nn2993k) and data storage space (ns2993k) have also been used for the simulations.

Bibliography36

The reference list from the paper itself. Each links out to its DOI / PubMed record.

1Aagaard and Carmack (1989) Aagaard K, Carmack EC. 1989. The role of sea ice and other fresh water in the Arctic circulation. Journal of Geophysical Research: Oceans 94 (C 10): 14 485–14 498, doi: 10.1029/JC 094i C 10p 14485 , URL https://agupubs.onlinelibrary.wiley.com/doi/abs/10.1029/JC 094i C 10p 14485 .
2Bentsen et al. (1999) Bentsen M, Evensen G, Drange H, Jenkins AD. 1999. Coordinate transformation on a sphere using conformal mapping. Monthly Weather Review 127 (12): 2733–2740, doi: 10.1175/1520-0493(1999)127¡2733:CTOASU¿2.0.CO;2 , URL https://doi.org/10.1175/1520-0493(1999)127<2733:CTOASU>2.0.CO;2 . · doi ↗
3Bleck (2002) Bleck R. 2002. An oceanic general circulation model framed in hybrid isopycnic-cartesian coordinates. Ocean modelling 4 (1): 55–88.
4Borup et al. (2015) Borup M, Grum M, Madsen H, Mikkelsen PS. 2015. A partial ensemble Kalman filtering approach to enable use of range limited observations. Stochastic Environmental Research and Risk Assessment 29 (1): 119–129, doi: 10.1007/s 00477-014-0908-1 , URL http://dx.doi.org/10.1007/s 00477-014-0908-1 . · doi ↗
5Bouillon et al. (2013) Bouillon S, Fichefet T, Legat V, Madec G. 2013. The elastic–viscous–plastic method revisited. Ocean Modelling 71 : 2 – 12, doi: https://doi.org/10.1016/j.ocemod.2013.05.013 , URL http://www.sciencedirect.com/science/article/pii/S 146350031300098 X . Arctic Ocean.
6Burgers et al. (1998) Burgers G, Jan van Leeuwen P, Evensen G. 1998. Analysis scheme in the ensemble Kalman filter. Monthly weather review 126 (6): 1719–1724.
7Chassignet et al. (2003) Chassignet EP, Smith LT, Halliwell GR, Bleck R. 2003. North Atlantic simulations with the hybrid coordinate ocean model (HYCOM): Impact of the vertical coordinate choice, reference pressure, and thermobaricity. Journal of Physical Oceanography 33 (12): 2504–2526.
8Collow et al. (2015) Collow TW, Wang W, Kumar A, Zhang J. 2015. Improving Arctic sea ice prediction using PIOMAS initial sea ice thickness in a coupled Ocean-Atmosphere model. Monthly Weather Review 143 (11): 4618–4630, doi: 10.1175/MWR-D-15-0097.1 , URL https://doi.org/10.1175/MWR-D-15-0097.1 . · doi ↗