Paleoclimate data assimilation (DA) is a promising technique to systematically combine the information from climate model simulations and proxy records. Here, we investigate the assimilation of tree-ring-width (TRW) chronologies into an atmospheric global climate model using ensemble Kalman filter (EnKF) techniques and a process-based tree-growth forward model as an observation operator. Our results, within a perfect-model experiment setting, indicate that the “online DA” approach did not outperform the “off-line” one, despite its considerable additional implementation complexity. On the other hand, the nonlinear response of tree growth to surface temperature and soil moisture does degrade the performance of the time-averaged EnKF methodology. Moreover, for the first time we show that this skill loss appears significantly sensitive to the structure of the growth rate function used to represent the principle of limiting factors (PLF) within the forward model. In general, our experiments showed that the error reduction achieved by assimilating pseudo-TRW chronologies is modulated by the magnitude of the yearly internal variability in the model. This result might help the dendrochronology community to optimize their sampling efforts.

The low-frequency temporal variability in the
climate system cannot be estimated from the available time span of
instrumental climate records. Accordingly, paleoclimate reconstruction must
rely on paleoclimate proxy records. These natural
archives exhibit several problematic features, e.g., low time resolution,
sparse and irregular spatial distribution, complex nonlinear response to
climate, and high noise levels. Therefore the proper extraction of the
climate signal contained therein can often remain opaque

So far, several very diverse paleo-DA schemes have been investigated,
including pattern nudging

An important difference between paleo-DA and traditional meteorological DA is
that the assimilation period might be very long compared to the timescales of
the dynamical model. Under these conditions, the randomizing action of the
chaotic model dynamics becomes dominant and consequently the forecast appears
completely de-correlated from the previous analysis state. This phenomenon,
currently referred to as an “off-line regime”, has been observed in several
paleo-DA studies

A typical assumption in most of the paleo-DA studies so far conducted is that
the climate–proxy relation is linear. Nonetheless, currently it is widely
recognized that climate proxies are the result of complex recording
processes, which can be of a physical, chemical and biological nature. More
realistic methodologies have recently been developed by the paleoclimate
community in order to investigate the climate–proxy relationship, considering
the distinct processes whereby the climate signal is recorded in proxy
archives. Proxy forward modeling

Several recent studies have investigated the applicability of process-based
forward models in a paleo-DA setting

This paper follows the rationale of AC15 but within a more realistic
scenario, where an AGCM is used as a dynamical system and the observational
network resembles the currently available TRW chronologies. The purpose of
this study is then to contribute to the present knowledge of paleo-DA
techniques by addressing the following two questions:

Does the off-line regime naturally appear for the assimilation of TRW records into an AGCM?

Is the FL-based extension of the VSL model still useful to improve the performance of a time-averaged EnKF technique when a climate model is used?

This study is structured as follows. In Sect.

In this paper, the term DA designates the process of estimating the state of a
system using observations and the physical laws governing the evolution of
the system as represented in a numerical model

Among the currently available DA techniques, EnKF

Within the Kalman filter (KF)

The observations

For realistic geophysical models, the dimensionality of the model state can
be very high and then the calculation and storage of the covariance matrices
can be prohibitively expensive. A solution to this problem is provided by the
EnKF

Following this approach, the mean and covariance of the forecast take the following form:

In the stochastic approach an observational ensemble

A practical problem of EnKF schemes is that due to the limited ensemble
size, the forecast uncertainty is usually underestimated. This leads to an
excessive confidence in the forecast, and after several assimilation cycles
the observations may be completely ignored. This situation is normally
avoided by means of an ad hoc procedure known as “covariance inflation”,
where the forecast covariance matrix is multiplied by a constant greater than 1. Another undesired consequence of the limited ensemble size is that the
ensemble state at any grid point will present non-negligible spurious
correlations with observations located far apart in space. This difficulty is
solved using another ad hoc procedure known as “covariance localization”.
Here, we utilize the R localization
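As a minimal illustration of the multiplicative covariance inflation described above, the following Python sketch rescales the ensemble perturbations about the ensemble mean. The function name `inflate_covariance`, the array layout (state variables in rows, members in columns) and the value 1.01 are assumptions made for this example only; they are not taken from the implementation used in this study.

```python
import numpy as np

def inflate_covariance(ensemble, cov_factor):
    """Multiplicative covariance inflation (illustrative sketch).

    ensemble: array of shape (n_state, n_members).
    cov_factor: constant greater than 1 that multiplies the sample covariance.
    """
    mean = ensemble.mean(axis=1, keepdims=True)
    perturbations = ensemble - mean
    # Scaling the perturbations by sqrt(c) scales the sample covariance by c
    # and leaves the ensemble mean unchanged.
    return mean + np.sqrt(cov_factor) * perturbations

# Hypothetical 3-variable state with 24 members
rng = np.random.default_rng(0)
ens = rng.normal(size=(3, 24))
inflated = inflate_covariance(ens, 1.01)
```

Because only the perturbations are rescaled, the ensemble mean is preserved while the ensemble spread grows, which counteracts the underestimation of the forecast uncertainty.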

The EnKF algorithm was initially designed to estimate the instantaneous state
of a model given instantaneous observations. As a consequence, EnKF cannot be
directly applied to paleoclimate data given that the observational
information present in proxy records is typically the average of a function
of the state over long time periods. A solution to this conflict is provided
by the time-averaged EnKF
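The idea behind a time-averaged update can be sketched as follows, under the common description in which each ensemble trajectory is split into its time mean over the averaging window and the deviations from that mean, and only the time mean is updated by the observations. The sketch below uses a stochastic (perturbed-observation) EnKF and a linear observation matrix `H` for simplicity; all function names, array layouts and parameter values are assumptions for illustration and do not reproduce the code or the nonlinear observation operator used in this study.

```python
import numpy as np

def enkf_update(xf, y, H, r_var, rng):
    """Stochastic EnKF update (sketch).

    xf: forecast ensemble, shape (n_state, m); y: observations, shape (n_obs,);
    H: linear observation matrix, shape (n_obs, n_state);
    r_var: observation error variance (assumed equal for all observations).
    """
    m = xf.shape[1]
    xp = xf - xf.mean(axis=1, keepdims=True)      # state perturbations
    yf = H @ xf                                   # predicted observations
    yp = yf - yf.mean(axis=1, keepdims=True)      # predicted-obs perturbations
    pxy = xp @ yp.T / (m - 1)                     # cov(state, predicted obs)
    pyy = yp @ yp.T / (m - 1) + r_var * np.eye(len(y))
    gain = pxy @ np.linalg.inv(pyy)               # Kalman gain
    # Observational ensemble: each member sees a perturbed observation
    y_pert = y[:, None] + rng.normal(scale=np.sqrt(r_var), size=yf.shape)
    return xf + gain @ (y_pert - yf)

def time_averaged_update(traj, y, H, r_var, rng):
    """Update only the time mean of each member; keep deviations unchanged.

    traj: ensemble trajectories over the averaging window,
          shape (n_time, n_state, m).
    """
    tmean = traj.mean(axis=0)                     # (n_state, m)
    anomalies = traj - tmean                      # deviations from time mean
    tmean_a = enkf_update(tmean, y, H, r_var, rng)
    return tmean_a[None, :, :] + anomalies

# Hypothetical example: 12 time steps, 3 state variables, 24 members,
# one time-averaged observation of the first variable.
rng = np.random.default_rng(1)
traj = rng.normal(size=(12, 3, 24))
H = np.array([[1.0, 0.0, 0.0]])
y = np.array([0.5])
analysis = time_averaged_update(traj, y, H, r_var=0.01, rng=rng)
```

In this sketch the sub-window variability of every member is left untouched, so the observations constrain only the time-averaged part of the state.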

The VSL model for TRW chronologies offers an intermediate-complexity approach
between ecophysiological and completely data-driven models

Regarding DA, VSL presents two challenging nonlinear aspects:
(i) a “thresholded response”

The term fuzzy logic was coined by

Equation (

An important aspect of the product growth response VSL-Prod is the presence
of an additional growth limitation regime where T and M concurrently limit
tree-ring growth. This “co-limitation” regime allows a gradual transition
between temperature- and moisture-limited growth limitation regimes and
accordingly a progressive alternation of the recorded variable. Growth
co-limitation was initially recognized by
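The difference between the minimum-based and the product-based growth rate can be made concrete with a short Python sketch. It assumes that the growth responses to temperature and moisture have already been computed and lie in the interval [0, 1]; the function names and the numerical values are illustrative assumptions only.

```python
def growth_rate_min(g_t, g_m):
    """Minimum t-norm: the most limiting factor alone sets the growth rate."""
    return min(g_t, g_m)

def growth_rate_prod(g_t, g_m):
    """Product t-norm: both factors contribute whenever both are below 1."""
    return g_t * g_m

# Moisture response fixed at 0.6, temperature response increasing.
g_m = 0.6
for g_t in (0.5, 0.7, 0.9):
    print(g_t, growth_rate_min(g_t, g_m), growth_rate_prod(g_t, g_m))
```

With the minimum, the growth rate stops responding to temperature as soon as the temperature response exceeds the moisture response, so the recorded variable switches abruptly. With the product, the growth rate keeps responding to both variables, which corresponds to the co-limitation regime described above.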

Following the rationale used in the experiments of AC15, we conducted a set
of DA experiments using the Simplified Parameterizations, primitivE-Equation DYnamics (SPEEDY) model

The SPEEDY model

The SPEEDY model was embedded by

For the experiments presented in this paper, we employed ensembles of 24
members due to computational constraints. We used a constant multiplicative
inflation of 1 % after the ensemble update and R localization via the
following formula:

Schematic of a typical observation system simulation experiment (OSSE) with ensemble online (with cycling) and off-line (no-cycling) DA methods.

We performed a set of Observation System Simulation Experiments (OSSEs) (see
Fig.

Initially, a 1-year-long spin-up run is performed starting from 1 January 1860. Afterwards, the final state of this model trajectory is used as the initial condition for a 150-year-long nature (“true”) run. The ensemble runs with and without DA are identically initialized from a set of states gathered from the last 2 months of the spin-up run (lagged 2-day initialization). Note that the nature run and the different ensemble runs are generated with the same time-varying forcing fields. Regarding the atmosphere–ocean coupling, we used SPEEDY's slab ocean configuration, motivated by the fact that the slow variability in the slab ocean may lend predictability to the atmosphere. In these conditions, the online DA technique should have higher chances to outperform the off-line technique.

Pseudo-TRW observations are produced following VSL's formulation plus a
final white noise addition step, in which random draws from a Gaussian
distribution are imposed on the time-averaged observations. Noise levels are
assessed by means of the signal-to-noise ratio (SNR), given by the ratio of
the standard deviation of the unpolluted pseudo-TRW observations to the
standard deviation of the additive white noise. Most of the results reported
in Sect.
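The noise addition step defined by the SNR above can be sketched in Python as follows. The function name `add_white_noise`, the use of the sample standard deviation and the synthetic 150-value series are assumptions for illustration; they do not reproduce the code used to generate the pseudo-observations in this study.

```python
import numpy as np

def add_white_noise(clean, snr, rng):
    """Add Gaussian white noise with std(clean) / std(noise) equal to snr.

    clean: unpolluted time-averaged pseudo-observations (1-D array).
    Returns the noisy series and the noise standard deviation used.
    """
    noise_std = np.std(clean, ddof=1) / snr
    noisy = clean + rng.normal(scale=noise_std, size=clean.shape)
    return noisy, noise_std

# Hypothetical chronology of 150 annual values
rng = np.random.default_rng(0)
clean = rng.normal(size=150)
noisy, noise_std = add_white_noise(clean, snr=1.0, rng=rng)
```

Under this definition, an SNR of 1 means that the added noise has the same standard deviation as the clean signal, and larger SNR values correspond to cleaner observations.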

Station set resembling real TRW network from

Concerning the configuration of the observation operator, we focus our study
on the role of the growth rate function by configuring VSL in such a way that
no thresholded response takes place. This is done by setting the upper (lower)
response thresholds to the maximum (minimum) values attained during the nature
(true) run so that the response functions reduce to linear rescaling
operators. We consider three different growth rate functions leading to three
VSL configurations: (i) VSL-T, where the growth rate is directly given by the
growth response to temperature (

Finally, regarding the soil moisture fields used to drive the VSL model, we
consider two options: (i) extracting soil moisture time series from the
climatological surface boundary conditions of the SPEEDY model and (ii) using
the precipitation and temperature output of SPEEDY as input for the leaky
bucket model (LBM)

The LBM code was extracted from VSL v2_3
(

Thanks to the availability of the truth model evolution for our OSSEs, the
forecast and analysis skill of the ensemble runs can be directly assessed.
Given the annual resolution of TRW chronologies, we study the filter
performance for yearly averaged values of near-surface temperatures. We focus
our analysis on near-surface temperature due to the larger error reduction in
this field as compared to other variables (e.g., humidity,

An AGCM is an example of a nonautonomous system, and accordingly the
evolution of its state is determined by both the atmospheric dynamics and the
external forcing. The influences of these two distinct factors can be
disentangled to some extent by considering atmospheric variability to be a
superposition of an internal component, caused by the intrinsic dynamics, and
an external one, resulting from the variations in the boundary conditions

In order to focus on the comparison between online and off-line DA techniques, we first consider the assimilation of temperature-limited pseudo-chronologies produced with VSL-T. Note that this observation operator generates linear univariate time-averaged observations; accordingly, the time-averaged EnKF should operate under favorable conditions, given that no nonlinearities are present in the observation operator.

Figure

Yearly near-surface temperature spread

Yearly near-surface temperature RMSE for the ensemble run
constrained by VSL-T pseudo-TRW observations.

Global near-surface temperature RMSE for the forecast ensemble run
constrained by VSL-T pseudo-TRW observation (red) and the free ensemble run
(black) for the analysis of online (green) and off-line (blue) DA. Panel

Yearly near-surface temperature RMSE for the analysis of the
ensemble runs using off-line DA
and climatological soil moisture.

Global near-surface temperature RMSE for the off-line analysis of
the ensemble runs constrained by VSL-Min (red), VSL-Prod (green) and VSL-T
(blue) pseudo-TRW observations. Climatological soil moisture is used to drive
the VSL model. Horizontal lines represent the mean values. Panel

Global yearly near-surface temperature RMSE box plots for the free
ensemble run forecast (

Averaged global yearly near-surface temperature RMSE for the analysis of the ensemble run with off-line DA, VSL-Min observation operator and different signal-to-noise ratios. The green star shows the corresponding value for the free run.

On the other hand, online forecast quantities do not present significant
error reductions, and consequently the online time-averaged EnKF appears to
work under the off-line regime. This situation can be seen in
Fig.

Here we analyze the role of the structure of the growth rate function in the
performance of the off-line DA scheme. Given that both

Regarding the dependency on the representation of the PLF, Fig.

Finally, regarding the dependency of the filter performance on the
observational noise levels, Fig.

Using the time-averaged EnKF methodology and a process-based proxy forward model (VSL), we assimilated pseudo-TRW chronologies in an AGCM (SPEEDY). Using a set of perfect model experiments we studied two different aspects of the paleo-DA problem: (i) the onset of the off-line regime in the assimilation of observations averaged during long time periods and (ii) the impact of a nonlinear observation operator on the performance of EnKF-based time-averaged DA approaches.

Our online DA experiments in general showed no forecasting skill, and accordingly they appear to operate under the off-line regime. Moreover, they
exhibited an increasing error trend that was not present in our
experiments with off-line DA schemes, where no re-initialization of the model
was performed. In these conditions, the off-line time-averaged EnKF appears
to outperform its online counterpart. This result complements the studies of

Concerning the influence of nonlinearities in the observation operator, the
performance of the off-line time-averaged EnKF appeared to be significantly
sensitive to the selection of the t-norm used to represent the PLF. In our
experiments, the product t-norm outperformed the original minimum t-norm, as
previously observed for a two-scale

This adjective is currently used in information technology to designate data encoding methods that lead to information loss from the original version for the sake of reducing the amount of data needed to store the content.

recorders of climate, due to the integrated nature of the information contained in them and the standardization process used to minimize the non-climatic effects on growth. In the same vein, we argue that the “abrupt shifting” of the recorded variable (temperature or moisture) – implied by the minimum function used in VSL's original formulation – might constitute an additional source of lossiness, which can be reduced by resorting to an FL-based representation of the PLF. In particular, for the product t-norm, the existence of an additional co-limitation regime makes a smoother shifting of the recorded variable possible. As a cautionary remark, we want to highlight that the pseudo-observations assimilated in our experiments present several important limitations: (i) the thresholded response of trees to temperature and moisture was not considered in order to focus on the role of the growth rate function; (ii) VSL's parameters were set in a completely homogeneous fashion for all the observational stations, whereas actual TRW networks are strongly heterogeneous, comprising chronologies generated under highly dissimilar growth limitation regimes. More realistic TRW assimilation experiments will probably have to address these issues as well as the necessity of considering model errors by conducting imperfect model OSSEs.

Finally, we want to mention that the translation of VSL into the FL language
suggests other possible extensions for VSL. (i) Growth response functions can
be generalized using the extensive knowledge on membership functions gathered
in the FL research community

No data sets were used in this article.

The authors declare that they have no conflict of interest.

This work was supported by the German Federal Ministry of Education and Research
(BMBF) as part of the Research for Sustainable Development initiative (FONA;