Data assimilation (DA) methods have recently been used to constrain climate model forecasts with paleo-proxy records. Both DA and climate models are computationally very expensive. Moreover, in paleo-DA the time interval between consecutive observations is usually too long for a dynamical model to retain memory of the previous analysis state, and the chaotic behavior of the model becomes dominant. The majority of recent paleoclimate studies using DA have therefore performed low- or intermediate-resolution global simulations together with an “off-line” DA approach, in which the re-initialization cycle is completely removed after the assimilation step. In this paper, we design a computationally affordable DA scheme to assimilate yearly pseudo-observations and real observations into an ensemble of COSMO-CLM high-resolution regional climate model (RCM) simulations over Europe, whose members differ slightly in boundary and initial conditions. Within a perfect model experiment, the performance of the applied DA scheme is evaluated with respect to its sensitivity to the noise level of the pseudo-observations. We find that the bias injected into the pseudo-observations affects the DA skill linearly. Such experiments can serve as a tool for selecting proxy records that can potentially reduce the state estimation error when assimilated. Additionally, the sensitivity of COSMO-CLM to the boundary conditions is addressed, and the geographical regions where the model exhibits high internal variability are identified. Two sets of experiments are conducted by averaging the observations over summer and winter. Furthermore, the effect of spurious correlations within the observation space is studied, and an optimal correlation radius, within which the observations are assumed to be correlated, is determined. Finally, pollen-based reconstructed quantities at the mid-Holocene are assimilated into the RCM and the performance is evaluated against a test dataset.
We conclude that the DA approach is a promising tool for creating high-resolution yearly analysis quantities. The affordable DA method can be applied to efficiently improve climate field reconstruction efforts by combining high-resolution paleoclimate simulations and the available proxy records.

It is now well known that many long-term (millennial-scale) processes
in the climate system have a large impact on climate prediction, even on
shorter (decadal and yearly) timescales

Climate proxy archives (tree rings, corals, sediments, and glacier ice) are examples
of indirect climate observations that suffer from several structural
limitations

Climate models may serve as an alternative method for the investigation of
long-term paleoclimate variability. They create dynamically consistent
climate states by using numerical methods

In addition to the two abovementioned methodologies, a novel and appealing
technique for the reconstruction of the climate of the past is data
assimilation (DA). According to

(i) What is the geographic distribution of model bias in an ensemble of regional climate simulations whose members differ slightly in boundary and initial conditions?

(ii) What is the optimal radius within which the observations are correlated?

(iii) Can this particular off-line DA approach be used to constrain regional climate simulations in time and space?

(iv) What is the range of observation errors within which the observations reduce the model's bias via DA?

This paper is structured as follows. Section 2 introduces the off-line DA basics and the experiment design, as well as the concept of the perfect model experiment and the metrics that measure the performance of this method. In Sect. 3 we present our results of constrained model simulations. We discuss and summarize our work in Sect. 4.

Prior to describing the experimental design, we give a brief review of the
optimal interpolation (OI) method (for the full review see

Other hypotheses are that the observation and background error
statistics are known (

The OI scheme is considered the best linear unbiased estimator (BLUE) of
the nature state

It is linear for

It is unbiased:

It has the lowest error variance (optimal error variance).
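In standard DA notation (with x^b the background state, y the observations, H the observation operator, and B and R the background and observation error covariances; these symbols are the conventional ones, not necessarily those used elsewhere in this paper), the three properties above lead to the familiar OI update:

```latex
\mathbf{x}^{a} = \mathbf{x}^{b} + \mathbf{K}\left(\mathbf{y} - \mathbf{H}\mathbf{x}^{b}\right),
\qquad
\mathbf{K} = \mathbf{B}\mathbf{H}^{\mathrm{T}}\left(\mathbf{H}\mathbf{B}\mathbf{H}^{\mathrm{T}} + \mathbf{R}\right)^{-1} .
```

The corresponding analysis error covariance is A = (I − KH)B, whose trace (the total analysis error variance) is minimized by this choice of the gain K.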

The unbiased linear equation between

Thus, the error covariance of the analysis will be

The trace of matrix

Given that the total error variance of the analysis has its minimum value, a
small

Assuming that

The explicit calculation of the covariance matrices for an RCM is very expensive.
Therefore, an ensemble of model states is applied to approximate the mean
and covariance of the forecast
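As a minimal sketch of this ensemble-based approximation, the forecast mean and sample covariance can be computed from the members and inserted into the OI update. The function below is purely illustrative: the diagonal R and the nearest-neighbor H are our own assumptions, not the paper's implementation.

```python
import numpy as np

def oi_analysis(ensemble, y, obs_idx, r_var):
    """Optimal-interpolation update of an ensemble mean (illustrative sketch).

    ensemble : (m, n) array of m model states with n grid points
    y        : (p,) observations
    obs_idx  : (p,) indices of the observed grid points
               (nearest-neighbor observation operator H)
    r_var    : observation error variance (assumed diagonal R = r_var * I)
    """
    m, n = ensemble.shape
    xb = ensemble.mean(axis=0)                 # forecast (background) mean
    anom = ensemble - xb                       # ensemble anomalies
    B = anom.T @ anom / (m - 1)                # sample background covariance
    H = np.zeros((len(obs_idx), n))
    H[np.arange(len(obs_idx)), obs_idx] = 1.0  # pick observed grid points
    K = B @ H.T @ np.linalg.inv(H @ B @ H.T + r_var * np.eye(len(obs_idx)))
    return xb + K @ (y - H @ xb)               # analysis mean
```

With a small observation error variance, the analysis at an observed grid point moves most of the way from the background toward the observation.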

Models contain systematic errors that may have diverse origins (dynamical
core, parameterization, and initialization). DA schemes are also based on
simplified hypotheses and are imperfect (e.g., here the Gaussian
parameterization for

RCM simulations tend to follow the trajectory of the driving GCM; however, it
is known that RCMs deviate from the driving GCM, both on smaller scales that
are not resolved by the GCM and on larger scales that are resolved by the GCM

RCM domains: the thin black box shows the nature domain and the dashed
boxes show the shifted members (only two are shown, to the northwest and
southeast, at a distance of five grid points from the nature domain). The red pluses show
500 random meteorological stations of the E-OBS dataset (the “ENSEMBLES daily gridded
observational dataset for precipitation, temperature and sea level pressure
in Europe”)

The CCLM model version cosmo_131108_5.00_clm8

The skill of the analysis (

10-year ensemble spread and RMSE for the seasonal mean of 2 m temperature (K).

To maintain clarity we focus only on the temperature at 2 m (T2M). For evaluation, the relaxation zone, where the boundary data are relaxed in the RCM (here 20 grid points next to the lateral boundaries), is removed. The sea surface temperature in our CCLM simulation is interpolated from the driving model (here ERA-Interim) rather than calculated by the model dynamics, so the ensemble spread of T2M is zero over the oceans. Therefore the RMSEs over the oceans are masked out of the analysis and only values over land are shown.
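The evaluation steps just described (discarding the relaxation zone and masking the oceans before computing the ensemble spread and the RMSE of the ensemble mean) can be sketched as follows; array shapes and function names are hypothetical:

```python
import numpy as np

def spread_and_rmse(ensemble, nature, land_mask, relax=20):
    """Field-mean ensemble spread and RMSE against the nature run for T2M.

    ensemble  : (m, ny, nx) seasonal-mean T2M of m members
    nature    : (ny, nx) seasonal-mean T2M of the nature run
    land_mask : (ny, nx) boolean, True over land
    relax     : width of the lateral relaxation zone to discard (grid points)
    """
    # discard the relaxation zone next to the lateral boundaries
    ens = ensemble[:, relax:-relax, relax:-relax]
    nat = nature[relax:-relax, relax:-relax]
    mask = land_mask[relax:-relax, relax:-relax]
    spread = ens.std(axis=0, ddof=1)     # member-to-member spread per point
    err = ens.mean(axis=0) - nat         # error of the ensemble mean
    # oceans are masked out: prescribed SST makes the T2M spread zero there
    return spread[mask].mean(), np.sqrt(np.mean(err[mask] ** 2))
```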

Figure

Prior to conducting the analysis calculation, we searched for an optimal
correlation length (

Field mean of RMSE for near-surface temperature (K) analysis over
the evaluation domain for different correlation lengths (
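The search for the optimal correlation length amounts to repeating the DA for a set of trial lengths and keeping the one with the smallest field-mean analysis RMSE. Below is a minimal sketch under two assumptions of ours: a Gaussian correlation model, and a hypothetical `rmse_of_analysis` callable standing in for a full DA re-run.

```python
import numpy as np

def gaussian_correlation(dist, L):
    """Assumed Gaussian correlation between two points a distance `dist`
    apart, with correlation length L (in the grid's distance units)."""
    return np.exp(-0.5 * (dist / L) ** 2)

def scan_correlation_length(candidates, rmse_of_analysis):
    """Pick the trial correlation length with the smallest analysis RMSE.

    candidates       : list of trial correlation lengths
    rmse_of_analysis : callable L -> field-mean analysis RMSE
                       (hypothetical stand-in for re-running the DA)
    """
    rmses = [rmse_of_analysis(L) for L in candidates]
    return candidates[int(np.argmin(rmses))]
```

As a toy usage, a cost curve with a minimum at 1.7 (the optimum reported later in this paper) is recovered by the scan.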

Field mean of seasonal near-surface temperature RMSE (K) of the analysis vs. SNR.
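Pseudo-observations at a prescribed signal-to-noise ratio can be generated by applying the observation operator to the nature run and adding white noise of known standard deviation. The sketch below is illustrative only; in particular, defining the SNR as the field standard deviation over the noise standard deviation is our assumption, not necessarily the paper's.

```python
import numpy as np

def make_pseudo_obs(nature, station_idx, snr, rng):
    """Create pseudo-observations from the nature run.

    Nearest-neighbor interpolation acts as the observation operator
    (here: picking the flattened grid indices in `station_idx`), and
    white noise with a standard deviation set by the target SNR is added.
    """
    truth = nature.ravel()[station_idx]   # H applied to the nature state
    noise_std = nature.std() / snr        # noise level from the target SNR
    obs = truth + rng.normal(0.0, noise_std, size=truth.shape)
    return obs, noise_std
```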

Figures

36-year ensemble spread and RMSE for the seasonal mean of 2 m temperature (K).

As defined by the World Meteorological Organization (WMO), the climate is
described by averaging the weather state over a period of at least 30 years.
Therefore, we conducted a new set of five 36-year-long simulations (one
nature run and four shifted runs). The computational cost of the RCM was the
only factor limiting the number of members (

Following the methodology of

Yearly field mean RMSE (K) of the ensemble mean (white line) and
analysis (black line) for winter

A test with real data will shed light on the efficiency of the applied DA.
Therefore, we design additional experiments using real proxies and
precomputed COSMO-CLM simulations at the mid-Holocene. Here, we briefly
explain the method and present the results for a single summertime (JJA) time
slice at 6000 years before present (6KBP). We use the pollen-based
temperature reconstructions of

Averaging with respect to their distance to the target year
(6KBP): a time window centered on the target year (e.g., reference time

Averaging with respect to their uncertainties provided as
standard error by

Then the weighted mean is calculated by

Schematic showing the weights for observations with respect to their distance to the target year. A time window of 1000 years is chosen. The observations are weighted depending on their distances to the reference time.
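The two weighting schemes can be sketched as follows. The linear fall-off of the time weights within the window is our assumption for illustration; the second function is a standard inverse-variance mean using the standard errors reported with the reconstructions.

```python
import numpy as np

def time_weighted_mean(values, ages, target=6000.0, window=1000.0):
    """Weighted mean of proxy samples within a time window centered on
    the target year; weights fall off linearly with distance in time
    (the linear fall-off is an illustrative assumption)."""
    values = np.asarray(values, float)
    ages = np.asarray(ages, float)
    half = window / 2.0
    dist = np.abs(ages - target)
    inside = dist <= half
    w = 1.0 - dist[inside] / half    # 1 at the target year, 0 at the edge
    return np.sum(w * values[inside]) / np.sum(w)

def error_weighted_mean(values, std_errors):
    """Inverse-variance weighted mean of proxy samples, using the
    standard errors supplied with the reconstructions."""
    w = 1.0 / np.asarray(std_errors, float) ** 2
    return np.sum(w * np.asarray(values, float)) / np.sum(w)
```

For example, a sample dated exactly at 6KBP receives full weight, one dated 250 years away receives half weight, and one outside the 1000-year window is discarded.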

Finally, for the model, the 25-year time average is assigned as the expected
value and the standard deviation from the mean as the uncertainty measure.
Figure

Schematic showing the weights for observations with respect to their standard error. A time window of 1000 years is chosen. The red dots represent the proxies in the 1000-year time window and the green dot represents the weighted mean.

Schematic showing how the expected value (the mean) and the deviation from the mean for the time slice simulation are selected. The model simulation is 25 years long. The green line represents the model state.

DA results of T2M during summer at 6KBP using weighted arithmetic
mean by time distances:

We emphasize that the analysis presented here is based on the combination of
proxy reconstructions and the climate model, and any interpretation of the
patterns should take into account the uncertainties of both sources of
information. Several factors have driven our decision to apply the proposed
DA method to this particular paleoclimate case study. The most important one
was the attempt to contribute to the reconstruction of summer temperatures
over the Mediterranean region at the mid-Holocene. This has been the subject
of a long-standing debate within the paleoclimate community: on the one hand,
climate model simulations

Weighted arithmetic mean using time distances: anomalies
(6KBP–0.2KBP) of assimilated observations (circles) superposed on their
standard errors (squares) with values in K. Color bar of the standard errors
as in Fig.

DA results of T2M during summer at 6KBP using weighted arithmetic
mean by standard errors:

Weighted arithmetic mean using standard errors: anomalies
(6KBP–0.2KBP) of assimilated observations (circles) superposed on their
standard errors (squares) with values in K. Color bar of the standard errors
as in Fig.

This dualism seemed to be finally resolved in a recent study by

Using a computationally fast DA approach, we assimilated pseudo-observations
and real observations into an ensemble of precomputed RCM simulations. The
ensemble is created by slightly perturbing the boundary and initial
conditions of its members via the domain-shifting method. In the framework of
a perfect model experiment, the performance of the free ensemble and of the
analysis is evaluated. Such experiments facilitate the estimation of the
observation, background, and analysis errors. In the first set of experiments,
we conducted an ensemble of 20 simulations driven by ERA-Interim for a
duration of 10 years. Nearest-neighbor interpolation is applied as the
observation operator, and random white noise with known standard deviation is
added, to create a set of pseudo-observations from the nature run.
The pseudo-observations are assimilated into the ensemble of RCM runs by
means of the OI approach. By conducting a set of simulations using four
perturbed members and a nature run, we repeated the perfect model experiment
for a time span of 36 years. This allowed us to draw conclusions on the time
evolution of the DA skill for a typical climatological period (more than
30 years). In a final step, we assimilated pollen-based temperature
reconstructions of the mid-Holocene into a precomputed RCM simulation with a
25-year duration at 0.44

The comparison of the ensemble mean of the COSMO-CLM outputs with the
pseudo-observations shows that the model seems to be well tuned for central
Europe. A region of significant model bias for both the winter and summer
seasons is located over the eastern side of the domain. This area lies far
from the ocean, where the ERA-Interim data are prescribed (no coupled ocean
was implemented). Therefore, we speculate that the model generates more
internal variability and is freer to evolve over this region (answer to
question (i) in the Introduction). Furthermore, we iterated the DA experiment
over different values of the correlation length for summer and winter to find
its optimal value. The optimal correlation radius is found to be
1.7

Our experiments showed that the ensemble OI is useful for conducting an
analysis of seasonally averaged quantities (answer to question (iii) in the
Introduction). Despite the inhomogeneous distribution of the observations
over the domain, the analysis shows an error reduction over most of the
domain. For a small ensemble with a longer integration period of 36 years,
the analysis significantly outperforms the ensemble mean. However, for the
winter season the analysis error increases with time and exhibits the same
rising trend as the error of the ensemble mean. For the summertime this trend
is removed in the analysis. This was previously observed in the study
of

In this paper, inverse model outputs (temperature reconstructions) are used
directly. In a real-world experiment, it is recommended to use proxy system
models (PSMs) to remove the simplistic assumptions of inverse climate
modeling and to let the assimilation take place in the observation space
instead of in the model space

A major drawback of our experiment is the linearity assumption of the forecast model and the assumed Gaussianity of the observation and model errors. In OI, the background error covariance is usually prescribed and calculated once for the entire assimilation procedure. Our experiment showed that although the spread of the ensemble increases slightly in time, each individual member and the ensemble mean show a similar trend. This similar behavior of the members might be due to the systematic behavior of the CCLM. We suggest a multi-model ensemble approach to account for a wider range of internal variability. However, conducting such experiments is prohibitively expensive with today's computational resources.

The experiment code and its full description are
available upon request from the corresponding author. The OI code and its
description are publicly available at

BF and WA designed the DA experiments. BF and ER have conducted the RCM simulations. All authors analyzed the results, discussed the outcome, and drafted the paper.

The authors declare that they have no conflict of interest.

This article is part of the special issue “Paleoclimate data synthesis and analysis of associated uncertainty (BG/CP/ESSD inter-journal SI)”. It is not associated with a conference.

The authors gratefully acknowledge the German Federal Ministry of Education and Research (BMBF) and its Research for Sustainability initiative (FONA) for support through the PalMod project, whose general goal is to analyze a complete glacial cycle, from the Last Interglacial to the Anthropocene. The computational resources were made available by the German Climate Computing Center (DKRZ). We acknowledge the GeoHydrodynamics and Environment Research (GHER) group for making their OI code publicly available. Our special thanks go to Ingo Kirchner for his helpful comments on the simulation setup and to the CCLM community. Edited by: Christian Ohlwein Reviewed by: three anonymous referees