A novel approach to climate reconstructions using Ensemble Kalman Filtering



Introduction
Compared to conventional reconstruction methods, data assimilation represents a novel approach to increase our understanding of past climate. In this paper, we explore in an idealised setup whether the assimilation of sparse and indirect observations of past climate states, as recorded in climate proxies, provides sufficient constraints to skilfully update existing model simulations.
Two distinct approaches have often been used when reconstructing past climate. Empirical methods relate the changes in climate proxies, such as tree ring widths or δ18O concentrations in ice cores, to changes in climate variables during past decades (see Jansen et al., 2007, and Jones et al., 2009, for an overview of recent advances). This relationship is then extended backwards, allowing for the reconstruction of said climate variables for times when no direct observations of the climate system are available. Empirical methods rely on the stationarity of the relationship between climate and proxy record. In addition, the specifics of high-resolution proxy archives make it hard to quantify low-frequency variability (Moberg et al., 2005). Dynamical methods, on the other hand, use reconstructed external forcings (e.g. changes in solar irradiance, land cover, atmospheric aerosol and greenhouse gas concentrations) to constrain simulations of past climate states (e.g. Jungclaus et al., 2010; Wanner et al., 2008; Ammann et al., 2007). In contrast to empirical approaches, dynamical methods also allow us to reconstruct climate variables that are only loosely correlated with climate proxies. Ensembles of climate model simulations, however, are often not well constrained, as a large part of the variability is generated in the climate system itself and is thus independent of external forcings.
To overcome the relative weaknesses of these two approaches, it has been proposed to directly assimilate proxy data into climate model simulations (Goosse et al., 2009; Hughes and Ammann, 2009; Widmann et al., 2010). Data assimilation in a paleoclimatology context proceeds as follows: the climate model simulations are used to learn about the distribution of climate states consistent with model physics (representing our understanding of the system) and external forcings. In each analysis cycle, the model simulations are then updated with all available observations. These updated simulations are referred to as the analysis, and the update procedure ensures that the analysis is consistent both with the assimilated observations and with model physics and boundary conditions.
First attempts to assimilate climate proxy information into models include the pioneering work of von Storch et al. (2000), Hargreaves and Annan (2002), van der Schrier and Barkmeijer (2005), Goosse et al. (2006), and Franke et al. (2010). The proposed approaches can be roughly separated into three groups. The methods of von Storch et al. and van der Schrier and Barkmeijer seek to push a model simulation towards a large-scale target state through nudging (von Storch et al., 2000) or using singular forcing vectors (van der Schrier and Barkmeijer, 2005). The methods by Goosse et al. (2006) and Franke et al. (2010) select optimal matches with the available proxy information among a set of model states and combine these to "pseudo-simulations". None of the approaches discussed so far generically provides confidence intervals together with its best estimate, a shortcoming that is overcome by the approach proposed by Hargreaves and Annan (2002). Their fully probabilistic approach, however, is not tractable with a complex and computationally expensive model. Therefore, we propose a new approach that both allows us to assimilate proxy data into a high-resolution general circulation model (GCM) and provides a generic quantification of the uncertainties.

Data assimilation has long been used in numerical weather forecasting to estimate optimal initial conditions for weather predictions (Kalnay, 2003). The variational data assimilation techniques developed for weather forecasting, however, are not suitable for the reconstruction of past climate with a much smaller number of observations or climate proxies. The class of square root filters offers a method for assimilating data into climate model simulations that is much simpler to implement and computationally less expensive.
We use the Ensemble Square Root Filter (EnSRF), a variant of the Ensemble Kalman Filter (EnKF, see Evensen, 2003, and references therein), as introduced by Whitaker and Hamill (2002) to update the ensemble of model simulations with information from climate proxies. The EnSRF has successfully been used to produce a reanalysis for the period from 1870 to present using sea-level pressure measurements (Compo et al., 2006, 2011). Here we investigate whether EnSRF can also be used with spatially sparse observations with low temporal resolution.
Our main goal is to learn how best to assimilate climate proxy information into model simulations. In order to be able to experiment with the details of the setup and properly explore the potential of data assimilation for climate reconstructions at a reasonable computational cost, we want to be able to run the data assimilation off-line. Thus, we use an atmosphere-only GCM to provide a first guess of past climates. In this setup, the time step of the proxy information (semi-annual in our case) is far longer than the limit of deterministic predictability of most atmospheric processes (Lorenz, 1969; Kalnay, 2003). Therefore, we can assimilate the data off-line and we do not have to feed back the corrected states as new initial conditions for the next simulation cycle. Ultimately, we aim at assimilating climate proxy data into a coupled atmosphere-ocean GCM. This study contributes to our understanding of the strengths and limitations of data assimilation using EnSRF for climate reconstructions.
In the following section, the model data and analysis scheme are introduced and we discuss the EnSRF algorithm in detail. In Sect. 3, we present the results from the validation assessment of the analysis versus the unconstrained ensemble of model simulations. We discuss the strengths and limitations of data assimilation using EnSRF for climate reconstructions in the final section of the manuscript.

Model simulations
For the assessment of EnSRF for climate reconstructions, we use an initial condition ensemble of 30 simulations with the atmosphere-only GCM ECHAM5.4 (Roeckner et al., 2003, 2004). The model has been run at T63L31 resolution, corresponding to an approximate horizontal resolution of 1.875° with 31 vertical levels from the surface to 10 hPa. We use a segment of 50 years from 1850 to 1899 of the 411 simulated years from 1600 to 2010. The model has been forced with reconstructed sea-surface temperatures (SST, reconstruction by Mann et al., 2009) augmented with ENSO-dependent intra-annual variability according to the reconstructed NINO3.4 index of Cook et al. (2008) and climatological sea ice according to the HadISST climatology (Rayner et al., 2003). We further use reconstructed solar irradiance (Lean, 2000) and land surface parameters derived from the land-use reconstructions of Pongratz et al. (2008). Additionally, the model is forced with reconstructions of volcanic activity by Crowley et al. (2008) and concentrations of long-lived greenhouse gases as used in Yoshimori et al. (2010, and references therein). Finally, transient sulphate concentrations are prescribed according to the reconstructed aerosol loads of Koch et al. (1999); before 1850, tropospheric sulphate aerosol concentrations are set to their 1850 values.

The solar irradiance reconstruction by Lean (2000) exhibits an increase in irradiance of approximately 2.5 W m$^{-2}$ since the Maunder Minimum (MM). Recent reconstructions, however, show less of a change in solar irradiance between the MM and present conditions (Wang et al., 2005; Krivova et al., 2007). Nevertheless, we chose a strong solar forcing, as the recent study by Jungclaus et al. (2010) has shown that this leads to a slightly more realistic climate response over the past 1000 yr in ECHAM5.4.

Analysis scheme
We analyse simulated near-surface temperature and precipitation over land and several derived indices characterising atmospheric circulation according to Brönnimann et al. (2009). The data are aggregated for boreal winter (November to April) and summer (May to October), reflecting the approximate temporal resolution of climate proxies. In order to keep computations tractable, we thin out the initial model grid by selecting grid boxes only at every third longitude and latitude. The state vector used in the EnKF approach thus consists of semi-annual temperature and precipitation at 694 locations over land plus four derived indices. These indices include the strength of the northern subtropical jet (SJ), defined as the maximum zonal mean zonal wind at 200 hPa between the equator and 50° N; the strength of the Hadley Cell (HC), defined as the maximum of the zonal mean meridional streamfunction at 500 hPa between the equator and 30° N; the strength of the northern stratospheric polar vortex (z100), defined as the difference in geopotential height at 100 hPa between 75-90° N and 40-55° N; and the dynamic Indian monsoon index (DIMI), defined as the difference in average zonal winds at 850 hPa in the boxes 5-15° N, 40-80° E and 20-30° N, 70-90° E. For further discussion of these indices, please refer to Brönnimann et al. (2009).

Of the thirty-member initial condition ensemble, we select the thirtieth simulation as the target or reference time series used for validation, and the remaining 29 simulations represent the unconstrained ensemble. The initial condition ensemble is then updated by temperature time series at 37 different locations (see Fig. 1). The locations have been chosen to reflect the distribution of temperature-sensitive proxies over land such as tree ring series and ice cores (e.g. Mann et al., 2009). Proxy networks such as collections of tree-ring series in North America and Europe are represented by a single pseudo-proxy. We analyse the potential of the data assimilation technique using perfect observations (the time series extracted from the reference simulation) and, in a more realistic framework, also using pseudo-proxies computed from the reference simulation.
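As an illustration of the dimensions involved, the following minimal Python sketch assembles a state vector of the size described above (2 × 694 + 4 = 1392 elements) and a selection matrix that extracts temperature at the proxy locations from it. The ordering of the variables, the function names and the randomly chosen proxy grid boxes are assumptions made for this example only; they are not taken from the actual implementation.

```python
import numpy as np

N_LOC = 694   # land grid boxes retained after thinning (from the text)
N_IDX = 4     # derived indices: SJ, HC, z100, DIMI
N_OBS = 37    # number of pseudo-proxy locations


def build_state_vector(t2m, precip, indices):
    """Concatenate temperature, precipitation and indices (assumed ordering)."""
    return np.concatenate([t2m, precip, indices])


def build_obs_operator(proxy_boxes, m):
    """Return an (n x m) matrix whose rows each select one temperature grid box."""
    n = len(proxy_boxes)
    H = np.zeros((n, m))
    H[np.arange(n), proxy_boxes] = 1.0
    return H


rng = np.random.default_rng(0)
# Synthetic values stand in for one season of one simulation.
x = build_state_vector(rng.normal(size=N_LOC),
                       rng.normal(size=N_LOC),
                       rng.normal(size=N_IDX))
# Hypothetical proxy grid boxes; the real locations are shown in Fig. 1.
proxy_boxes = rng.choice(N_LOC, size=N_OBS, replace=False)
H = build_obs_operator(proxy_boxes, x.size)
y_perfect = H @ x   # "perfect observations" extracted from the state
```

Because temperature is stored first in this sketch, the grid-box index doubles as the column index of the selection matrix.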
We use a simple approach to fabricate pseudo-proxy time series. At the respective locations, we extract near-surface temperature time series from the reference simulation and disturb these with red noise generated by an AR(1) process with an autoregression coefficient of 0.7. The disturbance is scaled to 1.5 standard deviations of the reference time series, resulting in correlations with the reference series between 0.36 and 0.74, as shown in Fig. 1. The pseudo-proxies are slightly biased compared to the original series, with normally distributed biases centred at zero and ranging from −3.85 to 2.65 K (not shown). The bias in the pseudo-proxy time series reflects a potential estimation error when calibrating real-world proxy time series. Unlike in a real-world situation, however, the biases and red noise added to the reference time series have no spatial pattern and the variance of the disturbance is known exactly.
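The pseudo-proxy construction can be sketched as follows. This is a minimal illustration under stated assumptions: the noise is rescaled so that its sample standard deviation equals 1.5 times that of the reference series, the bias is passed in as a constant offset, and all function names are hypothetical. For noise that is independent of the reference series, this scaling implies an expected correlation of 1/sqrt(1 + 1.5^2) ≈ 0.55, which lies within the range reported above.

```python
import numpy as np


def ar1_noise(n, phi, rng):
    """Red noise from a stationary AR(1) process with unit variance."""
    eps = rng.normal(size=n)
    noise = np.empty(n)
    noise[0] = eps[0]
    for t in range(1, n):
        # Innovations are scaled so that the process variance stays at one.
        noise[t] = phi * noise[t - 1] + np.sqrt(1.0 - phi ** 2) * eps[t]
    return noise


def make_pseudo_proxy(reference, rng, phi=0.7, noise_scale=1.5, bias=0.0):
    """Disturb a reference series with scaled AR(1) noise and a constant bias.

    Assumption: 'scaled to 1.5 standard deviations' is read as a noise
    standard deviation of 1.5 times that of the reference series.
    """
    reference = np.asarray(reference, dtype=float)
    noise = ar1_noise(reference.size, phi, rng)
    noise = noise / noise.std(ddof=1)          # unit sample standard deviation
    return reference + noise_scale * reference.std(ddof=1) * noise + bias


rng = np.random.default_rng(1)
reference = rng.normal(size=50)                # e.g. 50 winters, synthetic values
proxy = make_pseudo_proxy(reference, rng, bias=0.5)
```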

Ensemble Square Root Filtering
We use a variant of the Ensemble Kalman Filter (EnKF, see Evensen, 2003, and references therein) to update model simulations with measurements of the climate system (here, pseudo-proxy time series derived from one model simulation).
In each analysis cycle, the background state, i.e. the climate model simulations, is updated with observations to produce the analysis. The analysis represents an optimal combination of the observations and the model simulations given observation error and the range of possible model states inferred from the ensemble. In the traditional EnKF, the observations are randomly perturbed to sample the observational error distribution. Consequently, EnKF is biased due to sampling uncertainty in both the background covariance $P^b$ estimated from the ensemble of model simulations and the observation perturbations. Due to the nonlinear dependence of the analysis covariance $P^a$ on the background covariance $P^b$, $P^a$ will be biased low and therefore underestimate ensemble mean errors on average. This underestimate of $P^a$ can lead to filter divergence. As we are not assimilating data on-line, filter divergence is not an issue in this study, and we therefore do not deal with it explicitly. The perturbation of observations, on the other hand, increases sampling error and leads to the analysis-error covariance estimate $P^a$ being less accurate on average. To overcome the above limitations, Whitaker and Hamill (2002) propose a novel approach that does not rely on the perturbation of observations; this approach is referred to as the Ensemble Square Root Filter (EnSRF).
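Let the background state, $x^b$, denote one simulation in the initial condition ensemble; $x^b$ is a vector of length $m = 1392$. In the analysis step, the background states are updated with the observations $y$, a vector of size $n = 37$. Using EnSRF, the update can be separated into an ensemble mean update (Eq. 1), which is identical to the EnKF update, and an update of the anomalies about the ensemble mean (Eq. 2). Thus, we decompose the background state $x^b$ into the ensemble mean background state $\bar{x}^b$ and the deviation from the ensemble mean $x'^b$, and express the update equations as follows:

$$\bar{x}^a = \bar{x}^b + K\,(y - H\bar{x}^b) \tag{1}$$

$$x'^a = x'^b - \tilde{K} H x'^b \tag{2}$$

$H$, a matrix of size $n \times m$, is the forward model that extracts the observations from the model state $x$. The Kalman gain matrix $K$ ($m \times n$) is identical to the gain matrix in the classical EnKF approach, as shown in Eq. (3):

$$K = P^b H^T \left(H P^b H^T + R\right)^{-1} \tag{3}$$

The gain matrix for the ensemble anomalies, $\tilde{K}$, is expressed as follows:

$$\tilde{K} = P^b H^T \left[\left(\sqrt{H P^b H^T + R}\right)^{-1}\right]^T \left[\sqrt{H P^b H^T + R} + \sqrt{R}\right]^{-1} \tag{4}$$

$P^b$ is the $m \times m$ background error covariance matrix estimated from the ensemble of background states $x^b$, and $R$ is the $n \times n$ observation error covariance matrix. The magnitudes of the observation errors are known and we assume that the observation errors are uncorrelated ($R$ is diagonal). Therefore, we can update the ensemble serially, including one observation at a time. This greatly enhances the computational tractability of the problem.

Due to the limited ensemble size, the background error covariance $P^b$ is subject to considerable sampling uncertainty. We deal with the problem of spurious covariances far off the diagonal in $P^b$ by deflating the off-diagonal elements of $P^b$ according to Eq. (5). In Eq. (5), $P^b_{i,j}$ denotes the element in the $i$th row and $j$th column of $P^b$, $k$ indexes the $n_{ens}$ different ensemble members, $|d_i - d_j|$ is the distance in km between grid box $i$ and grid box $j$, and $L$ is the cutoff distance. We set $L$ to 5000 km to reduce inter-hemispheric influence. The covariance deflation used here is identical with the Schur product localisation as proposed by Houtekamer and Mitchell (2001; see Gaspari and Cohn, 1999, for correlation functions).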
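The serial update described above can be sketched in a few lines of Python. The sketch follows the single-observation form of the EnSRF given by Whitaker and Hamill (2002), in which the anomaly gain reduces to a scalar multiple of the Kalman gain; it is an illustration under stated assumptions rather than the code used in this study. The function name, the argument layout and the optional array of localisation weights `loc` are hypothetical, and the localisation shortcut assumes that each row of `H` selects a single state element.

```python
import numpy as np


def ensrf_update(Xb, y, H, r, loc=None):
    """Serial EnSRF update, one observation at a time.

    Xb  : (m, n_ens) background ensemble
    y   : (n,) observations
    H   : (n, m) linear forward model
    r   : (n,) observation error variances (diagonal of R)
    loc : optional (n, m) localisation weights applied to P^b H^T
          (assumed to equal 1 at the observed state element)
    """
    X = np.array(Xb, dtype=float)
    n_ens = X.shape[1]
    for j in range(len(y)):
        xm = X.mean(axis=1)                    # ensemble mean
        Xp = X - xm[:, None]                   # anomalies about the mean
        hx_m = H[j] @ xm                       # observation seen by the mean
        hx_p = H[j] @ Xp                       # observation seen by anomalies
        pbht = Xp @ hx_p / (n_ens - 1)         # P^b H^T, shape (m,)
        if loc is not None:
            pbht = pbht * loc[j]               # Schur-product localisation
        hpbht = hx_p @ hx_p / (n_ens - 1)      # scalar H P^b H^T
        K = pbht / (hpbht + r[j])              # Kalman gain for the mean
        # Scalar factor linking the anomaly gain to K for a single observation.
        alpha = 1.0 / (1.0 + np.sqrt(r[j] / (hpbht + r[j])))
        xm = xm + K * (y[j] - hx_m)            # mean update, Eq. (1)
        Xp = Xp - alpha * np.outer(K, hx_p)    # anomaly update, Eq. (2)
        X = xm[:, None] + Xp
    return X
```

Because the observations are processed one at a time, no matrix square roots or inversions are needed; each step only involves vectors of length $m$ and scalars.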

Metrics of skill
We analyse the skill in reconstructing different global and continental-scale indicators. Skill is measured using a mean squared error skill score (Murphy and Epstein, 1989), which is also known as the reduction of error (RE; Cook et al., 1994):

$$RE = 1 - \frac{\sum_i \left(x^a_i - x^{ref}_i\right)^2}{\sum_i \left(x^b_i - x^{ref}_i\right)^2} \tag{6}$$

$x^a$ and $x^b$ denote the analysis and the unconstrained initial condition simulation, respectively, and $x^{ref}$ is the reference simulation (the target). The summation is over $i$, which counts the different time steps. This skill score ranges from 1 to −∞; positive values indicate that the analysis is closer to the reference simulation in mean square error terms than the unconstrained simulation. As we constrain the full set of simulations, we investigate the skill of both the ensemble mean and the individual simulations. In doing so, we compare the ensemble mean analysis $\bar{x}^a$ with the unconstrained ensemble mean $\bar{x}^b$, and each individual analysis simulation with its unconstrained counterpart.
Furthermore, we also analyse the change in correlation from the correlation of the unconstrained simulations with the reference simulation to the correlation of the analysis with the reference simulation.
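Both metrics are straightforward to compute. The following Python sketch is an illustration only; the function names are hypothetical and the inputs are assumed to be one-dimensional time series of equal length.

```python
import numpy as np


def reduction_of_error(xa, xb, xref):
    """Mean squared error skill of the analysis xa relative to the background xb."""
    xa, xb, xref = (np.asarray(v, dtype=float) for v in (xa, xb, xref))
    return 1.0 - np.sum((xa - xref) ** 2) / np.sum((xb - xref) ** 2)


def correlation_change(xa, xb, xref):
    """Correlation of the analysis with the reference minus that of the background."""
    r_a = np.corrcoef(xa, xref)[0, 1]
    r_b = np.corrcoef(xb, xref)[0, 1]
    return r_a - r_b
```

An analysis that reproduces the reference exactly yields a score of 1, an analysis identical to the unconstrained simulation yields 0, and an analysis with twice the error of the unconstrained simulation yields −3.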

Results
First, we analyse the effect of the covariance localisation used to deal with spurious covariances. Figure 2 illustrates the benefits of localisation of $P^b$. Without localisation, skill, measured in terms of mean squared error of the ensemble mean compared with the reference time series (Murphy and Epstein, 1989), is confined to the regions where proxy data are assimilated; elsewhere, we find negative skill. That is, without localisation, assimilation of perfect proxies leads to an "overcorrection" of the ensemble in regions far away from where information is assimilated (Fig. 2, a and b). With localisation, skill is less confined to the regions where we assimilate data (Fig. 2, c and d). In regions far away from proxy locations such as Africa or the Amazon Basin, the above-mentioned overcorrection disappears, resulting in zero skill.

We can think of the initial-condition ensemble of ECHAM5.4 simulations and the analysis after data assimilation as hindcasts of past climate states. The spread of the ensemble, here expressed as the intra-ensemble standard deviation, indicates hindcasting uncertainty. In the case of the unconstrained hindcast, the spread represents the uncertainty due to internal variability. In the case of the analysis, we hope to make use of the information about the state of internal variability of the reference, and thus we expect to reduce the hindcast uncertainty and thereby the ensemble spread. The influence of the data assimilation on the ensemble spread for temperature is shown in Fig. 3. We find that the uncertainty is significantly reduced in regions close to the assimilated information (e.g. Europe). As a consequence of the localisation, the spread is only marginally reduced in regions far from the assimilated information (e.g. sub-Saharan Africa). Furthermore, data assimilation leads to more widespread and larger reductions in spread in boreal winter.
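The reduction in spread can be expressed as the ratio of the intra-ensemble standard deviation after and before assimilation. The Python sketch below illustrates one way to compute such a ratio; the array layout, the function name and the choice to average the standard deviation over time before taking the ratio are assumptions made for this example.

```python
import numpy as np


def spread_fraction(analysis, background):
    """Fraction of ensemble spread remaining after assimilation.

    Both inputs have shape (n_time, n_ens, n_loc). The intra-ensemble
    standard deviation is computed per time step and location and then
    averaged over time (an assumed convention).
    """
    sd_a = np.std(analysis, axis=1, ddof=1).mean(axis=0)
    sd_b = np.std(background, axis=1, ddof=1).mean(axis=0)
    return sd_a / sd_b
```

Values close to 1 indicate that the assimilation hardly constrains the ensemble at that location, whereas a value of 0.5 corresponds to a halving of the spread.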
In the following figures, the skill scores for the ensemble mean are displayed as arrowheads and the individual simulations as box plots (see Fig. 4). The boxes indicate the interquartile range of the 29 simulations in the analysis, the thick horizontal line indicates the median simulation, and the whiskers denote the range of the simulations.
The skill of the ensemble mean is generally lower than the skill of the individual simulations (arrowheads in Fig. 4, a and b). This is due to the fact that the unconstrained ensemble average is, owing to its low variance and small bias, an a priori good guess for an additional simulation in mean square error terms. Correlation of the ensemble mean with the reference simulation, however, is generally greatly increased when information is assimilated (see Fig. 4, c and d).
We find positive skill for most indicators in boreal winter (Fig. 4a). Not surprisingly, skill is strongest in regions that are close to the assimilated information (e.g. northern European temperature and precipitation). However, we also find positive skill for the strength of the northern subtropical jet (SJ) and the stratospheric polar vortex (z100). Only for the intensity of the northern Hadley Cell (HC) do we find negative skill for most simulations and the ensemble mean. In boreal summer, skill is generally reduced but still positive for most of the indicators shown in Fig. 4.
Correlation increases considerably with data assimilation for all indicators except the strength of the northern Hadley Cell (HC) in boreal winter (Fig. 4c). For northern European temperature over land (NEUt2m), correlation of most individual simulations (boxes) and the ensemble mean (arrowheads) increases from close to zero to above 0.5 after assimilation. As with skill, the benefits of data assimilation decrease slightly with increasing distance from the assimilated information. In boreal summer, in contrast, increases in correlation after data assimilation are much more moderate except for northern European temperature (Fig. 4d).
Finally, we investigate the effect of the ensemble size on data assimilation. In EnSRF, the model physics are represented through the error covariance matrix $P^b$, which is estimated directly from the ensemble. Thus, increasing ensemble size allows us to capture more details of the interrelation of variables and their spatial features. In addition, estimation errors decrease with increasing ensemble size. Computation of very large ensembles, however, is very costly, and we would therefore like to learn about minimal requirements for climate reconstructions. To this end, we run the EnSRF approach with randomly selected sets of 5, 10, 15, 20, 25, and 29 ensemble members and compare the results with the reference simulation. In order to reduce sampling issues, we repeat the experiment 10 times for each ensemble size.
Mean square error skill increases with ensemble size for the various indicators shown in Fig. 5. This increase in skill is moderate for indicators close to the assimilated information such as mean temperature over land in the Northern Hemisphere or northern European total precipitation (Fig. 5, a, b, e, and f). In contrast, the increase in skill with increasing ensemble size is considerable for indicators with marginal skill such as the strength of the subtropical jet (SJ, Fig. 5, c and g) or the strength of the stratospheric polar vortex (z100, Fig. 5d). For these indicators, we find positive skill for most of the individual simulations only with ensembles of size 10 or more. We find simulations that perform well even with small ensembles. The positive effect of increasing ensemble size, however, is clearly visible in the reduced number of simulations with negative skill.
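The design of the ensemble-size experiment can be sketched as follows. The Python snippet only draws the random sub-ensembles (member indices without replacement, 10 repetitions per size); the function name is hypothetical, and the assimilation and skill computation for each draw are left to the reader.

```python
import numpy as np


def draw_subensembles(n_members, sizes, n_repeats, rng):
    """Draw random sub-ensembles of member indices for each ensemble size."""
    draws = {}
    for size in sizes:
        draws[size] = [
            np.sort(rng.choice(n_members, size=size, replace=False))
            for _ in range(n_repeats)
        ]
    return draws


rng = np.random.default_rng(42)
draws = draw_subensembles(29, (5, 10, 15, 20, 25, 29), 10, rng)
# Each entry of draws[size] indexes the columns of the background ensemble
# that would be passed to the assimilation for one repetition.
```

Note that with all 29 members selected, the 10 repetitions are identical, so any variation across repetitions at that size would have to come from elsewhere in the setup.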

Discussion
This study illustrates the potential of data assimilation using EnSRF for paleoclimatology. Depending on the indicator of interest, we find considerable skill even when assimilating spatially sparse information with low temporal resolution. Positive skill is not confined to the climatic parameters that are assimilated, but extends to other climatic variables as well. Furthermore, we find positive skill in constraining upper-air quantities such as the strength of the northern subtropical jet or the strength of the polar vortex through assimilation of surface quantities (here near-surface temperature).
Skill is generally confined to the Northern Hemisphere. This is a consequence of both the greater number of proxy records and the larger fractional land area in the Northern Hemisphere. As a consequence of the experimental setup (an atmosphere-only GCM), we do not expect large differences over oceans and adjacent land due to the dominant influence of sea-surface temperatures (SSTs), which are prescribed in the model simulations. We find the strongest positive skill for variables in boreal winter, when weather in the northern midlatitudes is strongly influenced by large-scale circulation. In boreal summer, when weather is much more dependent on local processes, data assimilation is less beneficial (see Fig. 4). This finding is in line with other studies (Brönnimann and Luterbacher, 2004; Rutherford et al., 2005; Franke et al., 2010; Griesser et al., 2010).
We assimilate semi-annual data and analyse skill both in summer and in winter. The extension of the methodology to assimilate data with higher (monthly) or lower (annual to decadal) temporal resolution is straightforward. Most temperature-sensitive climate proxies such as tree rings reflect summer temperatures. We nevertheless assess skill for the winter half-year as well in order to explore the potential benefits of assimilating early instrumental observations and documentary evidence.
The skill metric presented here reflects value added to the initial condition ensemble by the data assimilation. The results are thus not comparable with previous studies making use of pseudo-proxies (Mann and Rutherford, 2002; von Storch et al., 2004; Bürger et al., 2006). In the following, we highlight the most important difference between the study presented here and earlier work involving pseudo-proxies. The crucial element of empirical climate reconstructions is to establish the relationship between proxy records and certain climatic features (e.g. local climate or large-scale patterns) in the calibration period. Pseudo-proxy analyses have been used to investigate how well these relationships can be extrapolated to characterise past climates (see Rutherford et al., 2005; Bürger et al., 2006; Mann et al., 2007; Christiansen et al., 2009, for a discussion of different reconstruction methods). In the data assimilation framework, this proxy-climate relationship is characterised by the forward operator (proxy forward model) $H$ and the observation error covariance $R$. As we are interested in quantifying the skill emerging from the assimilation of spatially sparse information with low temporal resolution, we do not touch on this issue. Instead, we focus on the differences between an unconstrained ensemble and the analysis after data assimilation. Nevertheless, we recognise that correct formulation of forward proxy models is crucial for real-world applications of the data assimilation procedure for climate reconstruction, and we are currently working on this issue.

Correlations between individual simulations and the reference simulation improve considerably after assimilation of pseudo-proxies (Fig. 4, c and d). This indicates that we can indeed use data assimilation to constrain internal variability. It is noteworthy that positive correlations also occur in the unconstrained simulations (grey boxes and right-facing arrows). These arise either from the deterministic response to changing boundary conditions or at random due to sampling issues related to the limited ensemble size. Further analysis reveals that, of the indicators shown, only NHt2m in both seasons and the DIMI in summer exhibit consistent variation across the ensemble (not shown). Therefore, we conclude that the deterministic response to varying boundary conditions seems to be much weaker than the fluctuations due to internal variability for most of the indicators. The dominance of internal variability in turn highlights the potential benefits of data assimilation approaches.
The only indicator for which we find clearly negative skill is the intensity of the northern Hadley Cell (HC) in boreal winter. This is due to a combination of reasons. First, variability in HC does not seem to be strongly linked to extratropical climate in ECHAM, and the variability in HC is thus not well represented in the assimilated pseudo-proxies. Second, in contrast to near-surface climate quantities for individual grid boxes, we do not apply a localisation procedure for the derived indices (HC, SJ, z100, and DIMI). Thus, spurious correlation within the ensemble is fully exploited to update these series. This can lead to the issue of "overcorrection" as discussed above and thus to decreasing skill and/or decreasing correlation. Additional analyses reveal that skill for the HC in boreal winter can be either negative or positive, depending on which simulation is used as the reference simulation. With a localisation procedure similar to that applied for the gridded variables, we find zero skill in the HC (not shown). This illustrates that localisation is crucial for successful proxy assimilation. Therefore, we recommend that future applications use spatially explicit data with a localisation procedure in the analysis scheme, and compute the integrated indicators after assimilation from the spatially explicit fields to avoid the issues described above.
We apply a fairly simple localisation procedure in this explorative study. The localisation uses only horizontal distance to artificially reduce correlation and thus suppress the influence of spurious correlation arising from the small ensemble size used to estimate the correlation. This seems to work well for surface quantities (e.g. near-surface temperature and precipitation). Nevertheless, we cannot rule out the possibility that our localisation procedure suppresses real, far-reaching correlations (e.g. teleconnections) and that we thus unintentionally reduce skill in areas far away from the assimilated information. Given the issue of "overcorrection" without localisation (see Fig. 2 and HC in Fig. 4), we consider accepting a potential reduction in skill due to overly restrictive localisation to be the conservative choice. Several authors have developed adaptive approaches to allow for spatially and temporally more complex patterns of influence (see Anderson, 2007; Bishop and Hodyss, 2007; Fertig et al., 2007). While these adaptive approaches are potentially useful to overcome the problem described above, their implementation is much less straightforward and beyond the scope of this study.

Furthermore, we investigate the effect of ensemble size on our ability to successfully constrain the simulations with the available proxy information (see Fig. 5). We find the strongest positive effect of increasing ensemble size on simulations with no or negative skill. In addition, we note that in boreal summer we need larger ensemble sizes to satisfactorily represent regional climate. This is in line with earlier findings (Franke et al., 2010) noting the lower degrees of freedom of wintertime weather in the Northern Hemisphere. We conclude that while EnSRF with ensembles as small as 15 members leads to considerable skill in regions close to the assimilated information, larger ensembles are needed to reduce uncertainty in areas with little skill.
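A distance-based localisation of this kind can be sketched in Python as follows. The sketch computes great-circle distances between grid boxes and applies a taper to the sample covariance through an element-wise (Schur) product. The Gaussian shape of the taper is an assumption made for this illustration, as are the function names and the Earth radius constant; any correlation function of the type discussed by Gaspari and Cohn (1999) could be substituted.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0   # mean Earth radius (assumed constant)


def great_circle_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km (haversine formula), inputs in degrees."""
    lat1, lon1, lat2, lon2 = (np.radians(v) for v in (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2.0) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2.0) ** 2)
    return 2.0 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))


def localisation_weights(lat, lon, L=5000.0):
    """Pairwise weights that decay with distance (assumed Gaussian taper)."""
    lat = np.asarray(lat, dtype=float)
    lon = np.asarray(lon, dtype=float)
    d = great_circle_km(lat[:, None], lon[:, None], lat[None, :], lon[None, :])
    return np.exp(-d ** 2 / (2.0 * L ** 2))


def localise(Pb, weights):
    """Schur (element-wise) product of the covariance with the weights."""
    return Pb * weights
```

The diagonal of the covariance matrix is left unchanged because the weight at zero distance is one, whereas covariances between distant grid boxes are damped towards zero.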
Finally, we would like to touch on more general limitations arising from the experimental setup. By using an atmosphere-only GCM, we restrict climate to closely follow reconstructed boundary conditions. These reconstructions, in turn, are themselves uncertain. It would thus be desirable to allow for uncertainties in the boundary conditions as well. We refrain from perturbing boundary conditions, as such an ensemble would not allow us to properly investigate the strengths and limitations of the data assimilation approach due to severe sampling issues. Instead, our experimental setup and the resulting ensemble offer us the opportunity to develop our capabilities in assimilating proxy data (this study) and in formulating proxy forward models (ongoing work), and to understand the respective impacts on our ability to reconstruct climate. The natural extension of our approach would be to assimilate data in a coupled Earth system model to better quantify our uncertainty about past climates. Such an experimental setup, however, requires on-line data assimilation, as the temporal limit for predictability of slowly varying parts of the Earth system such as the ocean or the land surface exceeds the temporal resolution of the assimilated information. While such a coupled Earth system model with data assimilation is our final goal, we again stress the importance of developing the capabilities required to set up and run such a model with a simpler and controllable experimental setup.

Conclusions
Data assimilation provides an alternative to both the traditional empirical methods for climate reconstructions and purely model-based approaches (see Jansen et al., 2007, for a review of recent advances). We conclude that Ensemble Square Root Filtering (EnSRF) is a promising way to reconstruct past climates. Previously, the technique has been successfully applied in the twentieth century reanalysis project (Compo et al., 2011). Here, we show that data assimilation through EnSRF is beneficial even when assimilating much sparser information with low temporal resolution and with considerable measurement errors. This approach extends previous suggestions for data assimilation in paleoclimatology to a high-resolution GCM with data assimilation as used in weather forecasting applications.
The use of an ensemble of initial condition simulations allows us to express the uncertainty about past climate states in a natural way. Whereas intra-ensemble spread in the initial-condition ensemble indicates how well the past climate state is constrained by the boundary conditions, the change in spread from the unconstrained ensemble to the analysis can be used to assess the value added through the assimilation of observations. We assimilate temperature-sensitive pseudo-proxies with semi-annual resolution at 37 locations mainly in the Northern Hemisphere. Thereby, we manage to reduce the spread of the unconstrained ensemble, and thus our uncertainty about past climate, by up to 50 % for near-surface temperature in areas close to the assimilated information. For parameters other than near-surface temperature such as total precipitation, assimilation of temperature proxies is less beneficial, but we still find positive skill. Furthermore, positive skill is not confined to near-surface quantities; we also find value added through data assimilation for indicators of extratropical and subtropical circulation.
A crucial element of the data assimilation procedure is the background error covariance localisation.This reduces "overcorrection" in areas far away from the assimilated information and gives local information more weight.With the localisation, mean square error skill increases in all regions.The effect of the localisation, however, is most obvious in regions far away from the assimilated information where we find negative skill without the localisation.This negative skill reduces to zero with the localisation.Introduction

Past, 6, 627-644, doi:10.5194/cp-6-627-2010, 2010.
Yoshimori, M., Raible, C. C., Stocker, T. F., and

[Figure residue: legend of correlation classes (< 0.3, 0.3-0.4, 0.4-0.5, 0.5-0.6, 0.6-0.7, > 0.7) for November to April and May to October; maps of RE (see eq. 6) for November to April (a, c) and May to October (b, d), with NO localisation (a, b) and WITH localisation (c, d); maps of the fraction of spread after assimilation (c, d).]

and concentrations of long-lived greenhouse gases as used in Yoshimori et al. (2010, and references therein). Finally, transient sulphate concentrations are prescribed according to the reconstructed aerosol loads of Koch et al. (1999); before 1850, tropospheric sulphate aerosol concentrations are set to their 1850 values.
° N, and the dynamic Indian monsoon index (DIMI), defined as the difference in average zonal winds at 850 hPa in the boxes 5-15° N, 40-80° E and 20-30° N, 70-90° E. For further discussion of these indices, please refer to Brönnimann et al. (2009).

Of the thirty-member initial condition ensemble, we select the thirtieth simulation as the target or reference time series used for validation, and the remaining 29 simulations represent the unconstrained ensemble. The initial condition ensemble is then

denotes the $i$-th row and $j$-th column of $P^b$, and $k$ indexes the $n_{ens}$ different ensemble members. $|d_i - d_j|$ is the distance in km between grid box $i$ and grid box $j$, and $L$ is the cutoff distance. We set $L$ to 5000 km to reduce inter-hemispheric influence. The covariance deflation used here is identical with the Schur product localisation as proposed by Houtekamer and Mitchell (2001; see Gaspari and Cohn, 1999, for correlation functions).
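The distance-dependent damping of the background error covariance can be sketched as follows. The localisation function itself is not legible in this section, so the Gaussian-shaped weight exp(-d^2 / (2 L^2)) used below is an assumption, as are the great-circle distance with an Earth radius of 6371 km and all function names. Only the Schur (element-wise) product and the value L = 5000 km are taken from the text.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def great_circle_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points (haversine formula)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2.0 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def localisation_weight(distance_km, cutoff_km=5000.0):
    """Assumed Gaussian-shaped weight: 1 at zero distance, decaying with d."""
    return math.exp(-distance_km ** 2 / (2.0 * cutoff_km ** 2))


def localised_covariance(dev_i, dev_j, distance_km, cutoff_km=5000.0):
    """Sample covariance of two grid boxes, damped by the localisation weight.

    dev_i, dev_j : deviations of each ensemble member from the ensemble mean
    """
    n_ens = len(dev_i)
    raw = sum(a * b for a, b in zip(dev_i, dev_j)) / (n_ens - 1)
    return raw * localisation_weight(distance_km, cutoff_km)
```

Applying the weight element by element leaves the variance at each grid box unchanged and shrinks covariances between distant grid boxes, which limits the influence of a proxy on remote regions.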
) as the closest proxies are given more weight in the data assimilation procedure.

In Fig. 4, northern hemispheric and northern European land temperature, northern European precipitation, and various circulation indices are analysed in detail. These aggregated indices have been chosen to illustrate the advantages and limitations of the method as well as for ease of comparison with other climate reconstructions looking at northern hemispheric temperature (e.g. Mann et al., 2005; Moberg et al., 2005), European temperature (e.g. Luterbacher et al., 2004; Franke et al., 2010), or European precipitation (Pauling et al., 2006). We look both at the mean square error skill (Fig. 4, a and b) and at changes in correlation (panels c and d) between the unconstrained ensemble and the analysis. The mean square error skill is generally more positive for the individual simulations (boxes in Fig. 4, a and b) than for the ensemble average

Pauling, A., Luterbacher, J., Casty, C., and Wanner, H.: Five hundred years of gridded high-resolution precipitation reconstructions over Europe and the connection to large-scale circulation, Clim. Dynam., 26, 387-405, doi:10.1007/s00382-005-0090-8, 2006.
Pongratz, J., Reick, C., Raddatz, T., and Claussen, M.: A reconstruction of global agricultural areas and land cover for the last millennium, Global Biogeochem. Cy., 22, 2008.
change: an overview, Quaternary Sci. Rev., 27, 1791-1828, 2008.
Whitaker, J. S. and Hamill, T. M.: Ensemble data assimilation without perturbed observations,

Fig. 1. Correlation of pseudo-proxies with reference time series in boreal winter (November to April, open circles) and summer (May to October, filled dots).

Fig. 2. Mean square error skill score (RE) for near-surface temperature of the analysis ensemble mean compared to the unconstrained ensemble mean without (panels a and b) and with localisation (c and d). Results for boreal winter (November to April, a and c) and for boreal summer (May to October, b and d). Black dots indicate locations at which perfect proxies are assimilated.

Fig. 3. Average intra-ensemble standard deviation (spread) for temperature of the ECHAM ensemble in winter (a, November to April) and summer (b, May to October). Percentage of the intra-ensemble standard deviation in the analysis ensemble with respect to the unconstrained ensemble for the EnSRF analysis with pseudo-proxies and localisation in c and d.

Fig. 4. Skill in reconstructing large-scale indicators. The indicators are: northern hemispheric near-surface temperature over land (NHt2m), northern European temperature (NEUt2m) and precipitation (NEUpr) over land, the strength of the northern subtropical jet (SJ), the northern Hadley Cell (HC), the stratospheric polar vortex (z100), and the dynamic Indian monsoon index (DIMI). Mean square error skill score for boreal winter and summer in panels a and b, and correlation for boreal winter and summer in panels c and d, respectively. Boxes indicate the interquartile range of skill (correlation) for the individual simulations, the whiskers indicate the range of skill (correlation), and the arrowheads indicate the skill (correlation) of the ensemble mean. In panels c and d, the grey boxes and right-facing arrowheads indicate correlation between the unconstrained ensemble and the reference simulation; the white boxes and left-facing arrowheads indicate the correlation between the simulations after data assimilation and the reference simulation.