Probabilistic spatial reconstructions of past climate states are valuable for quantitative studies of the climate system under different forcing conditions because they combine the information contained in a proxy synthesis into a comprehensible product. Unfortunately, they are subject to a complex uncertainty structure due to complicated proxy–climate relations and sparse data, which makes interpolation between samples difficult. Bayesian hierarchical models have promising properties for handling these issues, such as the ability to include multiple sources of information and to quantify uncertainties in a statistically rigorous way.

We present a Bayesian framework that combines a network of pollen and macrofossil samples with a spatial prior distribution estimated from a multi-model ensemble of climate simulations. The use of climate simulation output aims at a physically reasonable spatial interpolation of proxy data on a regional scale. To transfer the pollen data into (local) climate information, we invert a forward version of the probabilistic indicator taxa model. The Bayesian inference is performed using Markov chain Monte Carlo methods following a Metropolis-within-Gibbs strategy.

Different ways to incorporate the climate simulations into the Bayesian framework are compared using identical twin and cross-validation experiments. Then, we reconstruct the mean temperature of the warmest month (MTWA) and the mean temperature of the coldest month (MTCO) during the mid-Holocene in Europe using a published pollen and macrofossil synthesis in combination with the Paleoclimate Modelling Intercomparison Project Phase III mid-Holocene ensemble. The output of our Bayesian model is a spatially distributed probability distribution that facilitates quantitative analyses that account for uncertainties.

Spatial or climate field reconstructions of past near-surface climate states combine information from proxy samples, which are mostly localized, with a model for interpolation between those samples. They are valuable for comparisons of the state of the climate system under different external forcing conditions because they produce a comprehensible product containing the joint information in a proxy synthesis. Thereby, spatial reconstructions are more suitable for many quantitative analyses of past climate than individual proxy records. Unfortunately, spatial reconstructions are subject to a complex uncertainty structure due to uncertainties in the proxy–climate relation and the sparseness of available proxy data, which leads to additional interpolation uncertainties. Therefore, a meaningful reconstruction has to include these uncertainties

We use Bayesian statistics to combine the two modules mentioned above: the (local) proxy–climate relation and spatial interpolation. The Bayesian framework allows for the combination of multiple data types. In our case, these are pollen and macrofossil records to constrain the local climate, and climate simulations, which produce physically consistent spatial fields for a given set of large-scale external forcings. In addition, our framework accounts for several sources of uncertainty in a statistically rigorous way by estimating and inferring a multivariate probability distribution, the so-called posterior distribution

Pollen is the terrestrial proxy with the highest spatial coverage

We apply our framework to a mid-Holocene (MH, around 6 ka) example for two reasons. First, compared with other time slices before the common era, the MH has high proxy data coverage, particularly for Europe. Therefore, we can use pollen and macrofossil data with a sparse but relatively uniform spatial coverage over Europe as input for probabilistic transfer functions, while still having other reconstructions available that can be compared with our results. Second, a multi-model ensemble of climate simulations with boundary conditions adjusted to the MH was produced in the Paleoclimate Modelling Intercomparison Project Phase III

This work is related to several concepts that were developed for applications in paleoclimatology. In recent years, several authors constructed Bayesian hierarchical models (BHMs) for paleoclimate reconstructions:

The structure of the paper is as follows. In Sect. 2, we describe the proxy synthesis and climate simulations that we use. This is followed by a detailed description of our proposed Bayesian framework in Sect. 3. Results from a comparison study of different ways to incorporate the climate simulations in the Bayesian framework and from our reconstruction of the European MH climate are presented in Sect. 4. Finally, we discuss and summarize our methodology and results in Sects. 5 and 6.

The pollen and macrofossil synthesis that we use in this study stems from

The 50 paleosites are sparsely but relatively uniformly distributed over Europe. Their locations are delimited by 6.5

PMIP3 MH ensemble mean anomaly from CRU reference climatology for

Modern climate and vegetation data are used for the calibration of the transfer functions. The climate data are computed from the University of East Anglia Climatic Research Unit (CRU) 1961 to 1990 reference climatology

We use a multi-model ensemble of climate simulations that were run within PMIP3 with forcings adjusted to the MH. This includes changed orbital configurations and greenhouse gas concentrations

Basic information on the PMIP3 climate simulations used to construct the process stage in the Bayesian framework (from

The mean summer climate expressed as MTWA (Fig.

It was shown in

We use Bayesian statistics to combine a network of pollen samples with an ensemble of PMIP3 simulations because in this approach each source of information has an associated uncertainty that is naturally included in the inference process. In this section, we specify the quantities that are combined in our reconstruction and describe the inference algorithm that is used to create the results presented below.

In the following, we denote fossil pollen and macrofossil data by

To further structure the framework, we split the model parameters

Directed acyclic graph corresponding to the Bayesian framework in Eq. (

The Bayesian model uses probabilistic transfer functions to model proxy data, in our case occurrence information on taxa, given a climate state and transfer function parameters. From all the terms in Eq. (

To reconstruct climate from the

We integrate the forward formulation of PITM from

Response functions for

To fit response functions, vegetation data are used because they contain more accurate information on the occurrence of a taxon on the spatial scales of interest compared to modern pollen samples. The disadvantage of using vegetation data for the calibration is that the probability of the presence of a taxon is only valid in vegetation space on the spatial scale taken for the training data but not in the pollen or macrofossil space in which there can be multiple non-climatic reasons for the absence of a taxon like local plant competition or pollen transport effects, as well as local climatic effects below the resolution of our reconstruction.
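To make the notion of a response function concrete, the following is a minimal sketch and not the PITM implementation. It assumes a logistic form with linear, quadratic, and interaction terms in MTWA and MTCO; the function names and the coefficient vector `beta` are hypothetical. The likelihood uses only occurring taxa, in line with the way fossil samples enter the reconstruction.

```python
import math


def presence_probability(mtwa, mtco, beta):
    """Probability that a taxon is present for a given climate state.

    Assumed form: logistic regression with quadratic and interaction terms.
    beta = (b0, b1, b2, b3, b4, b5) are hypothetical coefficients.
    """
    b0, b1, b2, b3, b4, b5 = beta
    eta = (b0 + b1 * mtwa + b2 * mtco
           + b3 * mtwa ** 2 + b4 * mtco ** 2 + b5 * mtwa * mtco)
    # Numerically stable evaluation of the logistic function.
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)


def log_likelihood_occurring(mtwa, mtco, betas_of_occurring_taxa):
    """Log-likelihood of a climate state given the taxa found in a sample.

    Only occurring taxa contribute; absences are ignored.
    """
    return sum(math.log(presence_probability(mtwa, mtco, beta))
               for beta in betas_of_occurring_taxa)
```

Evaluating `log_likelihood_occurring` over a grid of climate states would give an unnormalized local reconstruction for a single sample under these assumptions.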

For the calibration against the modern dataset, we use presence (

As described above, the absence of a taxon in a pollen or macrofossil sample can have reasons that are not included in the absence probability estimated from Eq. (

Finally, we define a prior distribution for

Using a flat prior for

Summary statistics of local reconstructions using the PITM forward model.

An issue of the PITM version used in this study is the inconsistent use of calibration and fossil data by using presence and absence information on taxa for the calibration but only occurring taxa in the reconstruction. Despite this inconsistency, the reconstructions in this study are in agreement with previous versions of PITM, for which only occurrence information was used for calibration. However, there is no simple solution for the problem that the calibration is in vegetation space, whereas the absence of taxa in the fossil samples is information in the pollen or macrofossil space. A promising idea might be to model the absence due to non-climatic reasons as zero inflation by adding a latent variable to estimate the detection probability of a taxon

The ensemble of climate simulations is used to control the spatial structures of the reconstruction and to constrain the range of physically possible climate states for a given external forcing by computing a spatial prior distribution from the ensemble members. This distribution is combined with interpolation parameters

It is not obvious which method for estimating the prior distribution is best suited for the problem at hand and which additional model parameters are appropriate to preserve as much physical consistency contained in the climate simulations as possible but to correct for climate model inadequacies. Therefore, we perform a comparison study of six process stage models that are composed of three techniques to formulate the process stage and two choices for the involved spatial covariance matrix.

The most common approach in the data assimilation literature is to assume that the ensemble members are independent and identically distributed (iid) samples from an unknown Gaussian distribution of possible climate states

The main advantage of this Gaussian model (GM) is that inference is simpler than in more complex probability density estimation techniques. The disadvantage is that it relies on the strong assumption that

A relaxation of the assumptions of the GM is the second model, which we call the regression model (RM) because it is inspired by regression-based models popular in postprocessing and climate change detection and attribution

The third model has been introduced in the data assimilation literature by

Ideally, the covariance matrix of each kernel would correspond to the respective ESM such that the spatial autocorrelation of that ESM is preserved when we sample from its kernel. Unfortunately, there is only one MH run available for each ESM, and the internal variability in those runs is much smaller than the inter-model differences. Using the internal variability of those runs would thus lead to very distinct kernels and allow for too few climate states. Therefore, the covariance of each kernel is estimated from the inter-model differences even though autocorrelation of the individual models is lost. This is a very common choice in kernel-based probability density approximations

Compared to the GM, the empirical covariance matrix

Each kernel gets an assigned weight

A Dirichlet distributed prior is used for

Two advantages of the KM are that it is not assumed that the unknown prior distribution is Gaussian and that the kernels do not rely on an iid assumption for their first-moment properties. However, the KM still relies on an iid assumption for the second moments. The KM preserves the spatial structures of each ESM in the first moments of the kernels. This preservation of physical consistency reduces the degrees of freedom compared to the RM. For example, when the true climate state lies exactly between

The first technique to regularize the empirical covariance matrix (the scaled empirical covariance in the KM), which is applied in this study, is the graphical lasso algorithm

The advantage of the glasso approach is that the empirical matrix can be approximated very closely and the sparseness of the precision matrix facilitates the use of efficient Gaussian Markov random field (GMRF) techniques

To overcome the deficiencies of the glasso approach, we propose an alternative covariance regularization technique. The so-called shrinkage approach

Let

Ideally, the parameters

Because PITM is non-Gaussian and nonlinear, the posterior climate does not belong to a standard probability distribution. Therefore, Markov chain Monte Carlo (MCMC) techniques are used to asymptotically sample from the correct posterior distribution. These samples allow for analyses beyond summary statistics like means and standard deviations. A Metropolis-within-Gibbs strategy is implemented, which means that in each update of the Markov chain, we sample sequentially from the full conditional distributions (i.e., the distribution of the respective variable given all other variables) of

To sample the regression parameters

Sampling from

To sample from the full conditional of

The multi-modality of the KM makes inference for this model a lot more challenging than for the GM and RM. The problem of efficient MCMC algorithms for multi-modal posterior distributions is a widely acknowledged issue in the literature

To speed up the inference, grid boxes with proxy data and those without proxy data are treated sequentially. First,

Detailed formulas for the full conditional distributions are given in Appendix

In this section, results from a comparison study of the six different process stage models are shown. Then, the MH reconstruction for Europe with the

In this section, the reconstruction skill of the three process stage formulations (GM, RM, KM) and the two covariance models (glasso, shrinkage) are compared using two types of experiments. Identical twin experiments (ITEs) use the climate simulation ensemble by simulating pseudo-proxy data from one ESM and trying to reconstruct that reference climatology from the simulated proxies and the remaining ensemble members. These experiments facilitate the understanding of different modeling approaches for the process stage in a controlled environment. In particular, the evaluations do not have to rely on indirect observations, as is the case in real paleoclimate applications for which the true climate state is unknown. The second type of experiments are cross-validation experiments (CVEs), for which spatial reconstructions with the

The first step in an ITE is to choose a reference ESM with climate state

Averaged over all ITEs with the same process stage model and averaged in space, the mean deviation between the reference climate and posterior mean as a measure for systematic biases is close to 0

Summary measures for ITEs and CVEs with the six process stage models.

Results from ITEs. The box plots depict the distribution of experiments with the same process stage model.

The higher number of spatial modes in the shrinkage covariances leads to larger posterior uncertainties than for the glasso models because the limited information contained in the proxy data can constrain only a small number of spatial modes (Table

To analyze the combined effect of biases and dispersiveness, the continuous ranked probability score (CRPS) is computed. This is a common strictly proper score function for evaluating probabilistic predictions
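For a single grid box, the CRPS can be estimated directly from posterior samples. The sketch below uses the standard sample-based estimator (mean absolute error of the samples minus half the mean absolute difference between sample pairs). The function name is hypothetical, and the code illustrates the score itself rather than the evaluation code used in this study.

```python
def crps_from_samples(samples, reference):
    """Sample-based CRPS estimate for one grid box.

    samples: posterior draws of the climate variable at the grid box.
    reference: the known reference value (available in an ITE).
    Lower values are better; the score has the units of the variable.
    """
    n = len(samples)
    # E|X - y|: average distance of the samples to the reference value.
    accuracy = sum(abs(x - reference) for x in samples) / n
    # E|X - X'|: average distance between pairs of samples (spread term).
    spread = sum(abs(xi - xj) for xi in samples for xj in samples) / (n * n)
    return accuracy - 0.5 * spread
```

For a point forecast (all samples identical), the estimate reduces to the absolute error, which makes the CRPS directly comparable to deterministic error measures.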

With a spatially averaged mean around 1

Mean CRPS in ITEs for GM, RM, and KM. Top row: models with glasso covariance matrix, MTWA. Second row: models with shrinkage covariance matrix, MTWA. Third row: models with glasso covariance matrix, MTCO. Bottom row: models with shrinkage covariance matrix, MTCO. Grid boxes with simulated proxy data are depicted by black dots.

CVEs are a way to understand the ability of a spatial reconstruction method to produce consistent estimates. In paleoclimatology, the issue is that all observations are indirect, which means that poor evaluations can result from errors in the process stage or the data stage. The assumption behind CVEs is that the data stage is unbiased or at least consistently biased among different proxy samples. Cross-validations are evaluated in the observation space. In this study, this is the vegetation space, i.e., the occurrence of taxa in a grid box. As the only reliable information available from the pollen and macrofossil synthesis on the vegetation composition in a grid box is the presence of certain taxa, this is also the only data used for the evaluation. Due to the sparseness of the proxy network, leave-one-out CVEs are performed, i.e., only the samples of a single grid box are left out in each experiment.

In each CVE, a reconstruction with the Bayesian framework is computed with all proxy samples except for those in one grid box

A problematic step in the methodology described above is that the BS is only evaluated for occurring taxa for the reasons discussed in Sect.

The models with glasso covariances perform slightly worse than those with shrinkage covariances, as the mean BS takes values of 0.186 (GM, RM) or 0.187 (KM) for the glasso-based models compared to values between 0.161 and 0.165 for models with shrinkage covariances (Table

The ITEs show that the models with shrinkage covariance matrices are more dispersive, less biased, and more robust than those with glasso covariance matrices. These properties transfer to the CVEs, in which the models with a shrinkage covariance matrix perform better, too. The results from models with the same covariance matrix are very similar except that the KM with a shrinkage covariance matrix is on average more biased than the respective GM and RM. This shows that the covariance matrix choice determines the reconstruction skill more than the general formulation of the process stage as Gaussian, regression, or kernel model. The reason for this strong effect of the regularization technique might be the small ensemble size and the fact that the modes of the inter-model variability do not explain the spatial variability of the climate optimally, which further reduces the useful spatial modes in the empirical covariance matrix.
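The effect of a small ensemble on the empirical covariance matrix, and how shrinkage adds spatial modes, can be illustrated with a generic linear shrinkage estimator. This is a sketch under stated assumptions: the target (the diagonal of the empirical covariance) and the fixed weight `lam` are chosen for illustration only and are not the shrinkage target or parameters used in this study.

```python
def empirical_covariance(members):
    """Empirical covariance of K ensemble members with n grid boxes each."""
    k, n = len(members), len(members[0])
    mean = [sum(m[i] for m in members) / k for i in range(n)]
    return [[sum((m[i] - mean[i]) * (m[j] - mean[j]) for m in members) / (k - 1)
             for j in range(n)] for i in range(n)]


def shrink_to_diagonal(cov, lam):
    """Convex combination of cov and its diagonal (assumed target).

    lam = 0 returns the empirical matrix, lam = 1 a purely diagonal matrix.
    """
    n = len(cov)
    return [[(1.0 - lam) * cov[i][j] + (lam * cov[i][j] if i == j else 0.0)
             for j in range(n)] for i in range(n)]


# Toy ensemble: three members, two grid boxes, perfectly correlated.
ensemble = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
emp = empirical_covariance(ensemble)      # rank deficient (determinant 0)
reg = shrink_to_diagonal(emp, lam=0.5)    # full rank after shrinkage
```

With K members, the empirical matrix has at most K-1 nonzero eigenvalues, so any regularization that restores full rank necessarily introduces spatial modes that are not present in the ensemble.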

The better performance of shrinkage covariance models shows that the low number of spatial modes is the main reason for the under-dispersiveness of the glasso-based models. On the other hand, the over-dispersiveness of the shrinkage models should be an indicator that this model is not under-dispersed even in real-world applications that face additional challenges from potentially biased or under-dispersed transfer functions and a more sophisticated spatial structure of the climate state than in the ESM climatologies. Additionally, this over-dispersiveness shows that in most regions the ensemble spread is wide enough to lead to reconstructions that do not feature posterior distributions that are too narrow.

The larger biases of the KM with the shrinkage covariance matrix compared to the GM and RM are a result of ensemble member weight degeneracy in the particle filter part of this model. The ensemble member weights tend to degenerate towards the least deviating model such that the mean values are biased towards that model. This tendency increases with the strength of the proxy data signal. This is a well-known issue of Bayesian model selection
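A common diagnostic for this kind of weight degeneracy is the effective sample size of the normalized weights. The sketch below is a generic illustration and not part of the evaluation in this study; the function name is hypothetical.

```python
def effective_sample_size(weights):
    """Effective number of ensemble members implied by a weight vector.

    Returns K for uniform weights over K members and 1 when a single
    member carries all the weight (complete degeneracy).
    """
    total = sum(weights)
    normalized = [w / total for w in weights]
    return 1.0 / sum(w * w for w in normalized)
```

Values close to 1 indicate that the posterior mean is effectively determined by a single ensemble member.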

Based on the results presented in the previous section, the models with a shrinkage covariance matrix should be preferred over those with a glasso covariance matrix. In addition, the smaller biases and more robust nature of the GM and RM with the shrinkage covariance matrix compared to the KM make them superior choices. Because the RM adjusts more flexibly to the proxy data than the GM, this model is presumably better suited to deal with additional caveats of real-world applications. Therefore, this model is used for the spatial reconstructions, whose results are presented in this section. Reconstruction results are summarized in Table

The spatially averaged mean temperature of the reconstruction (posterior mean) is 18.27

Spatial reconstruction for MH.

Most of the taxa used in the reconstruction are more strongly confined for MTWA than for MTCO because the growth of most European plants is more sensitive to conditions during the growing season. This results in more constrained local MTWA reconstructions (Fig.

The highest reduction of uncertainty due to the inclusion of proxy data is found in grid boxes with proxy data, as quantified by a spatially averaged reduction of point-wise CI sizes from prior to posterior of 50.1 % compared to 26.0 % for grid boxes without proxy data (Fig.

To study whether the degree of spatial smoothing of the reconstruction is reasonable, a measure inspired by discrete gradients is calculated. For each grid box, the mean absolute difference between the value in the box and its eight nearest neighbors is computed. Then, the spatial averages of this homogeneity measure

By comparing the posterior with the prior and the local reconstructions, it can be seen that for most areas with nearby proxy records the posterior mean resembles the local reconstructions more than the PMIP3 ensemble mean. This shows that the uncertainty in the prior distribution is large enough to lead to a reconstruction that is mostly determined by proxy data, where available. The posterior MTWA mean is warmer in northern Europe than the prior mean and cooler in southern and eastern Europe. For MTCO, the posterior mean is much warmer than the prior mean in Fennoscandia and slightly cooler in southern Europe.
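The homogeneity measure described above (mean absolute difference between a grid box and its eight nearest neighbors) can be sketched as follows. The handling of domain boundaries and of missing grid boxes (marked as `None`, for example ocean) is an assumption made for illustration: only available neighbors are averaged.

```python
def homogeneity(field):
    """Mean absolute difference to the (up to) eight nearest neighbors.

    field: 2-D list of values on a regular grid; None marks missing boxes.
    Returns a 2-D list of the same shape with None for missing boxes.
    """
    nrow, ncol = len(field), len(field[0])
    out = [[None] * ncol for _ in range(nrow)]
    for i in range(nrow):
        for j in range(ncol):
            if field[i][j] is None:
                continue
            diffs = []
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    ii, jj = i + di, j + dj
                    if (di, dj) == (0, 0):
                        continue
                    if 0 <= ii < nrow and 0 <= jj < ncol and field[ii][jj] is not None:
                        diffs.append(abs(field[i][j] - field[ii][jj]))
            if diffs:
                out[i][j] = sum(diffs) / len(diffs)
    return out


def spatial_average(measure):
    """Average of all non-missing values of a gridded measure."""
    values = [v for row in measure for v in row if v is not None]
    return sum(values) / len(values)
```

Computing `spatial_average(homogeneity(...))` for the posterior mean, the prior mean, and individual ensemble members puts the smoothness of the reconstruction on the same scale as that of the simulations.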

The posterior weights

Posterior ensemble member weights (

CVEs provide insight into the value that is added to the unconstrained PMIP3 ensemble, represented by process stage Eq. (

For most left-out proxy samples, the BSS is positive (68.9 % of grid boxes) with a median of 0.28 (Fig.

BSS from leave-one-out cross-validation:

The persistent negative BSS values for the British Isles are evidence of a systematic issue. For this region, the uncertainty in the local reconstructions is larger than for other areas such that the local proxy records constrain the posterior less than the posterior ensemble member weights and some of the more distant proxy records. This leads to a reduction of the posterior uncertainty compared to the unconstrained PMIP3 ensemble but without improving the concordance of the mean state with the local reconstructions, which in turn results in negative BSS values. In and near the Alps, negative BSS might be a result of insufficient accounting for orographic effects in the different sources of information.
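The sign convention of the BSS discussed above can be made explicit with a short sketch. It assumes that, for a left-out grid box, predicted presence probabilities are available for each occurring taxon from both the reconstruction and the unconstrained PMIP3 ensemble; how these probabilities are obtained is not reproduced here, and the function names are hypothetical.

```python
def brier_score_occurring(presence_probabilities):
    """Brier score evaluated only for taxa that occur in the sample.

    Each occurring taxon has observed outcome 1, so the squared error
    of a predicted presence probability p is (1 - p) ** 2.
    """
    n = len(presence_probabilities)
    return sum((1.0 - p) ** 2 for p in presence_probabilities) / n


def brier_skill_score(bs_reconstruction, bs_reference):
    """BSS > 0: the reconstruction improves on the reference.

    BSS < 0: the reference (here the unconstrained ensemble) scores better.
    """
    return 1.0 - bs_reconstruction / bs_reference
```

Because the BS is bounded below by 0, a reconstruction that narrows the uncertainty around the wrong mean state can score worse than a wide reference distribution, which is the mechanism described for the negative values above.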

To study the effect of reconstructing MTWA and MTCO jointly compared to separately, additional reconstructions with only one climate variable are computed. Note that the interactions of MTWA and MTCO are twofold in the joint reconstruction: (a) the response functions have an interaction term, and (b) the process stage contains joint ensemble member weights for MTWA and MTCO as well as inter-variable correlations in the empirical correlation matrix.

The separate MTWA reconstruction is on average around 0.5

Summary measures for the joint MTWA and MTCO reconstructions (rows 1 and 2) and the separated reconstructions of MTWA (row 3) and MTCO (row 4). Numbers in brackets are minima and maxima of the corresponding 90 % CIs.

Differences of joint and separate reconstructions of MTWA and MTCO.

The BSS pattern in the MTWA-only reconstruction is mostly the same as in the joint reconstruction except for slightly positive skill in the British Isles (Table

The results show that the more constrained local MTWA reconstructions have a higher influence on the joint reconstruction than the local MTCO reconstructions. Reconstructing MTWA and MTCO jointly should in theory lead to a physically more reasonable reconstruction by creating samples drawn from the same combination of ensemble members. On the other hand,

Our approach is designed with the goal of being more suitable for sparse data situations than standard geostatistical models. To understand the robustness of the Bayesian framework with respect to the amount of data included in a proxy synthesis, five experiments with only half of the samples are performed, which are either selected to retain the spatial distribution of proxy samples or chosen randomly. In all of the tests, the general spatial structure of the posterior distribution, including the anomaly patterns, is preserved despite the fact that local anomalies and the magnitude of changes vary depending on the chosen proxy samples, which should be expected when such a large portion of the already sparse data is left out. Only the Norwegian Sea in the MTCO reconstruction changes substantially in some experiments. Plots from the experiments with reduced proxy samples are provided in the Supplement.

The mean spatial averages differ by up to 0.6

The large PMIP3 ensemble spread for most grid boxes shows that the prior distribution, which is calculated from the ensemble, contains a wide range of possible states. In areas that are well constrained by proxy data, this large total uncertainty leads to a reconstruction that depends little on the climatologies of the ensemble members. Hence, in these areas, the reconstruction is not sensitive to the particular formulation of the process stage (compare with the Supplement). This shows that our method is applicable despite well-known model–data mismatches for the MH

Several reconstructions of European climate during the MH have been previously compiled. Here, we compare our reconstructions to those of

The same pollen dataset and another version of PITM are used in

A reconstruction designed to evaluate the PMIP3 simulations was provided by

The comparisons show that patterns like the dipole-type anomaly structure, which are not present in the PMIP3 ensemble, seem to be consistent across reconstructions with pollen transfer functions. While some of the differences between the existing literature and our results can be explained by the transfer functions and proxy syntheses used, the choice of an appropriate interpolation method plays an important role, too, especially in areas with very sparse and weakly informative proxy data.

To account for inadequacies of climate models in simulating past climate states, we introduced flexible ensemble member weights

The strong effect of the covariance regularization technique on the reconstructions might originate from the small ensemble size. This hypothesis can be tested when more simulations with sufficient resolution become available, for example from the PMIP4 project. In addition, it indicates that the modes of the empirical covariance matrix do not optimally explain the spatial variability of the climate and the corresponding uncertainty structures. The difference between under-dispersive behavior in ITEs with glasso models and over-dispersion for shrinkage models suggests that the optimal number of effective degrees of freedom lies between those two models. However, an optimization procedure for the number of spatial modes in the covariance matrix is not straightforward and left for future research.

In the current study, we use a fixed prior distribution for the ensemble member weights (compare with Sect.

We presented a new method for probabilistic spatial reconstructions of paleoclimate. The approach combines the strengths of pollen and macrofossil records, which provide information about the local climate state, and climate simulations, which downscale forcing conditions to physically consistent regional climate patterns. Thus, we reconstruct physically reasonable spatial fields, which are consistent with a given proxy synthesis. Our framework can deal with probabilistic transfer functions, which are nonlinear and non-Gaussian, such that an extension to a wide range of proxies and associated transfer functions is possible.

Using ITEs and CVEs, we showed that robust spatial reconstructions with Bayesian filtering methods that exhibit small biases and are not under-dispersed are possible as long as the statistical framework is flexible enough to account for deficiencies of climate simulations and to avoid filter degeneracy, which can emerge due to small ensemble sizes and biases in climate simulations. The resulting model, which is used for spatial reconstructions of European MH climate, uses a weighted average of the involved ensemble member climatologies and a shrinkage matrix approach for spatial interpolation and structural extrapolation of the proxy data.

We apply our framework to reconstruct MTWA and MTCO in Europe during the MH using the proxy synthesis of

R code for computing reconstructions with the presented Bayesian framework is provided in a Bitbucket repository available under

To determine the glasso penalty parameter

The shrinkage target

The Metropolis-within-Gibbs approach samples (asymptotically) from the full conditional distributions of each variable, i.e., the distribution of the variable given all other variables. Some variables are treated block-wise. In this Appendix, we detail the conditional distributions that are used for sampling.
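The general structure of a Metropolis-within-Gibbs sampler can be illustrated with a toy model that is unrelated to the full conditionals of this framework. The sketch assumes Gaussian data with unknown mean `mu` and standard deviation `sigma`, a N(0, 100) prior on `mu`, and a N(0, 1) prior on log `sigma`; all names and priors are illustrative assumptions. The mean has a Gaussian full conditional and is drawn directly (Gibbs step), whereas log `sigma` has no standard full conditional and is updated with a random-walk Metropolis step.

```python
import math
import random


def log_cond_log_sigma(log_sigma, mu, data):
    """Unnormalized log full conditional of log(sigma) given mu and data."""
    sigma = math.exp(log_sigma)
    log_lik = sum(-math.log(sigma) - 0.5 * ((y - mu) / sigma) ** 2 for y in data)
    log_prior = -0.5 * log_sigma ** 2  # assumed N(0, 1) prior on log(sigma)
    return log_lik + log_prior


def metropolis_within_gibbs(data, n_iter=3000, step=0.3, seed=1):
    rng = random.Random(seed)
    mu, log_sigma = 0.0, 0.0
    prior_var = 100.0                  # assumed N(0, 100) prior on mu
    n, total = len(data), sum(data)
    chain = []
    for _ in range(n_iter):
        # Gibbs step: mu | sigma, data is Gaussian and sampled exactly.
        sigma2 = math.exp(2.0 * log_sigma)
        precision = n / sigma2 + 1.0 / prior_var
        mu = rng.gauss((total / sigma2) / precision, math.sqrt(1.0 / precision))
        # Metropolis step: random-walk proposal for log(sigma) | mu, data.
        proposal = log_sigma + rng.gauss(0.0, step)
        log_ratio = (log_cond_log_sigma(proposal, mu, data)
                     - log_cond_log_sigma(log_sigma, mu, data))
        if log_ratio >= 0.0 or rng.random() < math.exp(log_ratio):
            log_sigma = proposal
        chain.append((mu, math.exp(log_sigma)))
    return chain
```

In this pattern, variables with standard full conditionals are sampled exactly, and only the remaining variables require an accept/reject step, which keeps the number of tuning parameters (here a single proposal width `step`) small.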

To sample the transfer function parameters, we introduce augmented variables

Sampling from

In the KM, the full conditional of

If shrinkage covariance matrices are used, the parameters (

We update

Conditioned on

As described in Sect.

We run

The supplement related to this article is available online at:

NW performed the theoretical work, the model and graphics implementation, and wrote the major part of the text. AH and CO proposed the reconstruction framework, contributed to the discussion of results, and commented on the different versions of the paper.

The authors declare that they have no conflict of interest.

This article is part of the special issue “Paleoclimate data synthesis and analysis of associated uncertainty (BG/CP/ESSD inter-journal SI)”. It is not associated with a conference.

Nils Weitzel was additionally supported by the German Research Foundation (code RE3994-2/1) and the National Center for Atmospheric Research (NCAR). NCAR is funded by the National Science Foundation. We acknowledge all groups involved in producing and making available the PMIP3 multi-model ensemble. We acknowledge the World Climate Research Programme's Working Group on Coupled Modelling, which is responsible for CMIP5. For CMIP5 the US Department of Energy’s Program for Climate Model Diagnosis and Intercomparison provides coordinating support and led the development of software infrastructure in partnership with the Global Organization for Earth System Science Portals. We thank Douglas Nychka for helpful ideas to speed up the MCMC algorithm. We thank two anonymous referees and the editor for their interesting and helpful comments, which facilitated a substantial improvement of this paper.

This research has been supported by the German Federal Ministry of Education and Research (BMBF) as a Research for Sustainability initiative (FONA;

This paper was edited by André Paul and reviewed by two anonymous referees.