We present a new reconstruction of surface air temperature and sea surface temperature for the Last Glacial Maximum. The method blends model fields and sparse proxy-based point estimates through a data assimilation approach. Our reconstruction updates that of

There is a significant demand for reconstructions of the large spatial patterns of paleoclimatic states, in order to understand past climate changes and how these may relate to expected future changes

Here we present a new reconstruction of the Last Glacial Maximum (LGM).
We use the diverse ensemble of model simulations of past climate states generated by the GCMs that participated in various community model intercomparison projects, together with comprehensive data sets over land and ocean, to generate spatially complete and physically coherent maps of surface air temperature (SAT) and sea surface temperature (SST) for this period. Our approach has some similarities to the methods of

The Last Glacial Maximum (LGM, 19–23 kyr BP) is the most recent period in which the global climate was broadly in a quasi-equilibrium state very different to the modern climate, and as such has been widely investigated and used both to test the ability of models to simulate the response to substantial radiative forcing and also to estimate the equilibrium climate sensitivity

In AH13, we used a heuristic multi-model pattern scaling approach based on multiple linear regression to generate fields for surface air temperature and sea surface temperature at the LGM. While this method successfully reproduced the large-scale features of the LGM, it was not able to fit more localised features that may exist in the data but not in the model simulations. This previous reconstruction also contained small-scale features in data-sparse areas, which we suspected were the result of noise artificially generated by the reconstruction method.

The method we use here for our new analysis is a Bayesian approach based on ensemble Kalman filtering, similar to that of

Here

The specification of the prior distribution

Our prior is based on an “ensemble of opportunity” consisting of the set of 31 LGM simulations generated by a range of structurally distinct climate models which contributed to several Paleoclimate Model Intercomparison Projects: PMIP2

Models available for the LGM reconstruction. G indicates removal for gridding problems. D indicates removal for duplication/similarity and O indicates removal as an outlier. PMIP2 data were sourced from

The model simulations that are available to us are listed in Table

We also recognise that this meta-ensemble contains several near-duplicate models which share a common heritage

However, we have limited direct knowledge about the design and structure of the wide range of models available to us, and therefore we chose to augment and validate this prior judgement with an a posteriori measure of pairwise model similarity in terms of their LGM anomaly fields, as measured by both pointwise rms difference and pattern correlation. For the most part, these measures merely confirmed what we already believed to be the case. However, in one case (specifically, the AWIESM models being similar to the ECHAM and MPI models), this check alerted us to a model relationship that we had not previously been aware of but could identify in retrospect from the literature, and in another case (the CCSM and the CESM variants), it suggested that changes between model versions had been more substantial than anticipated.
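The two similarity measures mentioned above can be illustrated with a minimal sketch. It assumes anomaly fields on a regular latitude-longitude grid and cos(latitude) area weights; the function names and the three synthetic "model" fields are hypothetical and are not taken from the analysis code in the Supplement.

```python
import numpy as np

def area_weights(lats_deg, n_lon):
    """cos(latitude) weights on a regular lat-lon grid, normalised to sum to 1."""
    w = np.cos(np.deg2rad(lats_deg))[:, None] * np.ones((1, n_lon))
    return w / w.sum()

def weighted_rms_difference(a, b, w):
    """Area-weighted rms of the pointwise difference between two fields."""
    return float(np.sqrt(np.sum(w * (a - b) ** 2)))

def pattern_correlation(a, b, w):
    """Area-weighted, centred correlation between two fields."""
    da = a - np.sum(w * a)
    db = b - np.sum(w * b)
    return float(np.sum(w * da * db)
                 / np.sqrt(np.sum(w * da ** 2) * np.sum(w * db ** 2)))

# Hypothetical example: three synthetic anomaly fields on a 5-degree grid.
lats = np.arange(-87.5, 90.0, 5.0)          # 36 latitude band centres
n_lon = 72
w = area_weights(lats, n_lon)
rng = np.random.default_rng(0)
base = -4.0 - 6.0 * np.sin(np.deg2rad(lats))[:, None] ** 2 * np.ones((1, n_lon))
fields = {
    "model_A": base,
    "model_A_variant": base + 0.1 * rng.standard_normal(base.shape),
    "model_B": 1.5 * base + 2.0 * rng.standard_normal(base.shape),
}

names = list(fields)
for i in range(len(names)):
    for j in range(i + 1, len(names)):
        a, b = fields[names[i]], fields[names[j]]
        print(names[i], names[j],
              round(weighted_rms_difference(a, b, w), 2),
              round(pattern_correlation(a, b, w), 3))
```

In this synthetic case the near-duplicate pair has a much smaller rms difference than the structurally different pair, which is the kind of signal that would flag candidates for thinning.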

Based on these analyses, we therefore concluded that the following groups of models were unusually similar to each other and required thinning: (ECBILTCLIO, iLOVECLIM1-1-1-GLAC-1D, iLOVECLIM1-1-1-ICE-6G-C), (HadCM3M3, HadCM3M3(V), HadCM3-GLAC1D, HadCM3-ICE6GC, HadCM3-PMIP3), (IPSL-CM4-V1-MR, IPSL-CM5A-LR, IPSL-CM5-A2), (MIROC3.2.2, MIROC-ES2L) and (ECHAM53-MPIOM127, MPI-ESM-P, AWIESM1, AWIESM2, MPI-ESM1-2). As the large number of models based on versions of CCSM/CESM all appear to differ substantially from each other, we retain all of them at this point.

From the group of LOVECLIM models, we remove ECBILTCLIO and iLOVECLIM1-1-1-GLAC-1D, keeping iLOVECLIM1-1-1-ICE-6G-C since it is most representative of the mean of this group of models. From the IPSL group we drop the intermediate IPSL-CM5A-LR version, keeping both the older IPSL-CM4-V1-MR and the highest-resolution IPSL-CM5-A2 model, which are more substantially different from each other. We keep the more recent MIROC-ES2L and remove the older MIROC3.2.2 on the basis that newer models are likely to outperform older ones. We remove HadCM3M3(V) and HadCM3-ICE6GC, retaining the three other variants in this group, which again differ quite significantly. Both of the AWIESM models are very similar to each other and also to the MPI-ESM1-2 from which they are derived, so we retain the latter only.

This thinning of the ensemble increases the minimum pairwise area-weighted rms difference between the modelled LGM anomaly fields from 1 to 2.3

Our assimilation technique (presented in Sect.

Figure

Histogram of global mean surface air temperature of PMIP models. Blue shows retained models. Red shows omitted models.

The original 28-member meta-ensemble, before we perform the thinning process, has a mean globally averaged temperature anomaly of

Even after this thinning process, we are not yet confident that the ensemble can be considered a credible prior, as the inclusion of models in PMIP experiments themselves is rather arbitrary due to contingencies such as the motivation and interests of research staff and the availability of sufficient resources, and the total number of simulations is small. While similar “ensembles of opportunity” have frequently been used as a representation of uncertainty, there is increasing recognition that this is a somewhat risky choice to make

While we do of course use observations to further constrain the paleoclimate reconstructions, the data are sufficiently sparse and uncertain that we anticipate the potential for significant sensitivity to the prior, which we show to be the case in Sect.

The pattern scaling algorithm to calculate the new ensemble mean follows the approach described in AH13, but instead of using the full set of model anomaly fields as predictors, we only use the first four empirical orthogonal functions (EOFs) of this ensemble in order both to reduce noise in the fitted field and also to reduce the number of predictor variables. That is, we identify four scaling factors

The first four EOFs represent large-scale patterns such as the overall cooling pattern and latitudinal and land–sea contrasts, though they have no direct physical interpretation. Because of the uncentred approach, the first EOF mode is close to the ensemble mean and represents 92 % of the total variance, with the next three EOFs each representing only between 0.67 % and 2.6 % of the total variance. Collectively, however, these three amount to 60 % of the variance remaining after the first EOF is removed.
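To make the uncentred EOF decomposition and the fit of scaling factors concrete, the following is a minimal sketch. It assumes a singular value decomposition of the area-weighted ensemble matrix without removal of the ensemble mean, and an ordinary least-squares fit at the data locations; the function names, the synthetic ensemble and the unweighted regression are illustrative assumptions rather than the procedure actually used.

```python
import numpy as np

def uncentred_eofs(anomalies, weights, n_modes):
    """Leading EOFs of an ensemble without removing the ensemble mean.

    anomalies: (n_members, n_points) model anomaly fields.
    weights:   (n_points,) area weights.
    Returns (eofs, variance_fraction); eofs has shape (n_modes, n_points).
    """
    sw = np.sqrt(weights)
    # SVD of the area-weighted, uncentred data matrix.
    _, s, vt = np.linalg.svd(anomalies * sw, full_matrices=False)
    variance_fraction = s ** 2 / np.sum(s ** 2)
    eofs = vt[:n_modes] / sw           # undo the weighting
    return eofs, variance_fraction[:n_modes]

def fit_scaling_factors(eofs, obs_index, obs_values):
    """Least-squares factors a_k such that sum_k a_k * EOF_k fits the data."""
    design = eofs[:, obs_index].T      # (n_obs, n_modes)
    factors, *_ = np.linalg.lstsq(design, obs_values, rcond=None)
    return factors

# Hypothetical synthetic ensemble: a shared cooling pattern plus noise.
rng = np.random.default_rng(1)
n_members, n_points = 19, 200
weights = np.full(n_points, 1.0 / n_points)
pattern = -5.0 + rng.standard_normal(n_points)
amplitude = 1.0 + 0.2 * rng.standard_normal(n_members)
anomalies = (amplitude[:, None] * pattern
             + 0.5 * rng.standard_normal((n_members, n_points)))

eofs, var_frac = uncentred_eofs(anomalies, weights, n_modes=4)
print("variance fractions of first four modes:", np.round(var_frac, 4))

# Fit four scaling factors to sparse synthetic "data" at 40 grid points.
true_factors = np.array([2.0, -1.0, 0.5, 0.25])
target = true_factors @ eofs
obs_index = rng.choice(n_points, size=40, replace=False)
factors = fit_scaling_factors(eofs, obs_index, target[obs_index])
scaled_mean = factors @ eofs           # candidate mean for the translated ensemble
```

Because the decomposition is uncentred, the first mode in this sketch is dominated by the shared cooling pattern, mirroring the behaviour described above.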

The result of this pattern scaling is then used as the mean of the translated ensemble. This allows us to fit the largest-scale patterns but only uses 4

The ensemble translation is performed by applying an identical linear translation operation to each ensemble member.
That is, we replace each model field
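As a minimal sketch of such a translation, assuming that every member is shifted by the same offset field so that the ensemble mean matches a prescribed new mean (the function name and the toy arrays are hypothetical):

```python
import numpy as np

def translate_ensemble(ensemble, new_mean):
    """Shift every member by the same offset so the mean equals new_mean.

    ensemble: (n_members, n_points); new_mean: (n_points,).
    Deviations of members from the ensemble mean are left unchanged.
    """
    offset = new_mean - ensemble.mean(axis=0)
    return ensemble + offset

# Hypothetical toy example: 5 members, 4 grid points.
rng = np.random.default_rng(3)
ensemble = -5.0 + rng.standard_normal((5, 4))
new_mean = np.array([-4.0, -4.5, -3.0, -6.0])
translated = translate_ensemble(ensemble, new_mean)
```

The translation changes where the ensemble is centred but not its spread, so the prior uncertainty is still set by the differences between models.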

The initial ensemble mean (after the thinning process described earlier) has a mean bias relative to the data points of 0.5

After translation, the global temperature anomaly of the ensemble mean is reduced to

Figure

The prior: translated ensemble mean and data. Panels

Our LGM reconstruction relies primarily on three syntheses of data: the sea surface temperature (SST) analysis of the MARGO project

While our method uses temperature anomalies, the absolute temperatures from TEA20 and MARGO are in fact in marginally closer agreement to each other than the anomaly data are, at the grid points where both data sets exist.
We therefore take the absolute data from both sources to use in our analysis, and we calculate a new set of anomalies on a regular 5

When limited to locations for which all climate models have SST outputs after regridding to the same 5

As a check of the calibration of the ensemble, we present in Fig.

We now consider the uncertainties of the data.
The MARGO data set has an assessment of quality which is widely interpreted as an uncertainty estimate, with the rms value across the data points being 2.0

As outlined in Sect.

The method we use for the analysis is an ensemble Kalman filtering algorithm

In the ensemble Kalman filter, we update each member of the prior ensemble

We use a localised ensemble Kalman filter algorithm, using the localisation function of

We update SAT and SST simultaneously, using the same length scale for both data sets, so as to ensure physically consistent SAT and SST fields. Thus, the SAT data help to constrain SST results and vice versa, maximising the spatial coverage of the sparse data.
Since we have already re-positioned the ensemble mean closer to the data by means of the EOF pattern scaling, our approach does formally amount to an overuse of the data, but without this step we cannot be confident that we have an adequately trustworthy and unbiased model-based prior. Leave-one-out validation presented in Sect.
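The following minimal sketch shows one way a localised ensemble Kalman filter update can be written. It assumes a stochastic (perturbed-observation) variant, observations located at state grid points, and the Gaspari-Cohn compactly supported function for localisation; these choices, the function names and the one-dimensional toy state are assumptions made for illustration and may differ from the algorithm used for the reconstruction. A joint SAT and SST update would use a state vector containing both fields, with distances computed between all pairs of locations.

```python
import numpy as np

def gaspari_cohn(dist, c):
    """Gaspari-Cohn taper: 1 at zero distance, exactly 0 beyond 2*c."""
    r = np.abs(np.asarray(dist, dtype=float)) / c
    out = np.zeros_like(r)
    inner = r <= 1.0
    outer = (r > 1.0) & (r < 2.0)
    ri, ro = r[inner], r[outer]
    out[inner] = (-0.25 * ri**5 + 0.5 * ri**4 + 0.625 * ri**3
                  - (5.0 / 3.0) * ri**2 + 1.0)
    out[outer] = (ro**5 / 12.0 - 0.5 * ro**4 + 0.625 * ro**3
                  + (5.0 / 3.0) * ro**2 - 5.0 * ro + 4.0 - 2.0 / (3.0 * ro))
    return out

def localised_enkf_update(ensemble, obs_index, obs, obs_var, dist, c, rng):
    """One perturbed-observation EnKF update with covariance localisation.

    ensemble:  (n_members, n_state) prior ensemble.
    obs_index: (n_obs,) indices of the observed state elements.
    obs, obs_var: (n_obs,) observed values and error variances.
    dist:      (n_state, n_state) pairwise distances; c: length scale.
    """
    n_members = ensemble.shape[0]
    dev = ensemble - ensemble.mean(axis=0)
    cov = dev.T @ dev / (n_members - 1)           # sample covariance
    cov_loc = gaspari_cohn(dist, c) * cov         # element-wise localisation
    pht = cov_loc[:, obs_index]                   # P H^T
    hpht = pht[obs_index, :]                      # H P H^T
    gain = pht @ np.linalg.inv(hpht + np.diag(obs_var))
    perturbed = obs + rng.standard_normal((n_members, len(obs))) * np.sqrt(obs_var)
    innovation = perturbed - ensemble[:, obs_index]
    return ensemble + innovation @ gain.T

# Hypothetical 1-D example: 40 grid points, 30 members, two observations.
rng = np.random.default_rng(0)
n_members, n_state = 30, 40
positions = np.arange(n_state, dtype=float)
dist = np.abs(positions[:, None] - positions[None, :])
prior = (2.0 * rng.standard_normal((n_members, 1))
         + rng.standard_normal((n_members, n_state)))
obs_index = np.array([5, 20])
obs = np.array([3.0, -3.0])
obs_var = np.array([0.25, 0.25])
posterior = localised_enkf_update(prior, obs_index, obs, obs_var, dist, 3.0, rng)
print(prior.mean(axis=0)[obs_index], posterior.mean(axis=0)[obs_index])
```

In this sketch, grid points further than twice the length scale from every observation are left at their prior values, which is why data-sparse regions remain only weakly constrained.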

We present our reconstruction in Fig.

Summary of temperature reconstruction. Area-weighted average of reconstruction of SAT and SST, global and tropical (30

The residual rms difference between the prior mean and the data is

As our primary validation of the analysis, we performed a full set of leave-one-out cross-validation analyses, removing each data point in turn and using it as an independent test of the method when the remaining 404 points are used in the reconstruction. Leaving out a data point in this way enables us to check that the analysis is providing a genuinely improved reconstruction and not merely an elaborate but untrustworthy curve-fitting procedure to the data points that were used.
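The structure of such a leave-one-out test can be sketched as follows. The `reconstruct` argument stands in for the full debiasing and assimilation procedure; here it is replaced by two hypothetical placeholder predictors (inverse-distance weighting and a simple mean) on a one-dimensional toy data set, purely to show the bookkeeping.

```python
import math

def leave_one_out_rms(obs, reconstruct):
    """Rms prediction error at withheld points.

    obs:         list of (location, value) pairs.
    reconstruct: callable(training_obs, location) -> predicted value,
                 fitted without access to the withheld point.
    """
    squared_errors = []
    for i, (loc, value) in enumerate(obs):
        training = obs[:i] + obs[i + 1:]          # all points except the i-th
        squared_errors.append((reconstruct(training, loc) - value) ** 2)
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Placeholder predictors (hypothetical stand-ins for the real analysis).
def idw_reconstruct(training, loc):
    weights = [1.0 / (abs(loc - l) + 1e-6) ** 2 for l, _ in training]
    return sum(w * v for w, (_, v) in zip(weights, training)) / sum(weights)

def mean_reconstruct(training, loc):
    return sum(v for _, v in training) / len(training)

obs = [(float(x), -0.5 * x) for x in range(10)]   # toy data on a line
loo_idw = leave_one_out_rms(obs, idw_reconstruct)
loo_mean = leave_one_out_rms(obs, mean_reconstruct)
print(round(loo_idw, 3), round(loo_mean, 3))
```

A method that only fitted the points it was given would show no advantage over a trivial baseline on the withheld points, so comparing the two rms values is what makes the test informative.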

In each analysis, we start by performing the debiasing step based on the initial reduced ensemble of 19 simulations. The rms value of the differences between each data point and the corresponding ensemble mean prediction at that grid point before we perform the recentring is 2.8

Posterior uncertainties in

In order to check the robustness of our analysis, which makes use of a number of subjective assumptions, we have performed a large number of sensitivity tests, both regarding the details of our method and regarding the input data.

Firstly, we consider our treatment of the model prior. Our first step, as described in Sect.

Returning to our preferred choice of 19 models, if we do not include the step where we translate the ensemble mean via our pattern-scaling approach, then the result becomes

We illustrate this issue by creating two alternative model ensembles, scaling the 19 model anomaly fields by a factor of 2 (0.5) in order to consider what would have happened in the case where the prior ensemble substantially overestimates (underestimates) the true climate change at the LGM.
In the case where the anomaly fields are doubled (halved), the 19-member prior ensemble has a mean temperature anomaly of

The poor fitness of these artificial priors can also be detected by the rank histograms of the data for each ensemble. The rank histograms shown in Fig.

Rank histograms of data for four ensembles.
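A rank histogram can be computed as in the minimal sketch below, which counts, for each data point, how many ensemble members fall below the observed value. The sketch ignores observational error (which a full treatment would add to the ensemble members before ranking), and the two synthetic ensembles are hypothetical.

```python
import random

def rank_histogram(ensemble_at_obs, obs):
    """Counts of observation ranks within the ensemble.

    ensemble_at_obs: one list of member values per observation.
    obs:             observed values, same length.
    Returns a list of n_members + 1 counts.
    """
    n_members = len(ensemble_at_obs[0])
    counts = [0] * (n_members + 1)
    for members, y in zip(ensemble_at_obs, obs):
        rank = sum(1 for m in members if m < y)   # members colder than the datum
        counts[rank] += 1
    return counts

# Hypothetical synthetic test: a reliable ensemble and one that is too cold.
rng = random.Random(0)
n_obs, n_members = 400, 19
obs = [rng.gauss(-4.0, 2.0) for _ in range(n_obs)]
reliable = [[rng.gauss(-4.0, 2.0) for _ in range(n_members)] for _ in range(n_obs)]
too_cold = [[rng.gauss(-8.0, 2.0) for _ in range(n_members)] for _ in range(n_obs)]

flat = rank_histogram(reliable, obs)
skewed = rank_histogram(too_cold, obs)
print(flat)
print(skewed)
```

For the reliable ensemble the counts are roughly uniform, whereas for the ensemble that is too cold the data pile up in the highest rank, which is the signature of a biased prior.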

The use of four EOFs in the debiasing is another place where a different choice could have been made. The main reason for using a limited number of EOFs, rather than all model fields as was done by AH13, is to reduce the presence of noise and the likelihood of overfitting in the pattern. However, the use of too few EOFs would limit the ability to reduce regional biases such as the degree of polar amplification and land–sea contrast, which are strongly represented in the first few EOF patterns. While our choice of four EOFs remains a largely subjective one, using a different number of EOFs only changes the final global mean temperature estimate by up to 0.2

The data assimilation algorithm itself is conventional, but the length scale to use for the localisation is uncertain. While there has been research into the appropriate choice of length scale for application to numerical weather prediction

Partly as sensitivity tests for our results, and also in order to compare with previous work, we have tested the effect of using different subsets of the data. AH13 used the

If we use the TEA20 data alone for the analysis, with no land data at all, the resulting anomaly is slightly warmer than our main result, at

Using smaller observational errors also does not change the mean of the posterior, but the residuals we then obtain are substantially larger than can be explained by observational error. For example, if we use an error estimate of

As a result of our tests, we consider our broadened posterior uncertainty range, compared to our earlier estimate, to be well founded and insensitive to reasonable choices. However, all of these results contrast strongly with the reconstruction of

Our new result of

A larger difference between the results is that the uncertainty is substantially higher in the new reconstruction, and this is due to different methodological assumptions. In the present work, the uncertainty is primarily derived from the spread of the ensemble, although this prior uncertainty is reduced in the neighbourhood of data points according to the Kalman equations. In the AH13 reconstruction, the uncertainty was a heuristic estimate based on the results of the pattern scaling when tested with synthetic (model-derived) data sets, and the data points all had global influence, in contrast to the localisation approach used here. Thus, in this new reconstruction, substantial regions of the Pacific Ocean are only weakly constrained as few data points are available in the neighbourhood of these grid points. Our new approach is more in keeping with general practice for Bayesian estimation but does place a heavy burden on the model “ensemble of opportunity” as providing a reasonable representation of our uncertainty, especially when data are as sparse and uncertain as they are here. The area-weighted rms difference between the SAT fields of our new result and that of AH13 over the spatial grid is almost 2.4

Latitudinal temperature anomalies. Thin black lines show the ensemble members excluding the CESM1-2-derived member. The green line shows the CESM1-2-derived member. The thick black line shows the posterior ensemble mean. The purple line shows the TEA20 posterior mean result.

The greater difference between our reconstruction and that of TEA20 requires more detailed investigation. Using a similar Kalman filtering approach, TEA20 obtained a posterior estimate for global mean surface air temperature of

As an additional test, we performed the state estimation as described in Sect.

Thus, our analysis suggests (in agreement with TEA20) that such a cold LGM state is plausible, but it also suggests that considerably milder states are compatible with the data. In fact, while most (67 %) of the TEA20 SAT reconstruction lies within the 2 standard deviation range of our estimate, less than 40 % of our reconstruction lies within the 4 standard deviation range of the TEA20 result. That is to say, we consider the mean result of TEA20 to be reasonably compatible with the data and our own result, whereas conversely the TEA20 analysis strongly rejects most of our posterior range, and indeed also rejects the previous analysis of AH13.
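The asymmetry in this comparison can be illustrated with a minimal sketch that computes the fraction of one field lying within k standard deviations of another reconstruction. The sketch assumes the fraction is area-weighted with cos(latitude) weights, which is our assumption for illustration; the function name and the two synthetic reconstructions (one broad, one narrow and colder) are hypothetical and do not reproduce the percentages quoted above.

```python
import numpy as np

def fraction_within(field, mean, sd, k, weights):
    """Area-weighted fraction of grid cells where |field - mean| <= k * sd."""
    inside = np.abs(field - mean) <= k * sd
    return float(np.sum(weights * inside) / np.sum(weights))

# Hypothetical synthetic reconstructions on a 5-degree grid.
rng = np.random.default_rng(2)
lats = np.arange(-87.5, 90.0, 5.0)
weights = np.repeat(np.cos(np.deg2rad(lats))[:, None], 72, axis=1)

broad_mean = -4.5 + 0.5 * rng.standard_normal(weights.shape)
broad_sd = np.full(weights.shape, 1.0)       # wide posterior uncertainty
narrow_mean = -6.0 + 0.5 * rng.standard_normal(weights.shape)
narrow_sd = np.full(weights.shape, 0.2)      # tight posterior uncertainty

# Fraction of the narrow result inside the broad 2-sd range, and vice versa at 4 sd.
narrow_in_broad = fraction_within(narrow_mean, broad_mean, broad_sd, 2.0, weights)
broad_in_narrow = fraction_within(broad_mean, narrow_mean, narrow_sd, 4.0, weights)
print(round(narrow_in_broad, 2), round(broad_in_narrow, 2))
```

A broad posterior can accommodate a colder, more tightly constrained estimate even when the reverse is not true, so the two fractions need not be similar.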

While the prior is a personal choice that researchers may reasonably differ over, it seems doubtful to us that the prior of TEA20 is suitable for this problem. The single-model prior of TEA20 does not consider any of the uncertainties in model structure, forcing efficacies or feedback strengths that contribute to our uncertainty regarding the LGM state. For example, all of the model instances in TEA20 will have the same equilibrium climate sensitivity. The reconstruction of TEA20 could not allow for the possibility of a significantly milder climate than was obtained, because this was excluded from their prior. This can be seen directly from Fig. 2d in TEA20, where the prior 95 % range is restricted to values for global mean temperature of colder than around

A multi-model ensemble, such as the one we are using, has been shown to represent uncertainty more realistically than any single model is able to do

While our approach makes use of the diverse set of simulations generated by the multi-model ensemble that contributed to the PMIP experiments, we are limited to using the outputs that were generated by these experiments and cannot obtain diagnostics that are not already available or derivable from the variables that were saved. This limitation means we can only perform the analysis in temperature space, rather than in proxy space, which TEA20 were able to do using their proxy-enabled ocean model. Where possible, working in proxy space should be superior if there are sufficient proxy-enabled models and the modelling of the proxies is sufficiently skilful, as it can in principle account for ocean transport and mixing more realistically than a statistical calibration can. However, our approach has two important competing advantages to offset these limitations. The first is that we can use a wide range of proxy types without the need to develop climate models that include them as prognostic variables, so long as a calibration of these proxy data to local temperature is available. The second is that we can use a wide range of climate models, without needing to implement proxy models and integrate them ourselves. Both of these aspects allow us to consider a greater range of uncertainties and hopefully produce a more robust result.

We have presented a new reconstruction of the Last Glacial Maximum, with a global mean surface air temperature anomaly of

While our method is based on the well-established ensemble Kalman filter, which has been widely used for a range of data assimilation tasks, we have shown that in this application, owing to the limited availability of data, a biased prior may strongly affect the result, and that bias-correcting the prior can be important for generating an accurate result.

The larger posterior range, and the differences between this result and the previous analysis of AH13, point to the importance of several major sources of uncertainty which limit the precision that can be achieved. Models generate very different climates, especially at high latitudes where the presence or absence of sea ice can result in widely varying air temperatures. Improvement of model simulations would, of course, help the creation of accurate climate reconstructions, but it is important that the range of models included in the PMIP ensembles represents all the main sources of uncertainty as realistically and comprehensively as practicable if they are to be used for this purpose. Substantial areas of the globe, such as much of the Pacific Ocean, are poorly served by proxy data, and our comparison of different data compilations suggests that proxy-based temperature estimates have substantial uncertainties. Better understanding and calibration of proxies would allow for a more precise result, and the widespread inclusion of forward modelling of proxies could also potentially help to reconcile these two sources of information.

Code and data underpinning the analysis presented here are included in the Supplement.

The supplement related to this article is available online at:

JDA and JCH designed and performed the analysis. All authors contributed to the writing.

The contact author has declared that none of the authors has any competing interests.

Publisher’s note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

We thank Jean Yves Peterschmitt and Masa Kageyama for creating and making available the monthly climatologies of TAS and TOS for the PMIP4 LGM models. This project was funded by the European Research Council (ERC) (grant agreement no. 770765) and the European Union's Horizon 2020 research and innovation program (grant agreement nos. 820829 and 101003470).

This research has been supported by the European Research Council, H2020 European Research Council (CONSTRAIN (grant no. 820829)).

This paper was edited by André Paul and reviewed by Jessica Tierney and one anonymous referee.