Using palaeo-climate comparisons to constrain future projections in CMIP5

We present a selection of methodologies for using the palaeo-climate model component of the Coupled Model Intercomparison Project (Phase 5) (CMIP5) to attempt to constrain future climate projections using the same models. The constraints arise from measures of skill in hindcasting palaeo-climate changes from the present over three periods: the Last Glacial Maximum (LGM) (21 000 yr before present, 21 ka), the mid-Holocene (MH) (6 ka) and the Last Millennium (LM) (850–1850 CE). The skill measures may be used to validate robust patterns of climate change across scenarios or to distinguish between models that have differing outcomes in future scenarios. We find that the multi-model ensemble of palaeo-simulations is adequate for addressing at least some of these issues. For example, selected benchmarks for the LGM and MH are correlated to the rank of future projections of precipitation/temperature or sea ice extent to indicate that models that produce the best agreement with palaeo-climate information give demonstrably different future results than the rest of the models. We also explore cases where comparisons are strongly dependent on uncertain forcing time series or show important non-stationarity, making direct inferences for the future problematic. Overall, we demonstrate that there is a strong potential for the palaeo-climate simulations to help inform the future projections and urge all the modelling groups to complete this subset of the CMIP5 runs.

G. A. Schmidt et al. Published by Copernicus Publications on behalf of the European Geosciences Union.


Introduction
The Coupled Model Intercomparison Project (Phase 5) (CMIP5) is an ongoing coordinated project instigated by the Working Group on Coupled Modelling (WGCM) at the World Climate Research Programme (WCRP) and consisting of contributions from over 25 climate modelling groups (and over 30 climate models) from around the world (Taylor et al., 2012). Multiple experiments are being coordinated, including historical simulations, future simulations following multiple representative concentration pathways (RCPs) and crucially, for the first time in CMIP, three sets of palaeo-climate simulations for the Last Glacial Maximum (LGM) (21 ka BP, where BP denotes Before Present), the mid-Holocene (MH) (6 ka BP) and the Last Millennium (850–1850 CE). The palaeo-climate simulations are also part of the Paleoclimate Model Intercomparison Project (Phase 3) (PMIP3) initiative.
The CMIP5/PMIP3 palaeo-simulations are true "out-of-sample" tests in that none of the models have been "tuned" to produce better palaeo-climates. Such tuning is not necessarily unwise (see Schneider von Deimling et al., 2006 for an example), but would complicate some of the potential analyses. Because the same models are being used for both past and future simulations, this archive of model output is a unique resource for research into the connections between model skill and model predictions, and has the potential to greatly improve assessments of future climate change.
There were many uncertainties in regional aspects of future climate projections highlighted in the Intergovernmental Panel on Climate Change (IPCC) 4th Assessment Report (AR4) (Meehl et al., 2007). These affected, for example, the future of sub-tropical rainfall, El Niño-Southern Oscillation (ENSO) changes, potential declines in the North Atlantic meridional circulation, and the fate of Arctic sea ice. Reducing the uncertainties in the projections could therefore have significant real-world consequences for both adaptation and mitigation strategies.
There are three main classes of prediction uncertainty which relate to (a) the choice of scenario, (b) internal variability (sometimes described as initial condition uncertainty), and (c) the imperfections in the model (or structural uncertainty) (Hawkins and Sutton, 2009). Scenario uncertainties inevitably grow in importance with time, particularly after about 30 yr due to the timescales associated with economic change, CO2 residence time and ocean thermal inertia. Initial condition uncertainty is globally important on scales of a few years (and longer at smaller spatial scales) but predictability is fundamentally limited by the chaotic dynamics of the atmosphere and upper ocean. Thus at the multi-decadal time horizon, reducing and/or better characterising structural uncertainty is the only way to potentially reduce overall uncertainty. These structural uncertainties (given a specific scenario of future emissions and other drivers) arise from a combination of model divergence (i.e. a large spread in model predictions given the same future scenario) and model inadequacy (i.e. models that are collectively either incomplete, inaccurate or are missing processes or feedbacks). The first effect is explicit (though not completely explored) in a multi-model ensemble, while the second is implicit and needs to be assessed independently.
Observations provide the means to test the models and reduce these uncertainties but instrumental records of useful data targets are few (essentially limited to in situ networks of temperature and rainfall prior to the satellite era). Additionally, and perhaps more crucially, changes in the recent past are relatively small compared to projections for the future. Furthermore, the majority of skill metrics in historical (20th century) simulations do not provide much guidance for future projections: models that are either good or bad at simulating some aspect of modern climate (the climatology, seasonal cycle, or interannual variability) often give essentially the same spread for the future (Santer et al., 2009; Knutti et al., 2010b). The reasons for this include the tuning procedure in the models, disconnects between the important physics at different timescales or in response to different drivers, and the very different magnitudes of change. Palaeo-climate changes offer a substantially larger signal that is commensurate with projected future changes and, although palaeo-climate records are often affected by substantial noise and difficulties in interpretation (Schmidt, 2010), the most robust reconstructions can provide a crucial test of model performance over a wider range than is possible with the 20th century climate alone.
There have been many previous evaluations of palaeo-climate simulations via earlier incarnations of PMIP, as well as in many individual studies (see the review by Braconnot et al., 2012). However, there has been a lack of analyses that quantitatively link future simulations or forecasts with skill or sensitivity in the palaeo-climate simulations (though see Hargreaves et al., 2013 for an example). This is partly because (prior to CMIP5) palaeo-simulations were not done with exactly the same versions of the models being used for future projections and partly due to a lack of suitable reconstructions for model evaluation. This paper is therefore specifically focused on making the connections between palaeo-climate changes and the future rather than on understanding palaeo-climate change for its own sake.
We break this task into three main areas: (1) examples of metrics that are robust across palaeo- and future simulations, where skill in palaeo-climate evaluations builds credibility for the projections going forward; (2) examples of metrics that discriminate between different models in the past and in the future, and thus may be used to weight model projections; and (3) examples where important caveats come into play that prevent constraints from being useful. We specifically include the examples in the third area, where important caveats currently limit the palaeo-climate constraints, to provide guidance to others on pitfalls that can occur. The scope of the paper is as follows: Sect. 2 discusses some background on dealing with the multi-model ensemble, issues arising from the use of palaeo-climate proxy data and the use of data-synthesis products; Sect. 3 discusses specific examples of skill metrics that may have predictive power in future simulations by showing robust behaviour across palaeo and future experiments; Sect. 4 gives examples that discriminate between future projections; Sect. 5 presents some exploratory analysis of additional potentially useful metrics that are problematic for various reasons; Sect. 6 concludes and discusses the potential for further work in this area.

Palaeo-climate reconstructions
Many of the problems in dealing with reconstructing climate from palaeo-data are specific to the type of record, the time period and resolution concerned; for instance, annually resolved tree rings have issues distinct from lower resolution ocean sediment or pollen records (e.g. Kohfeld and Harrison, 2000; Ramstein et al., 2007; Jones et al., 2009; Harrison and Bartlein, 2012). There are however a number of general issues that affect the use of such data for model evaluation, including the potential for multiple climate controls on a given record, the scale over which they are representative, the need to quantify (and take into account) reconstruction uncertainties, and the sparse and uneven site coverage.
Records used for palaeo-climate reconstructions are in general influenced by several different aspects of climate as well as, potentially, non-climatic factors. For instance, oxygen or hydrogen isotopes from ice cores, carbonates or organic matter are climatically meaningful variables, but do not necessarily have a one-to-one, stationary relationship with temperature or precipitation (e.g. Werner et al., 2000; Schmidt et al., 2007; Masson-Delmotte et al., 2011). Vegetation, in addition to being influenced by several aspects of seasonal climate, is directly influenced by the atmospheric CO2 concentration (Prentice and Harrison, 2009). There are several approaches that have been adopted to overcome this type of problem: the use of multi-proxy reconstruction techniques, forward modelling of the system within a climate model or using climate model output (see an example related to coral carbonate isotopes in Sect. 5.1) or other climate prior, and model inversion or data assimilation. Multi-proxy reconstructions rely on the idea that different types of record will be sensitive to different aspects of climate, and that pooling the information from each of these records therefore provides a more robust reconstruction of any specific climate variable. In the sense that forward modelling (and by extension model inversion techniques) is based on physical and/or physiological knowledge of the given system, the use of these approaches may be a more robust way of dealing with the non-stationarity issue. However, as with climate models, the results are constrained by the quality of the models and the degree to which the system is well understood (see for example the discussion of CO2 fertilisation in Denman et al., 2007).
The scale over which a record is representative can be a major issue in comparing palaeo-data and model output. All types of records are responding to basically local conditions, though the scale over which the record is representative will depend greatly on the variable and the resolved timescale. Many records, such as tropical ice core δ18O, may have strong correlations with climate further afield (e.g. Schmidt et al., 2007). Comparisons at local or regional scales often require some form of dynamical or statistical downscaling of model output, though there are many associated issues with this (Wilby and Wigley, 1997). Alternatively, upscaling reconstructions (for instance, through the use of gridding) can often reveal large-scale patterns that models could be expected to resolve, although this requires a sufficiently dense network of sites (see Sect. 3 for examples). Other approaches include the use of cluster analysis to classify types of model behaviour and to determine cohesive regions for comparison with the large-scale patterns in the observations (e.g. Bonfils et al., 2004; Brewer et al., 2007).
Palaeo-climate reconstructions are usually accompanied by estimates of measurement or structural uncertainty. In practice, however, these uncertainties have rarely been propagated into large-scale synthetic products (except in terms of non-quantitative quality control measures, see e.g. COHMAP Members, 1988) and have even more rarely been taken into account when the reconstructions were used for model evaluation. Quantitative measures of uncertainty have nevertheless been included in more recent palaeo-climate syntheses (e.g. MARGO Project Members, 2009; Bartlein et al., 2011) and the use of fuzzy-distance measures (Guiot et al., 1999; Harrison et al., 2013) provides an explicit way to take account of data uncertainties if these cannot be expressed with Euclidean distance. It is worth noting that model-data differences cannot be expected to be smaller than the data uncertainties themselves.

Modelling issues
There are two particular issues that are more problematic in palaeo-climate simulations than, for instance, simulations of the 20th century: model drift and forcing uncertainty. The issue of coupled climate model drift arises because of the long (∼ thousands of years) time required to bring the deep ocean into equilibrium in coupled ocean-atmosphere models. In some cases, insufficient spin-up time may have been allowed before specific experiments are started. While drift also affects transient historical simulations, the magnitude of the forcings in the 20th century means that residual drift is usually a small component of the transient response. In the last millennium simulations, however, the forcings are smaller, and drift in the early centuries of the simulation will be a larger fraction of the modelled change (Osborn et al., 2006; Fernández-Donado et al., 2013). One proposal to deal with this is via a correction using the drift in the control simulation (i.e. calculating a smooth trend and removing it from the perturbed simulation prior to analysis). While this works well for temperature, it is less suitable for variables that exhibit threshold behaviour such as sea ice extent or precipitation. In practice, this issue needs to be assessed for each proposed comparison.

Second, there are important uncertainties in the forcings used for the palaeo-climate experiments. This is also true for aerosols in the historical simulations but such issues are more prevalent in palaeo-simulations. For example, the magnitudes of solar and volcanic forcing over the last millennium, and the size and height of ice sheets at the LGM are sources of major uncertainty. In the last millennium experiments, multiple forcing choices were proposed (Schmidt et al., 2011, 2012), but few groups have attempted (as yet) to comprehensively explore all the options, and this is also true for uncertainties associated with other time periods. If an insufficient range of different forcings is tested, it is plausible that mismatches between observations and simulations may be wrongly attributed to the model (or observations), when in fact they were related to a misspecified forcing (e.g. Kageyama et al., 2001).
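The control-based drift correction mentioned above can be illustrated with a minimal sketch. It assumes that the "smooth trend" is a straight line fitted by least squares to an annual-mean control series, and that the control and perturbed runs share the same time axis; both are simplifying assumptions made for illustration, and the function names are hypothetical rather than taken from any CMIP5 tool.

```python
def fit_linear_trend(values):
    """Least-squares slope and intercept of values against their index."""
    n = len(values)
    t_mean = (n - 1) / 2.0
    v_mean = sum(values) / n
    cov = sum((t - t_mean) * (v - v_mean) for t, v in enumerate(values))
    var = sum((t - t_mean) ** 2 for t in range(n))
    slope = cov / var
    return slope, v_mean - slope * t_mean


def remove_control_drift(perturbed, control):
    """Subtract the control-run trend (relative to year 0) from the perturbed run.

    Assumption: drift is linear and identical in both runs.
    """
    slope, _ = fit_linear_trend(control)
    return [p - slope * t for t, p in enumerate(perturbed)]


# Synthetic example (not model output): a control run drifting by 0.001 K/yr
# and a perturbed run with the same drift plus a 0.5 K step after year 50.
control = [0.001 * t for t in range(100)]
perturbed = [0.001 * t + (0.5 if t >= 50 else 0.0) for t in range(100)]
corrected = remove_control_drift(perturbed, control)
```

A linear fit is the simplest choice; a low-order polynomial or a low-pass filter would serve the same purpose. As the text notes, this kind of subtraction is less appropriate for threshold variables such as sea ice extent or precipitation.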
Beyond these two issues, there are many aspects of past climate changes that are (currently) outside the scope of the available modelling within CMIP5 (and more widely). Variability in the last glacial period that involves complex ocean/ice sheet dynamics (such as Dansgaard-Oeschger events) is beyond what can be analysed directly since the CMIP5 class of models does not have sufficiently interactive dynamic ice sheets. There are also common biases across different models that have more to do with the state of computational technology than physics (for instance, poor or non-existent resolution of ocean eddies). Other examples can easily be found.
For clarity in the rest of the text, we define the term "ensemble" to denote the full multi-model database of results across all CMIP5 scenarios (which encompasses all palaeo-climate, historical, idealised and future projection simulations). The future projections used here consist of the four RCP scenarios (rcp26, rcp45, rcp60, rcp85) (future possibilities that correspond roughly to greenhouse gas radiative forcing at the year 2100, relative to the pre-industrial, of 2.6, 4.5, 6.0, and 8.5 W m⁻², respectively) along with idealised simulations that have been included to provide clean comparisons across models. The idealised simulations include a 1 % increasing CO2 simulation, the response to an abrupt increase to 4xCO2, atmosphere-only simulations such as amip, amip4xCO2 and amipFuture (where all models are forced by the same pattern of ocean temperatures from the historical period, with 4xCO2, and with a warm anomaly imposed, respectively), or sstClim and sstClim4xCO2 simulations (where ocean temperatures are held constant under pre-industrial or 4xCO2 conditions). We use CMIP5 to refer to the entire database, including the PMIP3 simulations. Specific model simulations are referred to by their name in the CMIP5 database (i.e. rcp85, past1000, piControl etc.), while the scenarios or periods are referred to more generally using a standard abbreviation or name (e.g. the LGM, MH, RCP 4.5). We list the models that we have used in analyses in this paper, along with the specific experiments and simulation IDs, in Table 1. While the multi-model ensemble is a useful source for addressing structural uncertainty, it should be noted that the ensemble is not a controlled sample from a well-defined distribution of plausible simulations.

Approaches to comparing reconstructions and simulations
There has been a gradual evolution in the approaches for comparing reconstructed changes and climate model simulations from essentially qualitative graphical comparisons of output and reconstructions of the corresponding climatic variables (e.g. Braconnot et al., 2007) to more quantitative approaches that measure model-data mismatch via some "metric" or distance function (e.g. Sundberg et al., 2012; Izumi et al., 2013). Metrics based on correlations or rms differences between fields of data and model output have been commonly used in model evaluation for current climate (e.g. Taylor, 2001; Schmidt et al., 2006; Gleckler et al., 2008). These methods provide opportunities for both inter- and intra-generational model comparisons (Reichler and Kim, 2008; Harrison et al., 2013). The concept of "skill" as adopted in the numerical weather prediction community is useful as a quantitative test of model performance: that is, does a model produce a more accurate prediction (match to the palaeo-climate record) than that which would be achieved by a simple null hypothesis (Hargreaves et al., 2013)? Most studies and metrics have focused on time slice or time series comparisons, though it is worth pointing out that nothing precludes comparing the simulations and palaeo-record in the frequency domain (e.g. Lovejoy and Schertzer, 2012b).
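As an illustration of the "skill" concept, the sketch below compares the rms error of a model against that of a null hypothesis. The particular form used here, one minus the ratio of rms errors, is one simple choice assumed for illustration and is not necessarily the formulation used in the studies cited above; the site values are invented.

```python
import math


def rmse(predicted, observed):
    """Root-mean-square difference between two equal-length sequences."""
    total = sum((p - o) ** 2 for p, o in zip(predicted, observed))
    return math.sqrt(total / len(observed))


def skill_score(model, reference, observed):
    """1 is a perfect match, 0 is no better than the reference (null)
    prediction, and negative values are worse than the reference."""
    return 1.0 - rmse(model, observed) / rmse(reference, observed)


# Hypothetical LGM temperature anomalies (K) at four proxy sites.
observed = [-2.0, -3.5, -1.0, -4.0]
model = [-1.5, -3.0, -1.5, -3.5]
no_change = [0.0] * len(observed)  # null hypothesis: no change from present

score = skill_score(model, no_change, observed)
```

A positive score indicates that the model matches the reconstruction better than the assumption of no change. A fuller treatment would also account for the reconstruction uncertainty at each site.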
While most standard comparisons focus on evaluating individual model simulations against the reconstructions, a different approach is to focus on the collective performance of the ensemble as a whole. For instance, Hargreaves et al. (2011) tested the ability of the PMIP2 ensemble to represent the Last Glacial Maximum in terms of its "reliability", defined as the adequacy of the ensemble, considered in probabilistic terms, in predicting the changes documented in the palaeo-climate archives during that interval. Multi-model ensemble means can be informative and will generally outperform individual models (Annan and Hargreaves, 2011), but care must be taken to assess the suitability of each included model, and any weighting of individual models needs to be well justified (Knutti et al., 2010a).

Linking past and future
The key task of this paper is to provide guidance and examples for deciding on whether the palaeo-climate simulations have a connection to the future projections, and if so, what the comparison to palaeo-reconstructions can imply for the future. We stress that robust links between past and future simulations can only be derived if the model configurations used are the same in the different experiments. A previously common practice of using a lower resolution or differently tuned or scoped model for past simulations than for future projections, while perhaps convenient for efficiency, is not appropriate because such variations often lead to substantial differences in sensitivity. Thus, all the examples discussed below link models that were identical (excepting boundary conditions and forcings) in the past and future CMIP5 simulations. We distinguish two ways in which palaeo-data-model comparisons can be used as a guide to the future: (1) as a validation of a robust relationship between diagnostics across models and scenarios, or (2) as a method to discriminate between differently skillful models. In the first case, one would search for properties or correlations that we expect to be features of all climates within the ensemble, determine whether that is the case, and use the palaeo-data to provide some independent support for that relationship. In the second case, there is a prerequisite that, for the diagnostic chosen, the "skill" metric when it is compared to a reconstruction actually correlates to future outcomes within the ensemble. If this is not the case, then the skill in that diagnostic is orthogonal to the spread in the projections and cannot be used to constrain them. Even when such a relationship is found, we need to consider whether it is physically meaningful, so as to be confident that it has not arisen either through chance due to a small sample size or as an artifact of the model or the experimental design. To gain confidence in such a palaeo-constraint, we also need to understand the physical processes that explain the connections between past and future.
While connections may in principle be highly complex, it is natural as a first step to consider whether a correlation exists between past and future behaviour in the same diagnostic. The search for useful metrics (in this sense) using modern data has generally been disappointing (Knutti et al., 2010b), although there have been a small number of cases where apparently meaningful relationships have been found (Boé et al., 2009; Hall and Qu, 2006; Brient and Bony, 2012; Fasullo and Trenberth, 2012). It is notable that the first three examples relate future climate changes to externally forced changes in the modern climate (decadal or seasonal variations), rather than using metrics based on the climatological mean state alone. This lends support to our working hypothesis that past variations seen in palaeo-climate simulations will be informative about the future.
Where a credible relationship between past and future is found, there is a range of methods that can be applied to use observations to constrain future predictions (Collins et al., 2012).One method, applied by both Boé et al. (2009) and Hall and Qu (2006), is to take the observational estimate, and use the relationship (often linear) embodied in the correlation between past and future model output to project this value into the future.An attractive feature of this approach, beyond its simplicity, is that it readily allows extrapolation of the observed relationship in the case where the true value is suspected of lying outside the model range.An alternative approach, which has been widely applied to perturbed physics ensembles, is more explicitly Bayesian and considers the ensemble as a probabilistic sample.For the prior, equal weight is typically assigned to each ensemble member.Probabilistic weights are then calculated for each member of the ensemble, according to their performance in reproducing the observations.This weighted ensemble now represents the posterior estimate of future change.This method uses the model spread as a prior constraint which, depending on one's viewpoint and the specific case in question, may be considered either a strength or weakness (Collins et al., 2012).
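The two approaches described above can be contrasted in a short sketch. The regression variant fits a straight line through the (past, future) pairs from the ensemble and evaluates it at the observed past value; the weighting variant starts from equal prior weights and re-weights each member by how well it reproduces the observation. The Gaussian form of the weights, the function names and all numbers are assumptions made purely for illustration.

```python
import math


def linear_fit(x, y):
    """Ordinary least-squares slope and intercept of y on x."""
    n = len(x)
    x_mean, y_mean = sum(x) / n, sum(y) / n
    cov = sum((a - x_mean) * (b - y_mean) for a, b in zip(x, y))
    var = sum((a - x_mean) ** 2 for a in x)
    slope = cov / var
    return slope, y_mean - slope * x_mean


def project_from_observation(past, future, observed_past):
    """Regression approach: can extrapolate beyond the model range."""
    slope, intercept = linear_fit(past, future)
    return intercept + slope * observed_past


def weighted_projection(past, future, observed_past, obs_sigma):
    """Weighting approach: equal prior, Gaussian likelihood (an assumption).
    The result is always bounded by the ensemble spread."""
    weights = [math.exp(-0.5 * ((p - observed_past) / obs_sigma) ** 2)
               for p in past]
    total = sum(weights)
    return sum(w * f for w, f in zip(weights, future)) / total


# Hypothetical four-member ensemble: simulated past change and projected
# future change (K) for the same diagnostic in each model.
past = [-3.0, -4.0, -5.0, -6.0]
future = [2.0, 2.5, 3.0, 3.5]

inside = project_from_observation(past, future, -4.5)
outside = project_from_observation(past, future, -7.0)
bounded = weighted_projection(past, future, -7.0, 1.0)
```

When the observed value lies outside the range spanned by the models (here -7.0), the regression estimate extrapolates beyond the largest model projection, whereas the weighted estimate cannot exceed it. This is the difference in the treatment of the model spread noted in the text.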

Robust relationships in past and future simulations
In this section we highlight examples of physically based correlations between key diagnostics that show similar relationships in the palaeo-climate simulations and in future projections (or the more idealised warming scenarios) and whose fidelity can be assessed using the palaeo-climate record. If these conditions are realised, the observations can be used to support the model results, and thus help provide contingent future predictions of one diagnostic given a potential change in the other.
An important issue for assessing future climate impacts is to what extent the large-scale mean temperature response can be used as an index for more regional changes. We consider the relationships between global mean temperature and temperature changes in the tropics and other regions in Sect. 3.1, and relationships between land and ocean temperatures in Sect. 3.2.

Relationships between regional and global temperature change
A common feature in future and palaeo-simulations is that some parts of the world warm or cool at different rates. In future climate simulations, the high latitudes warm more than the low latitudes, as is also observed during the recent instrumental era. This "polar amplification" is also present in LGM simulations and data, with a stronger cooling in the high latitudes than in the tropics (Masson-Delmotte et al., 2006a, b). Izumi et al. (2013) investigate high vs. low latitude temperature changes in lgm, midHolocene, historical, 1pctCO2 and abrupt4xCO2 PMIP3/CMIP5 simulations and find broadly consistent relationships for lgm, historical and increased GHG forcings between mean annual SST changes, with respect to piControl, over the northern extratropics and the northern tropics. However, the relationship is not consistent for mean annual air temperatures in the lgm simulations compared to the others because of the particular impact of the northern ice sheets. Here we examine the relationships between changes in global mean temperature and change over large-scale regions. The uneven distribution of the palaeo-climatic reconstructions indeed suggests a focus on specific regions, rather than the globe.
The main climate forcings for the LGM are the lower concentrations of atmospheric greenhouse gases and the presence of Laurentide and Fennoscandian ice sheets in the northern extratropics. The ice sheets have a strong local albedo effect (e.g. Braconnot et al., 2012) but also affect the mid-latitude large-scale atmospheric circulation due to the associated change in topography (e.g. Pausata et al., 2011; Rivière et al., 2009; Laîné et al., 2009). However, away from the direct ice sheet perturbations, we expect that the greenhouse gas forcing would be the main forcing for the LGM climate change and thus patterns of response may be similar to future warmer climates (Hewitt and Mitchell, 1997).
We compare the mean annual surface air temperature change over a region with the global mean change for the abrupt4xCO2, 1pctCO2 and lgm CMIP5 simulations from the 8 models for which the results were available at the time of the analysis. We have considered the following regions: the tropics (land + oceans) and the tropical oceans, because they have been used previously as targets in perturbed physics ensemble studies (Schneider von Deimling et al., 2006; Hargreaves et al., 2007); East Antarctica, for which the temperature change has been shown to scale with global temperature change for the LGM and the CMIP3 2xCO2 and 4xCO2 changes (Masson-Delmotte et al., 2006a, b); and the mid-latitude region of the North Atlantic and Europe.
Figure 1 shows a clear relationship between the tropical and global temperature change for the 1pctCO2 and abrupt4xCO2 anomalies, both for the combined land and ocean grid cells (top-left panel) and for ocean grid cells alone (bottom-left panel), and this relationship is consistent across these two experiments. The relationship for the LGM is ambiguous because the results for 7 out of the 8 models cluster around the same values. These appear to fall outside the relationship which can be derived from the 1pctCO2 and abrupt4xCO2 simulations, with a smaller LGM tropical temperature change for a given global temperature change. This may be because of an outsize influence of the LGM Northern Hemisphere ice sheets on the global mean for this particular climate. Furthermore, the models which simulate the smallest (largest) warming for increased CO2 are not those which simulate the smallest (largest) cooling for the LGM. This implies that either the impact of the lower GHG concentrations is not symmetric compared to that of increased GHG concentrations, or that the remote impact of the ice sheets extends to the tropics (as inferred by Laîné et al., 2009). The relationship appears more consistent across experiments for East Antarctica (Fig. 1, bottom-right panel) and, surprisingly given the proximity of the ice sheets, over the North Atlantic/Europe region (Fig. 1, top-right panel).
In the second row of Fig. 1, we indicate the range of the reconstructed LGM regional response. In the case of the tropical oceans (bottom-left plot), this range is computed from the MARGO (2009) data. Uncertainties are derived using a bootstrap method, randomly drawing 1000 samples (of random size, and with replacement) from the initial MARGO data set. For each drawn site, we assume a Gaussian probability function centered on the mean reconstruction and with a standard deviation equal to the uncertainty given in the data set and we draw a possible value considering this probability distribution function. We obtain 1000 estimates of the mean value and compute its mean ±2 standard deviations, which defines the shaded blue band on the figure.

Fig. 1. Model average regional vs. global temperature changes for the glacial (in blue, the pre-industrial minus LGM difference is shown), years 120 to 140 of the 1pctCO2 simulations (in yellow, in comparison to piControl) and years 100 to 150 of the abrupt4xCO2 simulations (in red, in comparison to piControl). For the bottom plots, the regional model average is taken only over grid boxes that correspond to proxy data sites within the defined region (reconstruction range shown in blue shading). For the tropical oceans (bottom-left panel), the blue shaded band shows the average ±2 standard deviations of the present minus LGM warming as evaluated by the bootstrap method from the MARGO (2009) reconstructions, taking into account the uncertainty on the reconstructions (see main text). For East Antarctica, the blue shaded band corresponds to the range of available reconstructions (5 sites) ±1 °C (Braconnot et al., 2012). Definition of the regions: Tropics: ... The results have been computed for all models in the database on 23 July 2012 for which there were results for the lgm, piControl, 1pctCO2 and abrupt4xCO2 simulations.
For East Antarctica, we have only 5 points, so we simply consider the uncertainty of ±1 °C on the reconstruction (Masson-Delmotte et al., 2006) and the range of available reconstructions. In both cases, the available data discriminate between the models, with 2 models out of 8 falling in the range of the reconstructions for the tropical oceans and 4 models out of 8 in the case of East Antarctica.

In summary, the range of model results for increased CO2 scenarios shows that there is a relationship between regional and global temperature changes for all regions considered here. The range of simulated LGM regional/global average temperature change is smaller than in the increased CO2 runs. The results are consistent with the relationship derived in future scenarios for East Antarctica and the North Atlantic/Europe region. For the tropics, the LGM ratio is smaller than that seen in future scenarios, which could be due to the impact of ice sheets on the global mean temperature change. Both data and models suggest an amplification of changes from the tropics to Antarctica and the data can help constrain the global LGM temperature change to 4.2 to 5 °C, but only weakly constrain the expected sensitivity to abrupt4xCO2 forcing (from 4.2 to 6.5 °C). Additional sensitivity experiments will be needed to test the individual impacts of CO2 and ice sheets and better understand the full LGM response and the inter-model differences. These results are based on only 8 models and will need to be revisited when a larger number of simulations are available.
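The bootstrap estimate of the reconstructed regional range described earlier in this section (resampling sites with replacement, perturbing each drawn value within its Gaussian uncertainty, and summarising 1000 regional means) might be sketched as follows. The uniform choice of sample size, the fixed random seed and the site values are assumptions for illustration only; they are not taken from the MARGO data set.

```python
import random
import statistics


def bootstrap_regional_mean(values, sigmas, n_draws=1000, seed=0):
    """Return (mean - 2 sd, mean, mean + 2 sd) of bootstrapped regional means."""
    rng = random.Random(seed)
    n_sites = len(values)
    estimates = []
    for _ in range(n_draws):
        # Sample of random size (assumed uniform between 1 and n_sites),
        # drawn with replacement from the available sites.
        size = rng.randint(1, n_sites)
        picks = [rng.randrange(n_sites) for _ in range(size)]
        # Perturb each drawn site within its stated Gaussian uncertainty.
        sample = [rng.gauss(values[i], sigmas[i]) for i in picks]
        estimates.append(sum(sample) / size)
    centre = statistics.fmean(estimates)
    spread = statistics.stdev(estimates)
    return centre - 2 * spread, centre, centre + 2 * spread


# Hypothetical reconstructed anomalies (K) and 1-sigma uncertainties.
site_values = [-1.5, -2.0, -2.5, -1.0, -3.0, -2.2]
site_sigmas = [0.5, 0.8, 0.6, 1.0, 0.7, 0.5]

low, centre, high = bootstrap_regional_mean(site_values, site_sigmas)
```

The interval from `low` to `high` plays the role of the shaded band against which the model regional averages are compared.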

Land-ocean contrasts
Even though models show biases in the LGM when directly compared to reconstructions (Fig. 1), there are large-scale relationships which appear to be consistent for different climates. For instance, model results have consistently shown that for the LGM, the continents cooled more than the ocean (e.g. Braconnot et al., 2007, 2012; Laîné et al., 2009), while, in a symmetric manner, predictions for future climate show a stronger warming over land than over the oceans (e.g. Sutton et al., 2007; Drost et al., 2012). The ratio between cooling over non-glaciated land and cooling over the ocean for the LGM tropics was ∼1.3 in the PMIP1 computed sea surface temperature (SST) simulations (Pinot et al., 1999), a result close to the ratio of ∼1.5 found in both the PMIP2 fully coupled LGM experiments (Braconnot et al., 2012) and CMIP3 future projections (Sutton et al., 2007). Izumi et al. (2013) evaluated this land-sea ratio from the CMIP5 lgm, piControl, historical, 1pctCO2 and abrupt4xCO2 simulations and found consistent land-sea ratios for global changes and for the northern and southern extratropics as well as for the tropics, though the ratio varies with latitude and is smallest in the tropics.

Fig. 2. LGM − piControl in blue, 1pctCO2 − piControl in orange, abrupt4xCO2 − piControl in red. For the latter 2 periods, the averages have been computed over the same years as Fig. 1. The results have been computed for all models in the database on 23 July 2012 for which there were results for the lgm, piControl, 1pctCO2 and abrupt4xCO2. The grey lines indicate the 1 : 1.5 ratio in both plots. The results from the reconstructions are based on the MARGO (2009) data for the oceans and on the Bartlein et al. (2011) data for the continents. The error bars show twice the standard deviation of the distribution of the mean temperature changes over the region, as estimated by the same bootstrap method as described for the bottom-left plot of Fig. 1 and in Sect. 3.1. Right panels: plots show the land-sea ratio computed from the data, again with the same bootstrap method. The horizontal bar shows the median ratio, the thick vertical line shows the 25–75th percentiles and the thin lines the 10–90th percentiles. The model results are computed by a bootstrap method applied on all model results, as explained in the text. The definition of the thin/thick vertical lines and horizontal bar is the same as for the data.
Figure 2 shows temperature changes over land vs. oceans for the tropics and North Atlantic/Europe. As also shown by Izumi et al. (2013), the relationship between temperature changes over land and over the oceans appears to be broadly consistent for the lgm, 1pctCO2 and abrupt4xCO2 results, though the degree of agreement in the land-sea contrast varies across regions. Furthermore, even though models appear to overestimate temperature changes both over land and oceans in the tropics, and to underestimate them in the North Atlantic/Europe region, the land-ocean ratio appears to be consistent with the data. The right-hand panels in Fig. 2 were built by estimating the distribution of the ratios between mean temperature changes over land and over the oceans, using a bootstrap method on the data from Bartlein et al. (2011) for the continents and MARGO (2009) for the oceans and taking their uncertainties into account (as for Fig. 1). A similar approach is used for the models, selecting only from points where target reconstructions exist. Coefficients are taken from a linear regression constrained to pass through the origin, from 1000 trials. The results show that for the tropics, the land-sea ratio is consistent for the different periods, and model-derived ratios are themselves consistent with the reconstructions. This is also true for the North Atlantic/Europe region, although in this case the LGM results are more offset from the increased CO2 results. We conclude that these relationships are robust, although the reasons for this appear to be imperfectly understood (Lambert et al., 2011) and will require, as for the results from Sect. 3.1, additional sensitivity and process-based analyses. Incidentally, it is worth noting that the land-ocean relationship was previously used to highlight the inconsistency between an earlier compilation of tropical LGM sea surface temperatures and adjacent continental reconstructions (Rind and Peteet, 1985).
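The regression through the origin used for the model-derived ratios can be illustrated as follows. This is a minimal sketch under stated assumptions: models are resampled with replacement in each trial, the function names are hypothetical, and the eight "model" values are synthetic numbers constructed with a ratio of 1.5, not CMIP5 output.

```python
import numpy as np

def slope_through_origin(x, y):
    """Least-squares slope of y = a * x (no intercept term)."""
    return np.sum(x * y) / np.sum(x * x)

def bootstrap_ratio(ocean_change, land_change, n_trials=1000, seed=0):
    """10th, 25th, 50th, 75th and 90th percentiles of the land-sea ratio."""
    rng = np.random.default_rng(seed)
    ocean_change = np.asarray(ocean_change, dtype=float)
    land_change = np.asarray(land_change, dtype=float)
    n = ocean_change.size
    slopes = np.empty(n_trials)
    for k in range(n_trials):
        # Assumption: each trial resamples the models with replacement.
        idx = rng.integers(0, n, size=n)
        slopes[k] = slope_through_origin(ocean_change[idx], land_change[idx])
    return np.percentile(slopes, [10, 25, 50, 75, 90])

# Hypothetical regional-mean changes for 8 models (degC).
demo_rng = np.random.default_rng(2)
ocean = demo_rng.uniform(1.5, 3.5, size=8)
land = 1.5 * ocean + demo_rng.normal(0.0, 0.2, size=8)
p10, p25, p50, p75, p90 = bootstrap_ratio(ocean, land)
```

The five percentiles correspond to the thin lines, thick line and horizontal bar described in the caption of Fig. 2.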

Palaeo-derived measures of skill that discriminate between models
In this section we highlight diagnostics for which we have commensurate palaeo-climate information and for which the skill metrics across the ensemble serve to discriminate between models that show different behaviours in future projections. This requires that we demonstrate that differences in future sensitivity are correlated to past sensitivities, and that palaeo-reconstructions exist that allow projections from models with more realistic past sensitivity to be weighted more highly in an ensemble projection. We illustrate this with three examples: in Sect. 4.1, we look at a simple binary grouping of model behaviour related to South American rainfall that can be evaluated using information from the mid-Holocene. Section 4.2 revisits attempts to constrain overall climate sensitivity using information from the LGM, and Sect. 4.3 looks at the potential to estimate sea ice sensitivity to Arctic warming through results from the mid-Holocene.

Rainfall change in South America
Projections of precipitation change in South America have a large spread in the CMIP3 (Meehl et al., 2007) and CMIP5 (Knutti and Sedláček, 2012) archives. In future projections, most models simulate a dipole of precipitation change in northern South America. However, the sign and magnitude of this dipole depend on the model: some models simulate drier conditions in Guyana, Venezuela and Colombia and wetter conditions in Nordeste and eastern Brazil, while other models simulate the opposite changes (Fig. 3).
We define the precipitation dipole as the annual-mean precipitation averaged over 0–8° N, 50–60° W (hereafter "Guyana") minus the annual-mean precipitation averaged over 5–15° S, 35–45° W (hereafter "Nordeste"). We divide 28 different models from the CMIP5 archive into two equal groups. Models where the dipole is weak or negative in the changes in precipitation between rcp85 and piControl are placed in group 1; models which have a strong positive dipole are in group 2. All of the models simulate similar patterns of present-day precipitation, although models in group 2 tend to have a more pronounced double ITCZ. Among the models, midHolocene output was available for 7 models in group 1 and for 5 models in group 2.
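The dipole index and the two-way grouping can be written down compactly. The sketch below is only an illustration of the definitions above: the helper names, the cosine-latitude area weighting, the −180° to 180° longitude convention and the ranking rule used to form two equal groups are assumptions, and the test field is synthetic.

```python
import numpy as np

# Box limits as (lat_range, lon_range); longitudes west are negative.
GUYANA = ((0.0, 8.0), (-60.0, -50.0))        # 0-8 N, 50-60 W
NORDESTE = ((-15.0, -5.0), (-45.0, -35.0))   # 5-15 S, 35-45 W

def box_mask(lat, lon, lat_range, lon_range):
    """Boolean masks selecting the latitudes and longitudes inside a box."""
    lat_mask = (lat >= lat_range[0]) & (lat <= lat_range[1])
    lon_mask = (lon >= lon_range[0]) & (lon <= lon_range[1])
    return lat_mask, lon_mask

def box_mean(field, lat, lon, lat_range, lon_range):
    """Area-weighted (cos latitude) mean of field[lat, lon] in a box."""
    lat_mask, lon_mask = box_mask(lat, lon, lat_range, lon_range)
    sub = field[np.ix_(lat_mask, lon_mask)]
    weights = np.cos(np.deg2rad(lat[lat_mask]))[:, None] * np.ones(lon_mask.sum())
    return np.sum(sub * weights) / np.sum(weights)

def dipole(precip_change, lat, lon):
    """Guyana-box mean minus Nordeste-box mean of a precipitation change."""
    return (box_mean(precip_change, lat, lon, *GUYANA)
            - box_mean(precip_change, lat, lon, *NORDESTE))

def split_into_groups(dipoles):
    """Rank models by dipole: lower half is group 1, upper half is group 2."""
    order = np.argsort(dipoles)
    half = len(dipoles) // 2
    return order[:half], order[half:]

# Synthetic 1-degree field: +1 in the Guyana box, -1 in the Nordeste box.
lat = np.arange(-89.5, 90.0, 1.0)
lon = np.arange(-179.5, 180.0, 1.0)
field = np.zeros((lat.size, lon.size))
field[np.ix_(*box_mask(lat, lon, *GUYANA))] = 1.0
field[np.ix_(*box_mask(lat, lon, *NORDESTE))] = -1.0
demo_dipole = dipole(field, lat, lon)
group1, group2 = split_into_groups(np.array([0.3, -0.1, 1.2, 0.9]))
```

In practice `precip_change` would be the rcp85 minus piControl (or midHolocene minus piControl) annual-mean precipitation of one model on its own grid.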
Figure 3 shows a link between precipitation change in the future and in the MH. Models in group 1 simulate wetter conditions in Guyana and drier conditions in Nordeste, associated with a northward shift of the ITCZ in the rcp85 simulations and a broadening of the ITCZ in the MH simulations. Conversely, group 2 models simulate drier conditions in "Guyana" and wetter conditions in "Nordeste", associated with a southward shift of the Intertropical Convergence Zone (ITCZ). They show a similar dipole in the MH, with a strong southward shift of the ITCZ. Thus the models from a particular group show essentially the same change in the dipole pattern and the same shift in the ITCZ in both future and MH simulations. These patterns are robust relative to the number of groups or the number of models included in any group. Palaeo-data from South America show drying everywhere except northeastern Brazil (Prado et al., 2013), a response which is more consistent with group 1 than group 2.
The processes underlying these patterns can be investigated using a variety of other CMIP5 simulations. Table 2 shows correlations between precipitation changes and other features of the simulations. Shifts in the ITCZ in the future projections are associated with shifts in the SST dipole in the Atlantic: models that shift the ITCZ the furthest southwards are those with the strongest warming south of the Equator relative to the rest of the Atlantic. However, while ITCZ shifts in response to SST dipoles are expected (e.g. Kang et al., 2008), this is not the dominant pattern for the MH to PI change. Some of the model behaviours are seen in the amipFuture and sstClim4xCO2 simulations, indicating that the intrinsic response of the atmosphere to a given SST change plays a key role in the formation of the dipole. This is consistent with the fast atmospheric response to CO2 being an important component of the total precipitation response in global warming (e.g. Bala et al., 2010; Bony et al., 2013). Models that have reduced precipitation over northern South America in the MH simulations also have reduced precipitation in the projections and under 4xCO2. These models have the strongest land surface warming in response to both 4xCO2 and MH forcing. Although the precipitation response of the different groups of models to a change in forcing differs, within each model group the response to different forcings (SST changes, orbital forcing, 4xCO2) is similar. This suggests that common mechanisms are involved in the precipitation response to all forcings, and that we can expect future changes to resemble those predicted by the group 1 models. A more quantitative assessment of these changes remains to be finalised.

LGM constraints on climate sensitivity
The LGM has been a prime target for assessments of climate sensitivity since it is a quasi-stable period with significant climate differences from today, with reasonably well-known boundary conditions and sufficient data to reconstruct large-scale climate shifts (e.g. Lorius et al., 1990; Edwards et al., 2007; Köhler et al., 2010; Schmittner et al., 2011; PALAEOSENS Project Members, 2012). This provides a good opportunity to apply the methods described in Sect. 2 as a proof-of-concept estimate of the equilibrium climate sensitivity based on the CMIP5 LGM simulations.

Fig. 3. The precipitation dipole is defined as the difference of precipitation change in RCP 8.5 between the "Guyana" region and the "Nordeste" region. Only those models within each group that had both rcp85 and midHolocene data available at the time of the analysis are plotted. Other models that provided only rcp85 data are listed for completeness, but without any markers. (b) Maps of precipitation changes from piControl to rcp85 (top panels) and from piControl to midHolocene (bottom panels), averaged over all available models in group 1 (left panels) and in group 2 (right panels). Contours show corresponding SST changes. The boxes over land and ocean show the areas used in the dipole definitions.
We use an ensemble of opportunity consisting of 7 models which participated in the PMIP2 experiment, together with 7 CMIP5 models for which sufficient data were available (at time of writing). Estimates of the climate sensitivities of these models were obtained from a variety of sources and were derived using a range of methods: for the PMIP2/CMIP3 models, sensitivity was generally calculated using a slab ocean coupled to the atmospheric component (Meehl et al., 2007), whereas in CMIP5, the most readily available estimates use a regression based on a transient simulation (Andrews et al., 2012). These estimates are not perfectly commensurate, with some models reporting a 10 % difference between the two methods (e.g. Schmidt et al., 2014). Unfortunately, some of the PMIP2 models used for the LGM simulations differ from the CMIP3 versions for which the sensitivity estimates were made (for example, MIROC3.2). Thus, while the values used here may be somewhat inconsistent and imprecise, we expect the uncertainty arising from these sources (around 0.5 °C) to be modest in comparison to the range of values represented across the ensemble (roughly 2–5 °C). The boundary conditions for the LGM simulations are essentially unchanged between PMIP2 and CMIP5 (save for changes in the shape of the imposed ice sheets), allowing us to consider these experiments as broadly equivalent, though there are some systematic biases due to the total ice volume and resulting changes in the land/sea mask (Kageyama et al., 2013). Limitations in the boundary conditions (such as the exclusion of dust and vegetation effects), which we do not attempt to account for here, could introduce additional bias and uncertainty into our result. For these and other reasons discussed below, these results should be considered as a proof of concept rather than conclusive.
The LGM was associated with a large negative radiative forcing with respect to the pre-industrial, including substantially lower concentrations of greenhouse gases (e.g. Köhler et al., 2010). However, the ensemble does not show the expected negative correlation between climate sensitivities and their globally averaged LGM temperature anomalies (over the full 100 yr of simulation output) (Fig. 4a, see also Crucifix, 2006). Hargreaves et al. (2012) analysed the PMIP2 ensemble on a regional basis and found their LGM temperature changes in the tropics to exhibit a negative correlation with climate sensitivity, most strongly in the latitude band 20° S–30° N. Results from the PMIP3 models are consistent with this relationship, but do not strengthen it. When we combine all models into one ensemble, the correlation over this region weakens to −0.54, but it is still significant at the 95 % level.
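As a rough plausibility check on the quoted significance, the correlation can be converted to a Student's t statistic. The sketch below assumes a two-sided t test with the 14 models (7 PMIP2 plus 7 CMIP5) treated as independent; the text does not state which test was actually used, and the function names are illustrative.

```python
import math

def correlation_t_stat(r, n):
    """t statistic for a Pearson correlation r from n independent points."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))

def t_pdf(x, df):
    """Probability density of Student's t distribution."""
    norm = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return norm * (1.0 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t_stat, df, steps=20000):
    """Two-sided p value by midpoint-rule integration of the t density."""
    upper = abs(t_stat)
    h = upper / steps
    central_area = 2.0 * h * sum(t_pdf((k + 0.5) * h, df) for k in range(steps))
    return 1.0 - central_area

r, n_models = -0.54, 14   # correlation quoted in the text; 7 + 7 models
t_stat = correlation_t_stat(r, n_models)
p_value = two_sided_p(t_stat, n_models - 2)
```

With these inputs |t| is about 2.22, which exceeds the two-sided 95 % critical value of about 2.18 for 12 degrees of freedom, so the stated significance is consistent with this simple test. Shared model components would reduce the effective sample size and weaken that conclusion.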
The correlation is generally insignificant at higher latitudes, where the feedbacks in response to large cryospheric changes may be very different to those exhibited in a future warmer climate. There is also a strong positive correlation in the southern ocean (i.e. colder LGM anomalies are linked with lower sensitivity), possibly due to the large range of biases in the control climate (Fig. 4c). The correlation of piControl temperatures to sensitivity points to the Arctic and the southern oceans as regions where base climatology strongly impacts sensitivity, probably via cloud effects (see Trenberth and Fasullo, 2010 for a discussion). The significant negative correlation between the LGM temperature anomalies in the latitude band 20° S–30° N and the climate sensitivities of the models (Fig. 5) is physically plausible, since this region is far from the cryospheric and sea ice changes of the LGM, and the forcing here is dominated by the reduction in greenhouse gas concentrations.

Assuming that the correlation with tropical temperatures provides a valid constraint on the real climate system, we can use this correlation to project a reconstruction of past change onto the future, as in Boé et al. (2009). Annan and Hargreaves (2013) generated a new estimate of LGM temperature changes, based on a combination of several multiproxy data sets and the ensemble of PMIP2 models. The method does not depend on the magnitude of changes estimated by the models, but only on their spatial patterns. Due to the suspicion that the tropical temperatures at the LGM from the MARGO synthesis are too warm (e.g. Telford et al., 2013), we focus here on the results from the sensitivity test in Annan and Hargreaves (2013), where reconstructed tropical temperatures were decreased uniformly by 1 °C.

Fig. 5. Using PMIP2 and PMIP3 models as a constraint on climate sensitivity. Axes: LGM tropical temperature change (20° S–30° N) and equilibrium sensitivity. Annotations: observed value −2.2 (−2.9, −1.5); estimate 3.1 (1.6, 4.5).

With a temperature change in the 20° S–30° N latitude band of −2.2 ± 0.7 °C (at 90 % confidence), the predicted value for climate sensitivity arising from the correlation is 3.1 °C, with a 90 % interval of 1.6–4.5 °C calculated by Monte Carlo sampling, but this range is sensitive (by up to 0.4 °C) to the reconstruction uncertainties (Annan and Hargreaves, 2013).
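The regression-based projection can be sketched as below. This is not the procedure of the paper in detail: the ordinary least-squares fit, the Gaussian treatment of both the reconstruction uncertainty and the residual scatter, and the function name are assumptions, and the 14-member "ensemble" is synthetic. Only the observed value of −2.2 ± 0.7 °C (90 % confidence) is taken from the text.

```python
import numpy as np

def project_sensitivity(t_models, s_models, obs_mean, obs_half_width_90,
                        n_samples=20000, seed=0):
    """5th, 50th and 95th percentiles of sensitivity implied by the fit."""
    rng = np.random.default_rng(seed)
    t_models = np.asarray(t_models, dtype=float)
    s_models = np.asarray(s_models, dtype=float)
    # Across-ensemble linear fit: sensitivity = intercept + slope * anomaly.
    slope, intercept = np.polyfit(t_models, s_models, 1)
    resid = s_models - (intercept + slope * t_models)
    resid_sd = np.sqrt(np.sum(resid**2) / (t_models.size - 2))
    # Convert the 90 % half-width to one standard deviation (Gaussian).
    obs_sd = obs_half_width_90 / 1.645
    t_obs = rng.normal(obs_mean, obs_sd, size=n_samples)
    # Assumption: residual scatter about the fit is added as Gaussian noise.
    s_pred = intercept + slope * t_obs + rng.normal(0.0, resid_sd, size=n_samples)
    return np.percentile(s_pred, [5, 50, 95])

# Hypothetical ensemble: tropical LGM anomaly (degC) and sensitivity (degC).
demo_rng = np.random.default_rng(3)
t_demo = np.linspace(-3.5, -1.5, 14)
s_demo = 1.0 - 1.0 * t_demo + demo_rng.normal(0.0, 0.5, size=14)
s_lo, s_med, s_hi = project_sensitivity(t_demo, s_demo, -2.2, 0.7)
```

A fuller treatment would also sample the uncertainty in the fitted slope and intercept, which widens the interval for small ensembles.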
In a more explicitly Bayesian approach, we can initially assign equal probability to each model in the ensemble. This choice can be questioned, given both the range of model complexities and the possible inter- or intra-generational similarities between models of related origins (Masson and Knutti, 2011). However, quantifying these issues is far from straightforward, so we make our choice for reasons of practicality and in order to demonstrate the utility of the overall method. A standard kernel density estimation based on the ensemble leads to the prior distribution presented as the green curve in Fig. 6, which has a 90 % range of 1.7–4.9 °C and a median of 3.3 °C. […] (O the reconstructed observations, M the model simulation, L(M|O) the likelihood of the model result given the reconstructed data, and P(O|M) the probability of the reconstructions assuming the models are correct). The posterior distribution is shown in red, the bulk of which has been shifted to lower values, with the median reducing to 3.0 °C. Its 90 % probability range has moved slightly less, to 1.5–4.7 °C. The upper limit remains high because the highest-sensitivity model in the ensemble has been assigned a fairly large weight, since it matches the reconstruction well.

Fig. 6. Climate sensitivity estimated by weighting of the PMIP2 and PMIP3 models. Axes: climate sensitivity and probability. Annotated 90 % ranges: 1.7–4.9 and 1.5–4.7.
The small size of the ensemble means that this approach is rather sensitive to the presence or absence of particular models in the ensemble. These two approaches differ considerably in their use of the model ensemble. In the latter case, the ensemble is directly used as a prior estimate, which therefore imposes quite a strong constraint on climate sensitivity even before the observational constraints are used. The former method may be considered as roughly equivalent to using a prior that is uniform in the observed variable (here tropical temperature), although this approach is rarely presented in explicitly Bayesian terms. Despite the different assumptions and approaches, these methods generate similar estimates for the climate sensitivity, both assigning higher probability towards the lower end of the model range. The ranges are comparable with other palaeo-climate-derived estimates of 2.3–4.8 °C (68 % confidence interval, PALAEOSENS Project Members, 2012) but, given the small ensemble size and possible naïvety of the assumptions made here, these estimates may not be robust and need to be tested using a larger ensemble.
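The model-weighting approach can be illustrated with a short sketch. Everything specific here is an assumption made for illustration: the Gaussian form of the likelihood, Silverman's rule for the kernel bandwidth, the function names and the synthetic 14-member ensemble. Only the equal prior weights and the reconstructed tropical change of −2.2 ± 0.7 °C (90 % confidence) come from the text.

```python
import numpy as np

def model_weights(t_models, obs_mean, obs_sd):
    """Equal prior weights and likelihood-updated posterior weights."""
    n = t_models.size
    prior = np.full(n, 1.0 / n)
    # Assumption: Gaussian likelihood of the reconstruction given each model.
    likelihood = np.exp(-0.5 * ((t_models - obs_mean) / obs_sd) ** 2)
    posterior = prior * likelihood
    return prior, posterior / posterior.sum()

def weighted_kde(grid, centres, weights, bandwidth):
    """Weighted Gaussian kernel density evaluated on grid."""
    z = (grid[:, None] - centres[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z**2) / (bandwidth * np.sqrt(2.0 * np.pi))
    return kernels @ weights

def density_quantiles(grid, density, probs):
    """Quantiles of a density tabulated on a regular grid."""
    cdf = np.cumsum(density)
    cdf = cdf / cdf[-1]
    return np.interp(probs, cdf, grid)

# Hypothetical ensemble: tropical LGM anomaly and sensitivity (degC).
demo_rng = np.random.default_rng(4)
t_demo = np.linspace(-3.5, -1.5, 14)
s_demo = 1.0 - 1.0 * t_demo + demo_rng.normal(0.0, 0.5, size=14)

prior_w, post_w = model_weights(t_demo, obs_mean=-2.2, obs_sd=0.7 / 1.645)
bandwidth = 1.06 * s_demo.std(ddof=1) * s_demo.size ** (-0.2)  # Silverman's rule
grid = np.linspace(0.0, 8.0, 801)
probs = [0.05, 0.5, 0.95]
prior_q = density_quantiles(grid, weighted_kde(grid, s_demo, prior_w, bandwidth), probs)
post_q = density_quantiles(grid, weighted_kde(grid, s_demo, post_w, bandwidth), probs)
```

The sketch makes the sensitivity to ensemble membership easy to see: removing a single well-matching model changes `post_w` substantially when only 14 members are available.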

Arctic Sea ice sensitivity constraints from the mid-Holocene
The rate and pattern of Arctic sea ice change in future decades is of interest due both to the surprisingly rapid changes currently occurring and to the large spread in model estimates of, for instance, the onset of summertime "ice-free" conditions (Stroeve et al., 2012; Massonnet et al., 2012). Recent studies (Mahlstein and Knutti, 2012; Abe et al., 2011) have demonstrated that biases in sea ice volume have a strong impact on the simulated responses to radiative perturbations, and that there may be a possibility to discriminate among models based on interannual modes of sea ice variability. The mid-Holocene simulations (driven mainly by changes in orbital forcing) provide an orthogonal test of Arctic sea ice sensitivity, since MH insolation changes imply that NH summers were warmer than summers today (see Kutzbach, 1981 and many subsequent papers). Palaeo-data from the circum-Arctic region indicate that this warmth was accompanied by reductions in sea ice extent at least during some months of the year (Dyke and Savelle, 2001; de Vernal et al., 2005; McKay et al., 2008; Funder et al., 2011; Polyak et al., 2010; Moros et al., 2006). The CMIP5 MH simulations (Fig. 7c) consistently show decreases in sea ice extent from July/August through to November relative to the pre-industrial. Changes in winter months (December–February) do not agree in sign across the models, though these changes are not well characterised in the palaeo-data either. There is a relationship (Fig. 8) between the size of the anomaly at the MH and in future projections (using 2036–2065 in rcp85), presumably reflecting the underlying sensitivity of the sea ice model and Arctic climate in general (see also O'ishi and Abe-Ouchi, 2011). We focus on the 30 yr period centered on 2050 since that is when there is a very large spread in individual model projections in the RCP 8.5 simulations (Fig. 7). This correlation exists despite the differences in the cause of the ice loss (summer insolation versus greenhouse-gas-related forcing). Although the small size of the ensemble raises questions about the robustness of the relationships, the MH ice extent anomaly can be used to estimate the likely loss in future projections. Given the qualitative nature of the palaeo-data, we are not yet able to make a quantitative projection, but there is some support given to the models with a larger projected change. A more detailed approach using more specific and local diagnostics in comparison to a wider proxy network will likely give a more quantitative result (Tremblay et al., 2014).

Exploratory metrics: potential and limitations
While the examples given above show direct connections between past and future in ways that can be used relatively straightforwardly, there are a number of reasons why other diagnostics may not be as useful. In this section we provide examples of where the palaeo-climate information has yet to be explored, is ambiguous, or where connections seen in palaeo-climate changes do not translate easily into the future for some reason. This may be related to forcing ambiguities, climate-change-related non-stationarity in climate/proxy relationships, or potentially, a poor understanding or representation of the dominant processes. While these examples are not directly informative about the future, they illustrate how the palaeo-simulations can be explored in ways that illuminate key uncertainties and, potentially, provide more opportunities in the future.
Section 5.1 focuses on the potential for shorter-term extremes in temperature and precipitation at the regional scale to be predicted by large-scale seasonal anomalies. Section 5.2 deals with the issue of the evaluation of models over the historical period in the tropical Pacific using forward models of coral-based proxies. Section 5.3 addresses diagnostics in the frequency domain that are strongly affected by uncertainties in the forcing fields, rather than intrinsic properties of the models. Finally, Sect. 5.4 provides an example of how connections between past and future hydroclimate diagnostics may be non-stationary.

Regional extremes
Extreme climate events such as heat waves and cold spells can have long-lasting impacts on society or ecosystems (IPCC SREX, 2012), and there have been analyses of the impact of heat waves during recent centuries in Europe (Le Roy Ladurie, 2004, 2006; Barriendos and Rodrigo, 2006; Camuffo et al., 2010). The development of such events spans days to a few weeks, so that they are largely intra-seasonal by nature (Seneviratne et al., 2012). In such a context, the generally linear relationship between palaeo-climate reconstructions and actual climate can be strongly distorted. Since extreme events are by definition rare, large numbers of examples are required to get good statistics. Simulations of the past millennium offer a promising tool to investigate modelled extremes since they sample a longer time series and a larger range of possible cases than most other simulations. The strongest limitation for an application of this method to palaeoclimatic data has been the necessity of dealing with daily data in order to resolve extreme value distributions (which may be non-Gaussian) and the need for palaeo-archives that record extreme variables (e.g. Donnelly and Woodruff, 2007). However, if we can demonstrate the robustness of the relationships between short and longer-term statistics over long periods of time, and/or their dependence on external forcings, we can potentially constrain the behaviour of temperature extremes in the future.
The statistical analyses of (daily) temperature hot extremes of the 20th century have shown that temperature is generally a bounded variable, for which the upper bound can be computed from the statistical parameters of extremes (Parey et al., 2010a, b). Diagnostic studies focusing on the probability distribution of temperature and precipitation extremes are often based on the application of Extreme Value Theory (EVT), though simpler metrics have also been used (e.g. Hansen et al., 2010). EVT describes the behaviour of the probability distribution near the tails, and allows one to estimate return periods for extremes that are longer than the period of observation (Coles, 2001). It has been applied to meteorological observations (Parey et al., 2010a), reanalysis data (Nogaj et al., 2006) and model simulations (Kharin et al., 2005, 2007) in order to estimate trends in extremes.
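To make the idea of a return level concrete, the sketch below fits a Gumbel distribution to annual maxima by the method of moments and evaluates the level expected to be exceeded once per given period. It is a deliberate simplification and an assumption of this illustration: the Gumbel distribution is the zero-shape special case of the generalised extreme value family, whereas a bounded variable corresponds to a negative shape parameter, and the studies cited above use fuller EVT fits. The annual maxima are synthetic.

```python
import numpy as np

EULER_GAMMA = 0.5772156649015329

def gumbel_fit_moments(annual_maxima):
    """Method-of-moments estimates of the Gumbel location and scale."""
    x = np.asarray(annual_maxima, dtype=float)
    scale = x.std(ddof=1) * np.sqrt(6.0) / np.pi
    loc = x.mean() - EULER_GAMMA * scale
    return loc, scale

def return_level(loc, scale, period_years):
    """Level exceeded on average once every period_years."""
    return loc - scale * np.log(-np.log(1.0 - 1.0 / period_years))

# Synthetic annual maximum temperatures (degC) for 60 years.
demo_rng = np.random.default_rng(5)
maxima = demo_rng.gumbel(loc=35.0, scale=2.0, size=60)
loc_hat, scale_hat = gumbel_fit_moments(maxima)
level_10 = return_level(loc_hat, scale_hat, 10.0)
level_100 = return_level(loc_hat, scale_hat, 100.0)
```

The 100 yr level is extrapolated beyond the 60 yr sample, which is the property of EVT highlighted in the text; its uncertainty grows quickly with the return period.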
Extremes of hot and cold temperatures are correlated with mean temperatures over the northern extra-tropics (Yiou et al., 2009). Since very few models have archived daily outputs of temperature or precipitation on multi-century timescales, there has been no assessment of whether this is true over the longer term (Jansen et al., 2007) […] data was requested for simulations in the CMIP5 archive (Yiou et al., 2012).

Fig. 9. Illustration of quantile regressions between the percentage of summer hot days (i.e. exceeding the 90th quantile of daily mean temperature in June-July-August) in western Europe (10° W–30° E; 36–61° N) and the precipitation frequency anomaly with respect to the 1948–2011 mean in winter-spring (January to May). The precipitation frequency is computed over southern Europe (10° W–30° E; 36–46° N) and is defined as the percentage of days with precipitation exceeding 0.5 mm. The quantile regressions are computed for the 10th and 90th quantiles of the hot day frequency, following Quesada et al. (2012). The red lines are for the 90th quantile regressions and the blue lines are for the 10th quantile regression. (a) shows the quantile regression for western Europe from the updated EOBS data set (Haylock et al., 2008) between 1950 and 2012, where each point represents a year. (b) is for the "historical" simulation of the IPSL-CM5A-MR model (Dufresne et al., 2013). Both panels show a widening of the quantile regression for low values of precipitation frequency, indicating a consistency of the model simulation with observations.
Summer heat waves are generally preceded by droughts in the winter and spring in extratropical regions (Fischer et al., 2007; Vautard et al., 2007). The mechanism involves a positive feedback between sensible heat fluxes, evapotranspiration and temperature (Schär et al., 1999), and this has also been found in global and regional simulations of the future (Seneviratne et al., 2006, 2010; Quesada et al., 2012). Quantile regression provides a useful statistical metric to investigate the linkage between precipitation in winter and spring and summer temperature. Ordinary least-squares regression focuses on the mean values of related variables, but by setting a threshold based on the upper/lower quantiles of the variable to be predicted, regression coefficients related to the high (or low) values of this variable are obtained (Koenker, 2005). The purpose of quantile regression is to investigate the conditional dependence between variables: for instance, the dependence structure could be different for small and large predictors. Hence, differences of slopes for small and high quantiles show that the relation between the predictand and predictor depends on the value of the predictor. An interesting feature is that quantile regression is not very sensitive to outliers, because the regression is performed on the ranks rather than the values themselves. We illustrate this diagnostic in Fig. 9, by computing the quantile regression for the 90th and 10th quantiles of the summer hot day frequency against the winter-spring precipitation frequency anomaly in the IPSL-CM5A-MR historical simulation and the E-OBS gridded data set (Haylock et al., 2008).
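A quantile regression line can be computed for small samples without any specialised library. The sketch below is illustrative only: it minimises the check (pinball) loss by exhaustive search over lines through pairs of data points, which is exact for a straight-line fit because an optimal line always passes through at least two points. The function names and the synthetic data, built so that the spread of hot-day frequency widens for dry (negative) precipitation anomalies, are assumptions.

```python
import itertools
import numpy as np

def pinball_loss(residuals, q):
    """Check loss minimised by the q-th quantile regression."""
    return np.sum(np.where(residuals >= 0, q * residuals, (q - 1.0) * residuals))

def quantile_line(x, y, q):
    """Slope and intercept of the q-th quantile regression line.

    Brute-force search over all point pairs; suitable for small samples.
    """
    best_loss, best_slope, best_intercept = np.inf, 0.0, 0.0
    for i, j in itertools.combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue
        slope = (y[j] - y[i]) / (x[j] - x[i])
        intercept = y[i] - slope * x[i]
        loss = pinball_loss(y - (intercept + slope * x), q)
        if loss < best_loss:
            best_loss, best_slope, best_intercept = loss, slope, intercept
    return best_slope, best_intercept

# Synthetic yearly values: precipitation frequency anomaly (%) and hot-day
# frequency (%), with larger spread under dry conditions.
demo_rng = np.random.default_rng(6)
precip_anom = demo_rng.uniform(-15.0, 15.0, size=120)
spread = 1.0 + 0.3 * (15.0 - precip_anom)
hot_days = 5.0 + spread * demo_rng.uniform(0.0, 1.0, size=120)

slope90, icpt90 = quantile_line(precip_anom, hot_days, 0.9)
slope10, icpt10 = quantile_line(precip_anom, hot_days, 0.1)
```

A steeper (more negative) 90th-quantile slope than 10th-quantile slope is the non-parallel pattern that the diagnostic is designed to reveal.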
As in Quesada et al. (2012), the frequency of hot days is defined by the percentage of days in western Europe (10° W–30° E; 36–61° N) between June and August whose temperature anomaly exceeds the 90th quantile over a reference period (1948–2011). The frequency of rainy days is the percentage of days in southwestern Europe (10° W–30° E; 36–46° N) between January and May whose precipitation exceeds 0.5 mm. This is a simplistic index for soil moisture (or drought), but it does have significant predictive skill for European summer temperature variations (Vautard et al., 2007). More sophisticated indices of drought or soil moisture marginally improve the predictive skill (Seneviratne and Koster, 2012). In Fig. 9 the quantile regression slopes illustrate the asymmetry of the precipitation or temperature dependence for hot or cool summers in western Europe (Hirschi et al., 2011; Mueller and Seneviratne, 2012; Quesada et al., 2012; Seneviratne and Koster, 2012).
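The two indices defined above reduce to simple counting once regional daily series are available. The following sketch assumes that the regional averaging has already been done and that the data are arranged as one row per year; the function names and the synthetic inputs are illustrative.

```python
import numpy as np

def hot_day_frequency(summer_temps, reference_temps, q=0.9):
    """Percentage of summer days per year above the reference q-quantile.

    summer_temps: array (n_years, n_days) of daily June-August anomalies.
    reference_temps: daily values pooled over the reference period.
    """
    threshold = np.quantile(reference_temps, q)
    return 100.0 * np.mean(summer_temps > threshold, axis=1)

def rainy_day_frequency(daily_precip, wet_threshold=0.5):
    """Percentage of days per year with precipitation above wet_threshold (mm)."""
    return 100.0 * np.mean(daily_precip > wet_threshold, axis=1)

# Synthetic data: 64 summers of 92 days, and one short precipitation record.
demo_rng = np.random.default_rng(7)
temps = demo_rng.normal(0.0, 1.0, size=(64, 92))
hot_freq = hot_day_frequency(temps, temps.ravel())
rain_freq = rainy_day_frequency(np.array([[0.0, 0.4, 0.6, 2.0]]))
```

By construction the hot-day frequency averages close to 10 % over the reference period itself; the yearly departures from that value are what enter the quantile regression.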
The 90th and 10th quantile regression lines are not parallel. Thus, while the general picture is that a dry winter/spring tends to favor a hot summer and wet winter-spring conditions are generally followed by cool summers, dry winter-spring conditions can be followed by cool summers as well as heat waves (large spread between low and high quantiles). This is due to the fact that the genesis of heat waves can be broken in just a few days by fast variations of the synoptic atmospheric circulation (Hirschi et al., 2011; … et al., 2012). This feature has been tested on CMIP3 and some CMIP5 simulations for the present and future scenarios, and shows that the seasonal predictability of large European hot summers decreases under drier conditions in southern Europe, although their frequency increases (Quesada et al., 2012). By looking at the last millennium simulations we will be able to examine the stability in time of these patterns, and hence potentially constrain future changes.

20th-century changes in tropical Pacific climate
The response of the tropical Pacific Ocean to anthropogenic climate change is uncertain, partly because we do not have a good understanding of how the region has responded to drivers in the past. Instrumentally based estimates of SST over the 20th century are not internally consistent (Deser et al., 2010), and model simulations have a wide spread of 20th-century trends (Thompson et al., 2011). Trends in the tropical Pacific are particularly challenging because the instrumental record is sparse even for the early 20th century and long-term in situ measurements of SST are uncommon. High-resolution palaeo-climate records, particularly the large network of tropical Pacific coral δ18O_calcite records, can be used to extend the observational record and assess long-term trends. These δ18O_calcite records respond to the combined effects of SST and the isotopic composition of seawater (δ18O_sw) (which is strongly correlated to sea surface salinity, SSS) and can reveal changes on longer timescales.
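The forward-modelling approach discussed in this section treats coral δ18O as a linear combination of SST and SSS anomalies. A minimal sketch follows, assuming the SST slope of −0.22 ‰ °C⁻¹ quoted for the Thompson et al. (2011) model and a purely hypothetical SSS slope (the real value varies by region; LeGrande and Schmidt, 2006). The function names and the synthetic warming series are also assumptions.

```python
import numpy as np

SST_SLOPE = -0.22           # permil per degC, value quoted in the text
HYPOTHETICAL_SSS_SLOPE = 0.3  # permil per psu, placeholder, region dependent

def pseudocoral(sst_anom, sss_anom, sss_slope=HYPOTHETICAL_SSS_SLOPE):
    """Synthetic coral d18O anomaly (permil) from SST and SSS anomalies."""
    return SST_SLOPE * np.asarray(sst_anom) + sss_slope * np.asarray(sss_anom)

def trend_per_decade(years, series):
    """Linear trend of a series, expressed per decade."""
    slope_per_year = np.polyfit(years, series, 1)[0]
    return 10.0 * slope_per_year

# Synthetic site: steady warming of 0.5 degC per century, constant salinity.
years = np.arange(1890, 1991)
sst_anom = 0.005 * (years - 1890)
sss_anom = np.zeros(years.size)
d18o = pseudocoral(sst_anom, sss_anom)
d18o_trend = trend_per_decade(years, d18o)
```

Warming and freshening both drive δ18O more negative in this formulation, so passing an SST-only or SSS-only anomaly separates the two contributions to a simulated trend.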
To address the limitations of historical observations, model simulations and coral proxy records in the tropical Pacific, Thompson et al. (2011) used a forward-modelling approach to generate synthetic coral records (i.e. pseudocorals) from observational and climate model output and test whether these pseudocorals are in agreement with the network of coral δ¹⁸O_c observations. The forward model for δ¹⁸O_c calculates isotopic variations as a function of SST and SSS anomalies, with an SST-δ¹⁸O_c slope of −0.22 ‰ °C⁻¹ and the SSS-δ¹⁸O_sw slope varying by region (LeGrande and Schmidt, 2006). When driven with historical SST and SSS data, the simple model of δ¹⁸O_c is able to capture the spatial and temporal pattern of ENSO and the linear trend observed in 23 Indo-Pacific coral records between 1958 and 1990 (Thompson et al., 2011). The observed trends were driven primarily by warming at the coral sites, though SSS was responsible for approximately 40 % of the shared δ¹⁸O_c trend. However, pseudocoral records calculated from CMIP3 and CMIP5 historical simulations could not reproduce the magnitude of the secular trend (Fig. 10, upper panel), the change in mean state, or the change in ENSO-related variance observed in the coral network from 1890 to 1990. While the observational coral network suggests a reduction in ENSO-related variance and an El Niño-like trend over the 20th century, CMIP3 and CMIP5 simulations vary greatly on both points.
The differences between observed and GCM-derived δ¹⁸O_c trends may stem from the simplicity of the forward model for δ¹⁸O_c, bias in the coral records, and/or errors in the GCM SST and SSS responses, or indicate an important role for unforced variability. Isotope-enabled coupled control simulations highlight uncertainties in the SSS-δ¹⁸O_sw relationship and suggest that short-term isotope variability may play a minor role (Russon et al., 2013; Thompson et al., 2013), while biases in simulated salinity fields have been identified as a source of the discrepancy (Thompson et al., 2011, 2013). For example, CMIP3 and CMIP5 simulations display weak and spatially heterogeneous SSS trends, such that the magnitude of the δ¹⁸O_c trend in pseudocorals simulated from CMIP3 and CMIP5 SSS is indistinguishable from the trends observed in individual centuries of an unforced control run (Fig. 10, upper panel). Further, trends in mean state and ENSO-related variance within the basin are highly variable among the CMIP5 models, and even between ensemble members of the same model, and much of this model spread may be attributed to differences in the simulated SSS fields. On the other hand, while pseudocorals modelled from the new SODA 20th-century reanalysis of SST and SSS display greater agreement with the observed coral trends, two recent versions of this product disagree regarding the relative contribution of SST and SSS. These results suggest that more work is needed to constrain the magnitude of the observed 20th-century salinity trend throughout the tropical Pacific Ocean. This work provides an example of the utility of forward models in investigating potential biases in both the models and proxy data, which may be used for further model development and exploration and improvement of model metrics.
Despite the disagreement among models and runs regarding the change over the 20th century, the CMIP5 projections converge upon a more El Niño-like (e.g. warmer eastern equatorial Pacific) mean state change by 2100 under RCP 4.5 (with only one model suggesting the opposite), consistent with the CMIP3 projections (Meehl et al., 2007). However, the models still disagree about the change in ENSO-related variance. Further, there is no clear relationship between the magnitude of the simulated 20th-century δ¹⁸O_c trend and the projected future δ¹⁸O_c trend in the CMIP5 ensemble (Fig. 10, lower panel). This suggests that an agreement of the simulated 20th-century change in the tropical Pacific with that of the observational coral network would not be a reliable indicator of future trends. Nonetheless, this work highlights key uncertainties in the observed and simulated salinity trends within the basin and thus provides a basis for further development of the models and this potential metric. More generally, it shows the utility of a forward modelling approach in palaeo-model/data comparisons to highlight key functional dependencies in specific proxies and investigate potential biases in both models and reconstructions.
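The linear forward model described in this section can be illustrated with a minimal sketch. The SST slope of −0.22 ‰ °C⁻¹ is the value quoted above; the SSS slope used here (0.27 ‰ per psu) is only an illustrative placeholder, since the text states that this slope varies by region, and the input series are synthetic rather than observed or simulated fields. The function names are hypothetical. The three variants mirror the SST and SSS, SST only, and SSS only cases shown in Fig. 10.

```python
import numpy as np

A_SST = -0.22          # permil per degC, the slope quoted in the text
A_SSS_EXAMPLE = 0.27   # permil per psu, illustrative placeholder (regional in practice)

def pseudocoral_d18o(sst_anom, sss_anom, a_sst=A_SST, a_sss=A_SSS_EXAMPLE):
    """Linear forward model: coral d18O anomaly from SST and SSS anomalies."""
    sst_anom = np.asarray(sst_anom, dtype=float)
    sss_anom = np.asarray(sss_anom, dtype=float)
    return a_sst * sst_anom + a_sss * sss_anom

def trend_per_decade(years, series):
    """Least-squares linear trend, in series units per decade."""
    slope_per_year = np.polyfit(np.asarray(years, dtype=float),
                                np.asarray(series, dtype=float), 1)[0]
    return 10.0 * slope_per_year

# Synthetic inputs (assumed for illustration only).
years = np.arange(1890, 1991)
sst = 0.005 * (years - years[0])     # warming of 0.05 degC per decade
sss = -0.002 * (years - years[0])    # freshening of 0.02 psu per decade
zeros = np.zeros(years.size)

both = trend_per_decade(years, pseudocoral_d18o(sst, sss))
sst_only = trend_per_decade(years, pseudocoral_d18o(sst, zeros))
sss_only = trend_per_decade(years, pseudocoral_d18o(zeros, sss))
```

Because the model is linear, the combined trend is the sum of the SST-only and SSS-only trends, which is what makes the attribution of the shared trend to each variable straightforward in this framework.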

Decadal to multi-decadal variability
In contrast to the spatial domain used in other examples here, this section highlights two analyses in the frequency domain that illustrate the important role of relatively uncertain forcings in assessing skill in model simulations of decadal to multi-decadal variability. Given the short instrumental period, it might be hoped that longer time series from proxy reconstructions for the last millennium could be used to constrain internal variability, and hence the unforced spread in projections over the next few decades. In Fig. 11, we show the maximum-entropy method (MEM) spectra (using 30 poles) for the NH mean land surface temperature over 8 last millennium simulations (850-2005) with the GISS-E2-R model that were run with different combinations of plausible solar, volcanic and land use forcings (Schmidt et al., 2011, 2012). The spectra are similar for simulations that have the same volcanic forcing, and significantly different when the volcanic forcing is derived from a different data set or where no volcanic forcing was imposed at all. Specifically, interannual to multi-decadal variability is much larger when volcanoes are imposed, and the larger the volcanic forcing, the greater the variability, with the largest response in simulations using the Gao et al. (2008) reconstruction (Gao), compared to the Crowley and Unterman (2013) reconstruction (Crw). Note that the implementation of the Gao et al. volcanic forcing in these simulations was misspecified and gave roughly twice the expected radiative forcing. However, given the uncertainties in specifying volcanic forcing (for instance, associated with the effective radius of the particles), the exercise is nonetheless useful in highlighting the role of forcings in determining variance. In contrast, the difference between two different solar forcings (Vieira et al., 2011; Steinhilber et al., 2009) is not detectable in this metric.
The no-volcano simulations underestimate the decadal/multi-decadal variance seen in two of the three reconstructions, while the with-volcano simulations overestimate it.The lowest-frequency bands in the models (primarily driven by orbital forcing, and the 20th century anthropogenic trend) have slightly larger variance than in the reconstructions.
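For readers unfamiliar with MEM spectra, the following is a minimal sketch of one common way to compute them: Burg's algorithm for the autoregressive coefficients, followed by evaluation of the AR power spectrum. This is an assumption about the estimator, since the text specifies only "MEM with 30 poles"; the function names are hypothetical, the loess drift correction mentioned for Fig. 11 is omitted, and the input is a synthetic AR(1) series rather than model output.

```python
import numpy as np

def burg_ar(x, order):
    """Burg estimate of AR coefficients a (with a[0] = 1) and residual variance."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    a = np.array([1.0])
    err = np.mean(x ** 2)
    f, b = x[1:].copy(), x[:-1].copy()    # forward and backward prediction errors
    for _ in range(order):
        k = -2.0 * np.dot(f, b) / (np.dot(f, f) + np.dot(b, b))  # reflection coeff.
        a_ext = np.append(a, 0.0)
        a = a_ext + k * a_ext[::-1]       # Levinson-type coefficient update
        f, b = (f + k * b)[1:], (b + k * f)[:-1]
        err *= 1.0 - k ** 2
    return a, err

def mem_spectrum(x, order=30, dt=1.0, n_freq=512):
    """AR (maximum-entropy) spectral density on [0, Nyquist]."""
    a, err = burg_ar(x, order)
    freqs = np.linspace(0.0, 0.5 / dt, n_freq)
    phase = np.exp(-2j * np.pi * dt * np.outer(freqs, np.arange(a.size)))
    power = err * dt / np.abs(phase @ a) ** 2
    return freqs, power

# Synthetic red-noise series (assumed for illustration only).
rng = np.random.default_rng(0)
n, phi = 20000, 0.8
noise = rng.standard_normal(n)
series = np.zeros(n)
for t in range(1, n):
    series[t] = phi * series[t - 1] + noise[t]

a1, _ = burg_ar(series, 1)                  # order-1 fit should recover phi
freqs, power = mem_spectrum(series, order=30)
```

The number of poles controls the smoothness of the estimate: a higher order resolves narrower spectral peaks but also fits more sampling noise, which is one reason comparisons across simulations are best made with a fixed order, as in Fig. 11.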
Another analysis of variability as a function of timescale is one focused on power law scaling (Lovejoy and Schertzer, 1986). Several scaling studies of GCMs demonstrate that they generally simulate the statistics (including spectral scaling exponents) reasonably well up to ≈ 10 yr scales (e.g. Fraedrich and Blender, 2003; Zhu et al., 2006; Rybski et al., 2008; Lovejoy and Schertzer, 2013; Vyushin et al., 2012). However, tests at lower frequencies are strongly affected by the solar and volcanic forcings as well as the possible impacts of slow processes such as deep ocean or land-ice dynamics which are perhaps poorly represented or missing.
Following Lovejoy and Schertzer (2012a), we calculate the root mean square (rms) fluctuation as a function of timescale, from months to centuries, for the NH land temperatures using the same eight runs of the GISS-E2-R model used above, for the period 1500-1900 CE (Fig. 12). Since simulations are strongly clustered according to changes in the volcanic forcing used (Fig. 11), for simplicity we averaged over the three Gao and three Crw volcanic and the two no-volcanic runs. For comparison, we show the mean of the same metric from three multiproxy reconstructions (Huang et al., 2000; Moberg et al., 2005; Ljungqvist, 2010). The multiproxy average is processed with and without the 20th century to indicate the importance of that period for the scaling behaviour; in all cases the variance in the multi-decadal to century scale is greatly enhanced by the recent anthropogenic trend. These curves show fluctuations stable with scale over the low frequency weather regime (years to decades) but increasing in the climate regime (decades to centuries) (Fig. 12).
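As an illustration of an rms fluctuation computed as a function of timescale, the sketch below uses a Haar-type fluctuation: for each window of a given length, the mean of the second half minus the mean of the first half. The choice of the Haar definition is an assumption here, since the text does not state which fluctuation definition was used, no calibration factor is applied, and the function name and synthetic inputs are hypothetical.

```python
import numpy as np

def rms_haar_fluctuation(x, scale):
    """rms over all windows of (mean of second half) minus (mean of first half)."""
    x = np.asarray(x, dtype=float)
    if scale < 2 or scale % 2 or scale > x.size:
        raise ValueError("scale must be an even integer between 2 and len(x)")
    half = scale // 2
    csum = np.concatenate(([0.0], np.cumsum(x)))
    means = (csum[half:] - csum[:-half]) / half   # running means of length `half`
    fluct = means[half:] - means[:-half]          # second-half minus first-half mean
    return np.sqrt(np.mean(fluct ** 2))

# Synthetic checks (assumed for illustration only).
ramp = np.arange(200, dtype=float)                # pure linear trend
rng = np.random.default_rng(0)
white = rng.standard_normal(4096)                 # uncorrelated noise

scales = [2, 4, 8, 16, 32, 64]
ramp_rms = [rms_haar_fluctuation(ramp, s) for s in scales]
white_rms = [rms_haar_fluctuation(white, s) for s in scales]
```

For the linear ramp the fluctuation grows in proportion to the scale, whereas for white noise it shrinks as averaging removes variance. This contrast is the kind of behaviour that separates a regime with fluctuations decreasing or stable with scale from one where they increase with scale.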
The comparison with the GISS-E2-R simulations is illuminating. First, we note that the slopes for the simulations show decreasing variance from annual to centennial scale, in contrast to the reconstructions. Only the volcano-free runs (bottom) clearly have increasing variance with scale in the centennial and longer periods, though with a magnitude of variance at all scales that is too low. Volcanic forcings add variance at all scales, but produce larger magnitudes than inferred from the reconstructions. Both these results demonstrate clear mismatches in behaviour between the models' simulated variance at different scales and the inferred variability from multi-proxy reconstructions. However, there are strong sensitivities to the (uncertain) external forcing functions, precluding a straightforward attribution of the mismatch to potentially misspecified forcings, missing mechanisms, insufficient "slow" variability or problems in the reconstructions. Specifically, reconstructions may have frequency-dependent biases that vary depending on the methodology and source data. For instance, boreholes used in Huang et al. (2000) do not have high frequency variability, while low frequency variability in tree-ring-based reconstructions is hard to capture. In models, the importance of decadal and multi-decadal variance in the Pacific or Atlantic sectors varies widely and is poorly constrained by observations, and there may be significant issues with the forcing functions themselves. Other analyses (e.g. Schurer et al., 2013) have examined the coherence of last millennium simulations and the proxy reconstructions and found that while signatures of multiple forcings can be determined, there is a mismatch in the magnitude of the response to volcanoes, consistent with the conclusions drawn above. It therefore remains unclear what implications these tests have for future projections, but improvements in the forcing data sets or a focus on more specific comparisons may prove fruitful in future analyses.
Hydroclimate variability can be quantified using a range of variables, including precipitation, soil moisture, lake levels, or other synthetic indices (e.g. Nigam and Ruiz-Barradas, 2006). Most models provide output for these diagnostics, but often these variables are not directly derivable from palaeoclimate archives, creating a challenge when conducting model-data comparisons. However, calibrations of networks of precipitation-sensitive tree ring widths have been used to reconstruct the Palmer Drought Severity Index (PDSI) in North America and Asia over the Common Era (Cook et al., 2004, 2010). PDSI is calculated using temperature-derived estimates of the evapo-transpiration and precipitation, and nominally represents a normalised index of soil moisture, with negative values indicating drought and positive values indicating wetter than normal conditions. There are many outstanding issues with using variations of the index globally to assess drought, including its definition, the availability and quality of inputs, and its sensitivity (e.g. contrast Sheffield et al., 2012 and Dai, 2013). However, we focus here on the question of how well this index, if derived from GCM output, reflects simulated soil moisture and whether this relationship changes over time.
We calculated PDSI from simulated temperature and precipitation in two GCMs (GISS-E2-R and MIROC-ESM) using the Thornthwaite method. We compared this index against the standardised (zero mean, unit standard deviation, over the 1850-1950 period, 10 yr smoothing) total column soil moisture model output for the Central Plains of North America (105-90° W; 32-48° N) (Fig. 13). Prior to the start of the industrial period in 1850, PDSI and soil moisture track each other closely in both models (GISS-E2-R: r = 0.82; MIROC-ESM: r = 0.50). Beginning near the middle of the twentieth century, however, the two indices diverge dramatically. In one model (GISS-E2-R) the correlation weakens considerably (r = 0.33), while in the other model (MIROC-ESM) the sign of the correlation reverses (r = −0.29). The PDSI changes over the 21st century would suggest severe and unprecedented drought. In contrast, the simulated soil moisture trends indicate a more modest shift towards drying (GISS-E2-R) or even wetter conditions over the coming decades (MIROC-ESM). The divergence in projections is related to the treatment of evapo-transpiration (ET) in the model versus in the PDSI (Thornthwaite) calculation. In this PDSI calculation, temperature is used as a proxy for the energy available while in the GCMs the soil energy and moisture budgets are calculated directly using explicit physical models. In reality, ET becomes increasingly decoupled from temperature as the temperature increases, a factor reflected in the model soil moisture but not in the PDSI index. For time periods with strong transient changes in temperature (e.g. the late 20th century and into the future), our analysis suggests that the usefulness of PDSI for projecting drought and hydroclimate trends is limited.
While this example of a diagnostic divergence is specific to the PDSI and soil moisture, there are wider implications that might need to be explored in other metrics.It is not unusual for a proxy to have a non-stationary response to a climate variable of interest (see Sect. 5.2 for another example), and it is incumbent on the investigator to ensure that any consequences of this are fully explored.
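The comparison in this section rests on three simple steps: smoothing, standardising over a reference period, and correlating the two indices over separate periods. The sketch below shows one way these steps might be coded. The series are synthetic stand-ins (not GCM output or a real PDSI calculation), the function names are hypothetical, and the use of a trailing rather than centred 10 yr mean is an arbitrary choice made here for simplicity.

```python
import numpy as np

def trailing_mean(x, window=10):
    """Running mean over `window` samples, assigned to the window's last sample."""
    return np.convolve(np.asarray(x, dtype=float),
                       np.ones(window) / window, mode="valid")

def standardise(years, x, ref=(1850, 1950)):
    """Zero mean and unit standard deviation over the reference period."""
    years = np.asarray(years)
    x = np.asarray(x, dtype=float)
    in_ref = (years >= ref[0]) & (years <= ref[1])
    return (x - x[in_ref].mean()) / x[in_ref].std()

def period_correlation(years, a, b, start, end):
    """Pearson correlation of a and b restricted to start <= year <= end."""
    sel = (years >= start) & (years <= end)
    return np.corrcoef(a[sel], b[sel])[0, 1]

# Synthetic stand-ins for model output (assumed for illustration only).
rng = np.random.default_rng(1)
years = np.arange(850, 2101)
soil = rng.standard_normal(years.size)
# An index that tracks soil moisture, plus an imposed drying trend after 1950.
index = (soil + 0.3 * rng.standard_normal(years.size)
         - 0.05 * np.clip(years - 1950, 0, None))

window = 10
years_s = years[window - 1:]                      # years matching the smoothed series
soil_z = standardise(years_s, trailing_mean(soil, window))
index_z = standardise(years_s, trailing_mean(index, window))

r_pre = period_correlation(years_s, soil_z, index_z, 859, 1849)
r_post = period_correlation(years_s, soil_z, index_z, 1950, 2100)
```

In this toy setting the imposed trend in the index, which has no counterpart in the soil moisture series, is what degrades the correlation in the later period. This is a loose analogue of a temperature-driven term dominating the index under strong transient warming.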

Conclusions and recommendations
In this paper, we have focused on the opportunities provided by "out-of-sample" palaeo-climate experiments within the CMIP5 framework, and specifically how measures of skill in modelling palaeo-climate change might inform future projections of climate change.
We have given examples that show that some relationships are robust across the multi-model ensemble, over multiple simulations and in the palaeo-data (Sect. 3) and examples of skill measures that are well correlated with the simulated magnitude of future change, thus allowing the likely magnitude of future changes to be constrained (Sect. 4). We also give examples of cases (Sect. 5) where there is a need for caution because of the limitations with the models used in CMIP5, or with the palaeoclimate data itself.
Our examples illustrate the general requirements for attempts to use the palaeo-climate simulations to quantitatively constrain future projections. Each example makes use of a specific target (or targets) from a palaeo-climate reconstruction of change that is within the scope of the modelled system, defines a metric of skill that quantifies the accuracy of the modelled changes and assesses the connection to a future prediction. The successes and problems discussed above lead naturally to a set of guidelines that could profitably guide future research for both modellers and the palaeo-data community:
1. Palaeo-simulations need to be performed with models that are also being used for future projections and produce model diagnostics that are commensurate in all experiments (as in CMIP5).
2. The more extensive the structural uncertainty examined (across models, boundary conditions etc.), the more robust any resulting constraints will be. Some of our analyses (i.e. Sects. 4.1 and 4.2) are limited by the small number of palaeo-simulations currently available in the CMIP5 database, and we hope that the demonstration of their potential to address questions relevant to the future will encourage other modelling groups to complete and archive these simulations.
3. Palaeo-data targets should be spatially representative synthesis products with well-characterised uncertainties. Our analyses rely heavily on the use of synthesis data products, for instance the MARGO data set for the LGM (MARGO, 2009), pollen-based reconstructions for the mid-Holocene (Bartlein et al., 2011), multi-proxy reconstructions of hemispheric temperature (e.g. Moberg et al., 2005), or gridded tree-ring-based reconstructions of PDSI for the last millennium (Cook et al., 2010). Such products are invaluable, but there is a need for increased transparency of included uncertainties and continued expansion of targets (e.g. see Müller et al., 2011 for sea ice extent).
Increasing model complexity and scope, for instance by including a carbon cycle, fire models or online tracers such as water isotopes, necessitates the creation of new synthesis products (e.g. charcoal records: Daniau et al., 2012; or sea surface carbonate isotopes: Oppo et al., 2007) if useful comparisons are to be made. Examples in Sects. 4.3 and 5.1 illustrate the need for more efforts in this direction.
4. Skill metrics may be impacted by uncertainties in external forcing (and thus not solely characterise the realism of modelled processes; as in the spectra generated in Sect. 5.3), or may have non-stationary relationships with impacts of interest (Sect. 5.4). Improved forward modelling of palaeo-data (as in Sect. 5.2) will be increasingly important.
5. Relationships between targets in the past and the future predictions should be examined and not assumed.Not all mismatches of palaeo-models and reconstructions are related to factors important for future sensitivities, and not all divergences in future projections are correlated to differences in palaeo-climate skill.
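The last guideline can be made concrete with a small sketch of how a relationship between a palaeo-climate skill metric and a future projection might be examined across an ensemble rather than assumed. The numbers below are invented placeholders for eight hypothetical models, and the permutation test is simply one convenient way to gauge whether a correlation of that size could arise by chance in such a small ensemble; it is not the method used in this paper.

```python
import numpy as np

def ensemble_relationship(palaeo_metric, future_change, n_perm=5000, seed=0):
    """Cross-ensemble Pearson r and a two-sided permutation p-value."""
    x = np.asarray(palaeo_metric, dtype=float)
    y = np.asarray(future_change, dtype=float)
    r = np.corrcoef(x, y)[0, 1]
    rng = np.random.default_rng(seed)
    # Null distribution: correlations after shuffling which model goes with which.
    null = np.array([np.corrcoef(x, rng.permutation(y))[0, 1]
                     for _ in range(n_perm)])
    p = (np.sum(np.abs(null) >= abs(r)) + 1) / (n_perm + 1)
    return r, p

# Invented values for eight hypothetical models (illustration only).
palaeo_anomaly = [0.1, 0.4, 0.5, 0.9, 1.2, 1.3, 1.7, 2.0]
future_anomaly = [0.3, 0.7, 1.2, 1.7, 2.6, 2.5, 3.3, 4.1]

r, p = ensemble_relationship(palaeo_anomaly, future_anomaly)
```

A strong and significant cross-ensemble correlation is necessary but not sufficient for a useful constraint: the models in an ensemble are not independent, and a physical reason for the relationship is still needed.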
The periods and hypotheses tested using palaeo-climate simulations are far more limited than the number of interesting features in the palaeo-climate record. The three periods selected for CMIP5 were chosen on the basis of their relative maturity (the existence of prior sets of experiments, already tested issues, existing data syntheses), but additional periods are also potentially useful: the mid-Pliocene (2.5 million years ago), the transient 8.2 ka event, the last interglacial, the peak Eocene, etc. (see Schmidt, 2010 for justifications). Some of these periods are already being examined in a coordinated fashion (e.g. Haywood et al., 2013 and Dolan et al., 2012 for the Pliocene), and we hope that more coordinated experiments will be started. Further expansions of the model experiments will lead to increases in the production of higher frequency diagnostics (daily and sub-daily variations), and include perturbed physics ensembles to better characterise the model structural uncertainty. On the data side, much greater efforts to create palaeo-data synthesis products with robust uncertainty estimates are possible. All of these expansions will create possibilities for more and better tests of model performance and hence potentially lead to better constraints on future projections. In the meantime, there is still a huge untapped scope for more informative palaeo-model comparisons that can be made using the existing databases.
For simulations of the last millennium though, the forcings are much

Fig. 2. Left panels: average surface air temperature change, compared to piControl, over land compared to over the oceans for the tropics (23° S-23° N) and the North Atlantic and Europe region (45° W-90° E, 35-45° N). LGM − piControl in blue, 1pctCO2 − piControl in orange, abrupt4xCO2 − piControl in red. For the latter 2 periods, the averages have been computed over the same years as Fig. 1. The results have been computed for all models in the database on 23 July 2012 for which there were results for the lgm, piControl, 1pctCO2 and abrupt4xCO2. The grey lines indicate the 1 : 1.5 ratio in both plots. The results from the reconstructions are based on the MARGO (2009) data for the oceans and on the Bartlein et al. (2011) data for the continents. The error bars show twice the standard deviation of the distribution of the mean temperature changes over the region, as estimated by the same bootstrap method as described for the bottom-left plot of Fig. 1 and in Sect. 3.1. Right panels: plots show the land-sea ratio computed from the data, again with the same bootstrap method. The horizontal bar shows the median ratio, the thick vertical line shows the 25-75th percentiles and the thin lines the 10-90th percentiles. The model results are computed by a bootstrap method applied on all model results, as explained in the text. The definition of the thin/thick vertical lines and horizontal bar are the same as for the data.
Fig. 3. (a) Relationship between the precipitation dipole change from pre-industrial to future climate under RCP 8.5 for the 2080-2100 period and the precipitation dipole change from pre-industrial to mid-Holocene. The precipitation dipole is defined as the difference of precipitation change in RCP 8.5 between the "Guyana" region and the "Nordeste" region. Only those models within each group that had both rcp85 and midHolocene data available at the time of the analysis are plotted. Other models that provided only rcp85 data are listed for completeness, but without any markers. (b) Maps of precipitation changes from piControl to rcp85 (top panels) and from piControl to midHolocene (bottom panels), averaged over all available models in group 1 (left panels) and in group 2 (right panels). Contours show corresponding SST changes. The boxes over land and ocean show the areas used in the dipole definitions.

Fig. 4. (a) Global mean LGM temperature change versus overall climate sensitivity to 2xCO2. (b) Correlation between local LGM air temperature anomaly and climate sensitivity across the model ensemble. (c) Correlation across the model ensemble between control run temperatures and climate sensitivity.

Fig. 5. Using LGM tropical temperature as a constraint on climate sensitivity. Cyan and blue dots represent PMIP2 and CMIP5 simulations respectively. Linear regression and predictive uncertainty range are plotted as solid and dashed blue lines respectively. Small red dots represent a Monte Carlo sample from the estimated proxy-derived reconstruction, mapped onto the climate sensitivity.

Fig. 6. Climate sensitivity estimated through weighting of the PMIP models. Blue and cyan dots represent PMIP2 and CMIP5 simulations respectively. Green curve shows prior distribution of climate sensitivity (based on equal weighting of the models). Thick red curve shows posterior distribution, after weighting according to match to the LGM tropical temperature. Thin red curves show the individual models' contributions to the posterior after weighting. Vertical bars indicate 5, 50 and 95 percentiles.

Fig. 7. Sea-ice extent in CMIP5 models in 10⁶ km². (a) 30-yr mean seasonal cycle for the period 1870-1900, (b) the anomaly in sea ice extent for the period 2036-2065 in RCP 8.5, and (c) the anomaly at the mid-Holocene.

Fig. 9. Relationship between the September MH anomaly and the September RCP 8.5 anomaly across the CMIP5 models.

Fig. 10. Upper panel: magnitude of the trend in δ¹⁸O_c (‰/decade, computed from a simple linear regression through the trend PC) in corals (far left), Simple Ocean Data Assimilation (SODA) 20th-century reanalysis (Carton and Giese, 2008; Giese and Ray, 2011; Compo et al., 2011), a 500 yr control run from GFDL CM2.1 (Wittenberg, 2009), and the CMIP3 and CMIP5 multimodel ensembles. In each case, δ¹⁸O_c was modelled from SST and SSS (1), SST only (2), and SSS (3). Lower panel: magnitude of the δ¹⁸O_c trend (‰/decade, computed from a simple linear regression through the trend PC) over 1890-1990 in pseudocorals modelled from CMIP5 historical simulations and over 2006-2100 in the RCP 4.5 projections, where numbers in parentheses indicate the number of runs in the historical and RCP 4.5 ensemble, respectively.

Fig. 11. Spectra from an ensemble of LM simulations using the same model but driven with different sets of forcings compared with Ljungqvist (2010), Mann et al. (2008) and Moberg et al. (2005) reconstructions. The clustering of simulations is driven entirely by changes in the volcanic forcing data set used, with the simulations with the most decadal and multi-decadal variability using the Gao et al. (2008) reconstruction. Only in the examples where no volcanic forcing is used at all is the impact of different solar forcing reconstructions detectable. Spectra derived using MEM with 30 poles, from 850 to 2005, after correction for control run drift using a loess low-frequency estimate derived from the control run. Key abbreviations: Land use: Pnz (Pongratz et al., 2008), Kap (Kaplan et al., 2011); Solar: Vra (Vieira et al., 2011), Stn (Steinhilber et al., 2009); Volcanic: 2xGao (twice the forcing from Gao et al., 2008), Crw (Crowley and Unterman, 2013).

Fig. 12. rms fluctuations of instrumental and palaeo-climate reconstructions compared to drift-corrected simulations of the Northern Hemisphere land temperature for the period 1500-1900. 2xGao and Crw refer to GISS-E2-R simulations using the (2x) Gao et al. (2008) and Crowley and Unterman (2013) reconstructions of volcanic forcing. The multiproxy reconstruction used is an average of three NH estimates, and the rms fluctuations are separately shown for the periods 1000-1900 and 1000-1980.
Fig. 13. Standardised anomalies for PDSI and soil moisture in two models (GISS-E2-R and MIROC-ESM) using a past1000 simulation, and a historical+rcp85 continuation. For reference, the tree-ring-based reconstruction is plotted (dashed line) (Cook et al., 2010), though this would not be expected to line up exactly with the model simulations. All data smoothed with a 10 yr running mean.

Clim. Past, 10, 221-250, 2014, www.clim-past.net/10/221/2014/
Table 1. List of models, institutions and experiments used in the analyses in this paper. Experiment names use the CMIP5 database shorthand, and run numbers are the "rip" coding for each experiment.