The PMIP4-CMIP6 Last Glacial Maximum experiments: preliminary results and comparison with the PMIP3-CMIP5 simulations

Climate of the


Introduction
The climate of the Last Glacial Maximum (LGM, ~21,000 years ago) has been a focus of the Paleoclimate Modelling Intercomparison Project (PMIP) since its inception. It is the most recent global cold extreme, and as such has been widely documented and used for benchmarking state-of-the-art climate models (Braconnot et al., 2012; Harrison et al., 2014, 2015). The global temperature rise from the LGM until now (~4° to 6°C; Annan and Hargreaves, 2015; Friedrich et al., 2016) is of the same order of magnitude as the increase projected by 2100 CE under moderate to high-end emission scenarios. The LGM world was very different from the present one, with large ice sheets covering northern North America and Fennoscandia, in addition to the Greenland and Antarctic ice sheets still present today. These additional ice sheets resulted in a lowering of the global sea level by ~120 m, which induced changes in the land-sea distribution. The closure of the Bering Strait and the exposure of the Sunda and Sahul shelves between southeast Asia and the maritime continent are the most prominent of these changes in land-sea geography. Atmospheric greenhouse gases (GHGs) were lower than pre-industrial (PI) values, leading to cooling in addition to that induced by the large ice sheets. The cooling is more pronounced in the high latitudes than in the tropics, and greater over land than ocean. The polar amplification and the land-sea contrast signals simulated by the previous generation of palaeoclimate simulations (PMIP3-CMIP5) are similar in magnitude (although opposite in sign) to the signals seen in future projections and have been shown to be consistent with climate observations for the historic period and reconstructions for the LGM (Braconnot et al., 2012; Izumi et al., 2013; Harrison et al., 2014, 2015).
However, while the models are able to represent the thermodynamic behaviour that gives rise to these large-scale temperature gradients, they underestimate cooling on land, especially winter cooling, and overestimate tropical cooling over the oceans (Harrison et al., 2014). Thus, one question to be addressed with the new PMIP4-CMIP6 simulations is whether there is any improvement in capturing regional temperature changes. The large temperature changes at the LGM make this interval a natural focus for efforts to constrain climate sensitivity, but attempts to do this using the PMIP3-CMIP5 simulations were inconclusive (Schmidt et al., 2014; Harrison et al., 2014), in part because of the limited number of LGM simulations available, and in part because of the limited range of climate sensitivity sampled by these models. Changes in model configuration have resulted in several of the PMIP4-CMIP6 models having substantially higher climate sensitivity than the PMIP3-CMIP5 versions of the same models, and thus the range of climate sensitivity sampled by the PMIP4-CMIP6 models is much wider.
This provides an opportunity to re-examine whether the LGM could provide a strong constraint on climate sensitivity.

The atmospheric general circulation was strongly modified from its modern-day conditions by changes in coastlines at low latitudes (DiNezio and Tierney, 2013) and by the presence of the Laurentide and Fennoscandian ice sheets (e.g. Laîné et al., 2009; Lofverstrom et al., 2014, 2016; Ullman et al., 2014; Beghin et al., 2015). These changes in circulation had an impact on precipitation, which was reduced globally (Bartlein et al., 2011) but increased locally, for example in southwestern North [...] 2017; Lora, 2018; Lofverstrom and Lora, 2017; Lofverstrom, in review). The interplay between temperature-driven and circulation-driven changes in regional precipitation at the LGM represents a test of the ability of state-of-the-art models to simulate precipitation changes under future scenarios, where both thermodynamic (e.g. related to the Clausius-Clapeyron relationship) and dynamic (e.g. related to changes in the position of the storm-tracks and extent of the subtropical anticyclones) effects contribute to changes in the amount and location of precipitation (e.g., Boos, 2012; Scheff and Frierson, 2012; Lora, 2018). Evaluation of the PMIP3-CMIP5 simulations showed that models underestimate the LGM reduction in mean annual precipitation over land (Harrison et al., 2014), reflecting the underestimation of temperature changes in the simulations, and this resulted in an underestimation of the observed aridity (precipitation minus evapotranspiration). While the models reproduced circulation-induced changes in precipitation in western North America, they showed no increase in precipitation south of the North American ice sheet and only limited impact on the precipitation of the circum-Mediterranean region (Harrison et al., 2014; Lora, 2018; Morrill et al., 2018). Thus, one question to be addressed with the new PMIP4-CMIP6 simulations is whether there is any improvement in capturing regional precipitation changes.
One complication here is that most of the reconstructions used to evaluate the PMIP3-CMIP5 simulations were pollen-based and relied on statistical approaches that do not account for the direct impact of low CO2 on water-use efficiency (Prentice and Harrison, 2009; Gerhardt and Ward, 2010; Bragg et al., 2013; Scheff et al., 2017) and could therefore be dry biased. However, new methods have been developed that account for this effect (Prentice et al., 2017) and thus it will be possible to determine whether accounting for the effect of low CO2 resolves model-data mismatches in regional precipitation at the LGM.
The LGM boundary conditions also had a strong impact on ocean circulation, as documented via multiple tracers (e.g. Lynch-Stieglitz et al., 2007; Böhm et al., 2015), which suggest a shallower North Atlantic Deep Water cell and expanded Antarctic Bottom Water (AABW). This is at odds with the PMIP3-CMIP5 model results (Muglia and Schmittner, 2015), which all show an increase in the Atlantic Meridional Overturning Circulation (AMOC). Previous studies show that this increase in AMOC is related to changes in northern extratropical wind stress due to the presence of the high ice sheets (Oka et al., 2012; Muglia and Schmittner, 2015; Klockmann et al., 2016; Sherriff-Tadano et al., 2018; Galbraith and de Lavergne, 2019). Thus, the simulation of the AMOC, and ocean circulation in general, at the LGM could be highly sensitive to the ice sheet reconstructions used as boundary conditions (see e.g. Ullman et al., 2014; Beghin et al., 2016). There is still some uncertainty about the height and shape (although not the extent) of the LGM ice sheets, so the protocol for the LGM PMIP4-CMIP6 experiment takes this uncertainty into account by allowing for alternate ice-sheet configurations (Kageyama et al., 2017) in order to test the sensitivity of LGM climate and ocean circulation to ice sheet configuration. The PMIP4-CMIP6 LGM experimental protocol also includes changes in other forcings, including vegetation changes and changes in atmospheric dust loadings and their uncertainties. Thus, the new PMIP4-CMIP6 simulations provide opportunities to examine the response of the climate system to multiple forcings, to calculate the impact of individual forcings through sensitivity experiments, and to examine how these forcings combine to produce circulation and climate changes in the marine and terrestrial realms.
In this paper, we present preliminary results from the PMIP4-CMIP6 LGM simulations, compare them to the PMIP3-CMIP5 results (Section 3), and evaluate their realism against a range of climatic reconstructions (Section 4). We focus on temperature and precipitation, extratropical circulation, energy transport and the AMOC.

PMIP3-CMIP5 and PMIP4-CMIP6 protocols for the LGM simulations
The protocol of the LGM experiments changed between the PMIP3-CMIP5 and PMIP4-CMIP6 simulations (Kageyama et al., 2017), partly to accommodate new information about boundary conditions and partly to capitalise on new features of the climate models. The main difference between the PMIP3-CMIP5 and PMIP4-CMIP6 simulations is the specification of the ice sheets. The PMIP3-CMIP5 simulations all used the same ice sheet, which was created as a composite of three separate ice-sheet reconstructions; the PMIP4-CMIP6 protocol allows modelling groups to use one of three separate ice-sheet reconstructions: the original PMIP3-CMIP5 ice sheet to facilitate comparison with the earlier simulations, ICE-6G_C (Argus et al., 2014) and GLAC-1D (L. Tarasov, pers. comm.; Ivanovic et al., 2016). All three reconstructions have similar ice-sheet extent, but the heights of the Laurentide, Fennoscandian, and West Antarctic ice sheets differ significantly, by several hundred metres in some places. Comparisons of the simulations made with alternative ice-sheet reconstructions will ultimately allow an assessment of the impact of forcing uncertainties on simulated climates.

PMIP3-CMIP5 and PMIP4-CMIP6 models
The LGM model outputs analysed here are from the PMIP4-CMIP6 and PMIP3-CMIP5 lgm experiments. We use the corresponding piControl experiments as a reference, which are termed "PI" throughout the manuscript. Seven PMIP4-CMIP6 LGM simulations are currently available, and there were a comparable number of models that ran LGM simulations in PMIP3-CMIP5 (Table 1). The PMIP3-CMIP5 ensemble includes one model that ran an additional sensitivity test to ice-sheet height (GISS-E2R: Ullman et al., 2014) and one model that ran simulations with and without dynamic vegetation (MPI-ESM-P: Adloff et al., 2018). The PMIP4-CMIP6 ensemble includes two simulations made with updated versions of the models that contributed to PMIP3-CMIP5, specifically MIROC and MPI-ESM (Table 1). Most of the models that have run the PMIP4-CMIP6 LGM simulations are general circulation models (GCMs) but iLOVECLIM is an Earth System Model of Intermediate Complexity. As such, iLOVECLIM is considerably faster than the GCMs and is the only model so far that has run simulations using different ice sheet reconstructions (ICE-6G_C and GLAC-1D). The LGM simulations were either initialised from a previous LGM simulation or were spun up from the pre-industrial state. The length of the spin-up therefore varies (Table 1), as indeed does the length of the equilibrium LGM simulation in these preliminary analyses. The INM-CM4-8 results are from the beginning of an lgm simulation and the model is not yet fully equilibrated. All other models have run for several millennia. Our preliminary analyses are based on variables available by December 20th, 2019. Many modelling groups are in the process of uploading data to the CMIP6 archive on the Earth System Grid Federation, but the minimum information we requested for this study was monthly surface air temperature and precipitation.
Although several of the PMIP4-CMIP6 models have higher climate sensitivity than the equivalent models in PMIP3-CMIP5, this is not reflected in the ensemble analysed here. In fact, the PMIP4-CMIP6 ensemble, as of December 2019, has lower climate sensitivities than the PMIP3-CMIP5 models (Table 1): equilibrium sensitivities to a CO2 doubling from pre-industrial values range from 2.1 to 3.6°C (mean: 3.0°C) in the current PMIP4-CMIP6 ensemble, while the range is from 2.1 to 4.7°C (mean: 3.43°C) in the PMIP3-CMIP5 ensemble.

Sources of information on LGM climate
The PMIP3-CMIP5 model simulations were evaluated against two benchmark data sets: pollen-based reconstructions of seasonal temperature (mean annual temperature, mean temperature of the coldest month, mean temperature of the warmest month, and growing season temperature indexed by growing degree days above a baseline of 0°C), mean annual precipitation and an index of soil moisture (Bartlein et al., 2011); and a compilation of sea-surface temperature (SST) reconstructions (MARGO, 2009).
In the Bartlein et al. (2011) data set, reconstructions at individual pollen sites were averaged to produce an estimate for a 2° x 2° grid; reconstruction uncertainties are estimated as a pooled estimate of the standard errors of the original reconstructions for all sites in each grid cell. Although the Bartlein et al. (2011) data set has good coverage for some regions, coverage was sparse in the tropics, and there were no reconstructions of LGM climate for Australia. Furthermore, not all of the six climate variables were reconstructed at every site, so statistical comparisons were more robust for some variables than others. The majority of the reconstructions included in the Bartlein et al. (2011) data set used various sorts of statistical calibrations based on modern-day conditions and therefore do not account for the impact that changes in CO2 have on water-use efficiency and hence plant distribution. Although Bartlein et al. (2011) were unable to demonstrate a statistically significant difference between statistical reconstructions and model-based inversions (which in principle account for the CO2 effect on plant distribution), their analysis focused on the mid-Holocene where the CO2 effect is small and there is therefore some concern that the data set may [...] LGM. In addition to accounting for potential effects of low CO2 on moisture variables at the LGM, this reconstruction produces coherent estimates of seasonal climate variables at many more points than the original pollen-based reconstructions and also extends the geographic coverage. [...] that: 1) these include no-analogue assemblages (Mix et al., 1999); 2) imply warmer-than-present subtropical gyres, an inference that has been questioned (Crowley, 2000; Telford et al., 2013); and 3) lack Bayesian proxy-system models that were required for the data assimilation technique used by Tierney et al. (2019). Tierney et al. 
(2019) use the data along with a model prior from the isotope-enabled Community Earth System Model 1.2 (Brady et al., 2019) to produce a full-field data assimilation product; however, since this product relies on the covariance structure of CESM, we only use the data synthesis for comparisons here. Data from the LGM and late Holocene respectively were calibrated using Bayesian models that fully propagate uncertainties (Tierney & Tingley, 2015; Tierney et al., 2018; Malevich et al., 2019; Tierney et al., 2019), yielding a 1,000 member posterior distribution of SSTs. These data were sorted from low to high along the ensemble dimension, and then random error representative of site-level, downcore uncertainty (N(0, 0.5°C)) was added back to the matrix. This procedure effectively partitions the error variance; i.e., it assumes that at any given site, absolute uncertainty in SST cancels out in the anomaly calculation, while "relative" uncertainty associated with downcore measurement and non-linearities in the calibration model is preserved. The data were then averaged within a 5° x 5° grid and differenced. The standard deviation associated with each gridpoint is calculated from the differenced ensemble dimension. A crucial difference between the Tierney et al. (2019) synthesis and MARGO (2009) is that the former implies more extensive tropical cooling during the LGM (-2.5°C, vs -1.5°C for MARGO). This can be attributed to the exclusion of the microfossil data as well as recalibration of the $U^{K'}_{37}$ proxy with the BAYSPLINE model (Tierney & Tingley, 2018), which corrects for an observed reduced sensitivity of $U^{K'}_{37}$ to SST above ca. 24°C.
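As an illustration only, the anomaly procedure described above (sort each site's posterior draws, add downcore noise, average within 5° x 5° cells, difference, and take the spread across the ensemble) might be sketched in Python as follows. This is not the code used by Tierney et al. (2019): the function names and input layout are hypothetical, 0.5°C is assumed here to be a standard deviation, and the cell-indexing scheme is an arbitrary choice.

```python
import math
import random
import statistics
from collections import defaultdict


def grid_cell(lat, lon, size=5.0):
    """Index of the size x size degree cell that contains a site."""
    return (math.floor((lat + 90.0) / size), math.floor((lon % 360.0) / size))


def gridded_ensemble(sites, noise_sd, size, rng):
    """Sort each site's draws, add N(0, noise_sd) error, then average per cell.

    sites: list of (lat, lon, draws); every draws list has the same length.
    Returns {cell: [cell-mean SST for each ensemble member]}.
    """
    by_cell = defaultdict(list)
    for lat, lon, draws in sites:
        # Sorting aligns low draws with low draws across the two periods.
        perturbed = [v + rng.gauss(0.0, noise_sd) for v in sorted(draws)]
        by_cell[grid_cell(lat, lon, size)].append(perturbed)
    return {c: [statistics.fmean(m) for m in zip(*rows)] for c, rows in by_cell.items()}


def gridded_anomaly(lgm_sites, lh_sites, noise_sd=0.5, size=5.0, seed=0):
    """Return {cell: (mean LGM minus late Holocene anomaly, ensemble standard deviation)}."""
    rng = random.Random(seed)
    lgm = gridded_ensemble(lgm_sites, noise_sd, size, rng)
    lh = gridded_ensemble(lh_sites, noise_sd, size, rng)
    out = {}
    for cell in lgm.keys() & lh.keys():  # only cells with data in both periods
        diff = [a - b for a, b in zip(lgm[cell], lh[cell])]
        out[cell] = (statistics.fmean(diff), statistics.stdev(diff))
    return out
```

Because the draws are sorted before differencing, a calibration offset shared by both periods cancels in the anomaly, so the remaining spread comes mainly from the added noise term and from differences in the width of the two posterior distributions.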

Data-model comparisons
We compare the model simulations to palaeoclimate data, focusing on large-scale features and regional changes. In these comparisons, the reconstructions are expressed as mean values and the uncertainty by the standard error of the reconstructions. Model outputs were extracted only for the grid cells where there are observations. Model uncertainty is represented by the standard deviation of interannual variability. Thus, model uncertainty is not, strictly speaking, equivalent to reconstruction uncertainty but merely provides some measure of the variability engendered by sampling the simulated climate.
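The sampling step can be illustrated with a short Python sketch. It assumes, hypothetically, that the model anomaly is available as one mapping from grid cell to value per simulated year, and that the interannual standard deviation is taken cell by cell; the text does not specify either detail, and the function name is ours.

```python
import statistics


def sample_model_at_obs(model_years, obs_cells):
    """Keep only grid cells that have a reconstruction.

    model_years: list with one {cell: value} mapping per simulated year.
    obs_cells: iterable of cells for which reconstructions exist.
    Returns {cell: (multi-year mean, interannual standard deviation)}.
    """
    sampled = {}
    for cell in obs_cells:
        series = [year[cell] for year in model_years if cell in year]
        if len(series) >= 2:  # a standard deviation needs at least two years
            sampled[cell] = (statistics.fmean(series), statistics.stdev(series))
    return sampled
```

The resulting spread measures year-to-year variability in the simulated climate, which is why it is not directly comparable to the standard error of a reconstruction.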

Temperature
The global and annual mean temperature in the PMIP4-CMIP6 LGM simulations is between 3.5 and 4.1°C cooler than in the PI simulations. The largest changes in temperature between the LGM and PI simulations (Fig. 1) are found over the Laurentide and Fennoscandian ice sheets, reflecting the significant changes in surface height and albedo caused by the ice sheets. Colder conditions are registered in the northern mid and high latitudes, partly reflecting the advection of the cold temperature [...] Although the broadscale patterns of temperature changes are similar, there are important differences between the PMIP4-CMIP6 and PMIP3-CMIP5 ensembles. The PMIP4-CMIP6 simulations are generally warmer than the PMIP3-CMIP5 simulations (Fig. 1, bottom), except in regions close to West Antarctica, over some restricted areas of the Laurentide ice sheet and in the North Atlantic. The largest difference between the PMIP3-CMIP5 and PMIP4-CMIP6 averages is over the northern North Atlantic and Nordic Seas, probably reflecting differences in sea-ice cover in these areas. Zonally averaged temperatures (Fig. 2) confirm the differences between the two sets of simulations. The northern extratropics are colder in the PMIP3-CMIP5 simulations (from -5 to nearly -13°C) than the PMIP4-CMIP6 simulations (from -5 to ca. -9°C). Similarly, the PMIP3-CMIP5 simulations are slightly colder in the tropics (by -2.4 to -3.3°C, with an outlier at -1.7°C) than the PMIP4-CMIP6 simulations (-1.7 to -2.6°C). However, the cooling of the southern extratropics is more variable in the PMIP4-CMIP6 simulations (-1.2 to ca. -7°C) than in the PMIP3-CMIP5 simulations (-2.5 to ca. -6°C). Most of the difference in the global average cooling, which ranges from -3.5 to -4.1°C in the PMIP4-CMIP6 simulations and between -4.5 and -5.5°C in the PMIP3-CMIP5 simulations, stems from differences in the simulated temperatures of the northern hemisphere and probably reflects differences in the ice sheet reconstructions (e.g. 
Liakka and Lofverstrom, 2018) used in the two sets of simulations. However, it is also possible that differences in model configuration and in climate sensitivity could contribute to the difference between the two ensembles.
The PMIP4-CMIP6 ensemble is also small (only five models are included in the ensemble shown in Fig. 1), and confirmation of whether differences in the temperature response of the two ensembles are real will only be possible when other LGM simulations become available.
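For readers wishing to reproduce band averages such as those quoted above from zonal-mean anomalies, a minimal Python sketch of area weighting is given below. It assumes a regular latitude grid, on which the area represented by each latitude is proportional to the cosine of latitude; the function name is hypothetical and the latitude limits of the tropical and extratropical bands are left to the caller, since they are not stated here.

```python
import math


def band_mean(zonal_means, lats, lat_min=-90.0, lat_max=90.0):
    """Area-weighted mean of zonal-mean values for latitudes in [lat_min, lat_max].

    zonal_means and lats are equal-length sequences; lats are in degrees north.
    Each latitude is weighted by cos(latitude).
    """
    num = 0.0
    den = 0.0
    for value, lat in zip(zonal_means, lats):
        if lat_min <= lat <= lat_max:
            weight = math.cos(math.radians(lat))
            num += weight * value
            den += weight
    return num / den
```

Calling `band_mean(anomaly, lats)` with the default limits gives a global mean, while passing, for example, `lat_min=-30.0, lat_max=30.0` would give a tropical mean under that assumed definition of the tropics.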

Atmospheric and oceanic circulation
The PMIP4-CMIP6 models simulate large changes in the Northern Hemisphere upper tropospheric atmospheric circulation (Fig. 3) in response to LGM boundary conditions, in particular over North America and the North Atlantic. The North Atlantic jet stream is narrower and stronger compared to the PI, as shown by an increase of more than 10 m/s in the 250 hPa zonal wind south of the Laurentide ice sheet and extending into the North Atlantic, and a decrease in zonal wind to the northwest and southeast of these regions. The strengthening and narrowing of the North Atlantic jet stream was also a characteristic of the PMIP3-CMIP5 simulations (Beghin et al., 2016). However, in the PMIP4-CMIP6 simulations the jet stream is further north than in the PMIP3-CMIP5 simulations (Fig. 3, bottom), most prominently near the Laurentide ice sheet. This could be because the Laurentide is lower in the ICE-6G reconstruction than the ice sheet used in the PMIP3-CMIP5 simulations (see e.g. Ullman et al., 2014; Beghin et al., 2015; Lofverstrom et al., 2016) but may also reflect changes in the representation of the zonal winds between the two sets of simulations. This is supported by the fact that there are differences between the PMIP3-CMIP5 and PMIP4-CMIP6 simulations away from the Laurentide ice sheet, in particular over the North Pacific and the Southern Ocean, where the jet stream is also located more poleward in the PMIP4-CMIP6 than the PMIP3-CMIP5 simulations. Sensitivity experiments using the PMIP3-CMIP5 ice sheets with PMIP4-CMIP6 models, as planned in the PMIP4 LGM experiment protocol (Kageyama et al., 2017), should help resolve the question of whether differences in model treatment or boundary conditions are responsible for the differences in atmospheric circulation between the two ensembles.
The extent of the North Atlantic Deep Water (NADW) cell simulated by PMIP4-CMIP6 models is very similar for LGM and PI, except for iLOVECLIM, which shows a deepening of the NADW cell for LGM (Fig. 4, [...] and UoT-CCSM4 models). MPI-ESM1.2 simulates an increase in northward ocean heat transport at all latitudes for the LGM compared to PI, while MIROC-ES2L simulates an increase in this transport from 15°S to 60°N. UoT-CCSM4 is the only model simulating a decrease in northward heat transport over a significant range of latitudes, from 50°S to 70°N. All PMIP4-CMIP6 models simulate an increase in northward atmospheric heat transport in the tropics and up to 50°N. MIROC-ES2L simulates an increase up to 70°N (Fig. 5, middle). In summary, all models simulate an increase in northward heat transport (Fig. 5, top) in the tropics and northern mid-latitudes, although in the UoT-CCSM4 model the increase is confined to south of 50°N. This increase in northward heat transport in the tropics and northern mid-latitudes during the LGM as compared to PI was also simulated by most PMIP3-CMIP5 models. Given that the magnitude of the heat transport increase is similar in the PMIP4-CMIP6 and PMIP3-CMIP5 simulations, the warmer temperatures at high northern latitudes in the PMIP4-CMIP6 simulations cannot be due to differences in northward ocean heat transport.

Hydrological cycle
The large-scale gradients in precipitation are similar in the PMIP4-CMIP6 LGM and PI simulations (Fig. 6, top), with maximum precipitation in the tropics (Intertropical Convergence Zone and monsoon regions) and secondary maxima in the mid-latitudes, corresponding to the position of the North Pacific, North Atlantic, and Southern Ocean storm-tracks. The PMIP4-CMIP6 models show a decrease in precipitation between the LGM and PI in all these high precipitation areas (Fig. 6, middle and Fig. 7, top left). There are some regions where precipitation increases at the LGM compared to the PI: all PMIP4-CMIP6 models simulate more precipitation over the Pacific Ocean and to the south of the Laurentide ice sheet, and over southern Africa; some models simulate wetter conditions over the Iberian Peninsula, and some simulate an increase in precipitation over the northern and southern subtropical zones in the Pacific and over the Atlantic southern subtropical zone. However, the areas with decreased precipitation are much more extensive than areas with increased precipitation, so zonal averages for the southern extratropics, tropics and northern extratropics (Fig. 7) all show a decrease in precipitation.
The broadscale patterns of change in precipitation in the PMIP4-CMIP6 simulations are similar to those found in the PMIP3-CMIP5 simulations. The PMIP4-CMIP6 ensemble is slightly less dry than the PMIP3-CMIP5 ensemble (Fig. 6), particularly in the northern extratropics, and this may reflect the fact that the PMIP4-CMIP6 simulations are less cold than the PMIP3-CMIP5 simulations. However, the geographic patterning in the precipitation changes between the PMIP4-CMIP6 and PMIP3-CMIP5 ensembles (Fig. 6, bottom) is complex, and some of the differences between the two ensembles are probably related to atmospheric circulation changes, particularly in the tropics (Fig. 3). Both ensembles show a consistent decrease in zonally averaged precipitation in the southern extratropics, tropics, and northern extratropics (Fig. 7). When obvious outliers are excluded (e.g. CNRM-CM5 for PMIP3, iLOVECLIM for PMIP4), the simulated range of precipitation changes is comparable for the PMIP3-CMIP5 and PMIP4-CMIP6 ensembles.
Evapotranspiration patterns in the PMIP4-CMIP6 LGM and PI simulations are characterized by maximum values in the subtropics and decreases toward high latitudes. The models simulate a global decrease in LGM evapotranspiration relative to the PI that strongly peaks over and around the northern hemisphere ice sheets (Fig. 8, left). These results are in agreement with the broad patterns of the PMIP3-CMIP5 ensemble, albeit with a muted average decrease consistent with the fact that the LGM PMIP4-CMIP6 simulations are less cold. As a result, net precipitation (precipitation minus evapotranspiration) in the PMIP4-CMIP6 ensemble is higher at the LGM than the PI in the extratropics, particularly over the mid-latitude eastern Pacific in both hemispheres and over most of North America, with the exception of the North Atlantic where evaporation decreases are more localized and do not compensate for the reductions in precipitation (Fig. 8, right). This, together with colder temperatures, could help explain why the PMIP4-CMIP6 models simulate a stronger AMOC at the LGM. Substantial reductions in continental net precipitation only occur over tropical South America and high-latitude regions, while Africa, Australia, and the mid-latitude regions of Eurasia and the Americas see little change or even increased net precipitation.

Data-model comparisons
The evaluation of the PMIP3-CMIP5 LGM simulations showed that large-scale climate features, such as the ratio of changes in land-sea temperature, in high-latitude temperature amplification, and in precipitation scaling with temperature, were broadly consistent with modern observations (Braconnot et al., 2012; Izumi et al., 2013; Harrison et al., 2014, 2015).

The ratio for the land-sea difference in changes in mean annual temperature in the tropics in the PMIP4-CMIP6 simulations is compatible with the ratio reconstructed from the Bartlein et al. (2011) and MARGO (2009) data sets. However, the simulated ocean temperatures in the PMIP3-CMIP5 simulations are somewhat too cold compared to the MARGO (2009) reconstructions.
Although the Cleator et al. (2019) data set has a larger spatial coverage than the Bartlein et al. (2011) data set, there is no significant difference between the two data sets for most of the temperature variables across common grid cells (Fig. 9). However, the new reconstructions have a reduced range at the warm end so that some simulations are recorded as warmer than the land-based reconstructions (Fig. 10). [...] The amplification of temperature changes at high northern latitudes compared to the tropics is apparent over both the land and the ocean domains, although the amplification appears to be smaller in the new data syntheses (Fig. 11). For the ocean domain, this could reflect the influence of seasonal production on the extratropical sites, with indicators being more sensitive to summer changes, or to changes in the seasonal production cycle. Comparisons of the amplification over land areas with the Bartlein et al. (2011) data set suggest that the simulated tropical cooling is too large in the PMIP3-CMIP5 simulations whereas the extratropical cooling was both larger and smaller than suggested by the reconstructions in both ensembles. Simulated tropical temperatures are more consistent with the Cleator et al. (2019) reconstructions, suggesting that the apparent over-estimation of tropical cooling in the PMIP3-CMIP5 simulations over land may reflect the paucity of tropical data points in Bartlein et al. (2011). However, the discrepancies between the simulated and reconstructed extratropical land temperatures are still present: there are several PMIP3-CMIP5 simulations that are much colder than the reconstructions, whereas the currently available PMIP4-CMIP6 simulations tend to be warmer than the reconstructions. Although polar amplification is more muted over the [...] The LGM climate is characterised by an increase in temperature seasonality in extratropical regions, with larger changes in winter than in summer. 
In general, this change in seasonality is reproduced by the models. However, the simulated change in winter temperature is smaller than indicated by the reconstructions (Fig. 12, top line). The magnitude of the summer cooling is more consistent in the PMIP4-CMIP6 simulations than in the PMIP3-CMIP5 simulations in North America, Europe and Eurasia. Simulated changes in summer temperature are more consistent with the reconstructions than simulated changes in winter temperature. Although the PMIP3-CMIP5 simulated winter temperatures over North America are broadly consistent with the reconstructions, the PMIP4-CMIP6 simulations are generally warmer than the reconstructed temperatures. Simulated winter temperatures in Europe are generally outside the range of the reconstructions for both ensembles. This is consistent with the fact that the simulated SSTs in the North Atlantic are generally warmer than the reconstructions, given that the mean annual temperature signal (over land and ocean) is dominated by changes in the winter temperature. Over extratropical Eurasia, the PMIP3-CMIP5 models simulate an amplification of the seasonal cycle, but overestimate both the winter and summer cooling compared to the Cleator et al. (2019) reconstructions, which are much warmer than the Bartlein et al. (2011) reconstructions. The PMIP4-CMIP6 models are more consistent with the reconstructions, with four simulations (AWIESM1 and 2 and both iLOVECLIM1.4) being within the range of uncertainty for both winter and summer temperature. Other PMIP4-CMIP6 models overestimate the cooling in both seasons, but less so than the PMIP3-CMIP5 models. This could reflect the lower climate sensitivities of the currently available PMIP4-CMIP6 models compared to the PMIP3-CMIP5 models, or the changes in the ice-sheet altitude and their impact on planetary waves.
Regional changes in the tropics (Fig. 12, bottom line) are more muted than those in the northern extratropics, and seasonality differences are small. We therefore base our comparisons on the mean annual temperature (MAT) and mean annual precipitation (MAP). The PMIP3-CMIP5 models simulate MAT consistent with both data syntheses over tropical America and Africa and with the Bartlein et al. (2011) reconstruction over Asia, while PMIP4-CMIP6 models underestimate tropical cooling, most noticeably in tropical America. This is probably related to the simulated tropical SSTs being warmer than the Tierney et al. (2019) reconstructions. The reconstructed changes in tropical precipitation are generally small (Fig. 12), except in the case of tropical Africa where there is a large difference between the Bartlein et al. (2011) and the Cleator et al. (2019) reconstructions. Although there is considerable within-model variability, the PMIP3-CMIP5 and PMIP4-CMIP6 simulations are consistent with the precipitation reconstructions for the Americas and Asia, with only a few models simulating conditions too dry compared to the reconstructions. Although both the PMIP3-CMIP5 and PMIP4-CMIP6 ensembles produce a cooling in Africa consistent with the reconstructions, simulated changes in precipitation are generally smaller than indicated by observations. There is, however, a large difference between the estimates of precipitation change given by the Bartlein et al. (2011) and Cleator et al. (2019) data sets for tropical Africa, and the simulated changes in precipitation are more consistent with the newer data set. The PMIP4-CMIP6 models are more consistent with the temperature reconstructions over tropical Asia, but show poorer agreement with the precipitation reconstructions than the PMIP3-CMIP5 models. There is no systematic improvement in the simulation of tropical climates between the PMIP4-CMIP6 and PMIP3-CMIP5 ensembles. 
The biases are 360 different, with some regions/variables better represented by the PMIP3-CMIP5 models (e.g. tropical Americas and Africa for the temperatures, tropical Asia for precipitation), while others are better represented by the PMIP4-CMIP6 models (tropical Americas for precipitation, tropical Asia for temperatures).
The four data syntheses can be used together to constrain the global MAT change from LGM to PI. There is a good correlation between the change in global average MAT computed over the reconstruction grid points and that computed taking all the model grid points into account (Fig. 13). There are models with results below, within and above the average of all of the reconstructions except MARGO (2009), for which no model has a MAT change above the reconstructed value. Retaining only the models which produce changes in MAT consistent with the reconstructions (and reconstruction uncertainty), the globally averaged change in MAT is between -5.3 and -3.7°C using the Bartlein et al. (2011) and the Cleator et al. (2019) data sets, between -4.9 and -3.2°C for the Tierney et al. (2019) data set and above -3.9°C for the MARGO (2009) data set. These estimates are slightly warmer than previous estimates, which indicate a cooling of between 4 and 6°C (Annan and Hargreaves, 2015; Friedrich et al., 2016).
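The model-selection step described above can be illustrated with a minimal sketch. It is not the procedure used for Fig. 13. The function names, the cosine-latitude weighting, the consistency criterion (absolute difference no larger than the reconstruction uncertainty) and all numbers are assumptions made for illustration. The fields are synthetic and spatially uniform.

```python
import math

def area_weighted_mean(field, lats, mask=None):
    """Mean of a lat-lon field (one row per latitude), weighted by cos(latitude).
    If a mask is given, only cells where the mask is True contribute."""
    total = 0.0
    weight_sum = 0.0
    for i, lat in enumerate(lats):
        w = math.cos(math.radians(lat))
        for j, value in enumerate(field[i]):
            if mask is None or mask[i][j]:
                total += w * value
                weight_sum += w
    return total / weight_sum

def constrain_global_mat(model_fields, lats, recon_mask, recon_mean, recon_uncertainty):
    """Keep models whose MAT change, sampled at the reconstruction grid points,
    lies within the reconstruction uncertainty (assumed criterion), and return
    the (min, max) of their global-mean MAT change plus the retained values."""
    retained = {}
    for name, field in model_fields.items():
        sampled = area_weighted_mean(field, lats, recon_mask)
        if abs(sampled - recon_mean) <= recon_uncertainty:
            retained[name] = area_weighted_mean(field, lats)
    if not retained:
        return None, retained
    return (min(retained.values()), max(retained.values())), retained

# Synthetic illustration: hypothetical models with uniform LGM - PI anomalies (degrees C).
lats = [-87.5 + 5.0 * i for i in range(36)]
n_lon = 72
# Hypothetical sparse coverage standing in for the reconstruction sites.
recon_mask = [[(i + j) % 10 == 0 for j in range(n_lon)] for i in range(len(lats))]

def uniform(value):
    return [[value] * n_lon for _ in lats]

models = {
    "model_A": uniform(-3.0),
    "model_B": uniform(-4.5),
    "model_C": uniform(-7.0),
    "model_D": uniform(-4.0),
}
mat_range, retained = constrain_global_mat(
    models, lats, recon_mask, recon_mean=-4.5, recon_uncertainty=1.0
)
print(mat_range, sorted(retained))
```

With real model output, the sampled and global means would differ. That difference is what the correlation between the two averages quantifies.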

Conclusions and perspectives
The results from the PMIP4-CMIP6 models differ from those of the PMIP3-CMIP5 models in several ways. The amplitude of the global and large-scale cooling is smaller in the PMIP4-CMIP6 ensemble, which is probably a result of the lower climate sensitivity of the current ensemble and the lower altitude of the ice sheets prescribed as boundary conditions. This change in the ice sheets also has an impact on atmospheric circulation over North America and the North Atlantic, and possibly over extratropical Eurasia too. The AMOC increases less in the PMIP4-CMIP6 than in the PMIP3-CMIP5 simulations, and the depth of the NADW cell remains stable, in contrast with half the models of the PMIP3-CMIP5 ensemble, which simulated a large deepening of this cell. This could be due to the changes in atmospheric circulation over the North Atlantic, as well as changes in the North Atlantic freshwater balance. Changes in precipitation are generally similar for the PMIP3-CMIP5 and PMIP4-CMIP6 ensembles and characterised by less precipitation overall. Reduced evaporation due to colder temperatures means that areas with positive LGM - PI anomalies in net precipitation (i.e. precipitation minus evaporation) are larger than areas with positive LGM - PI precipitation anomalies. However, both precipitation and net precipitation changes show large spatial heterogeneity, and different regional-scale patterns of change between the PMIP4-CMIP6 and PMIP3-CMIP5 ensembles, which could be related to the difference in atmospheric circulation and temperature changes. Additional sensitivity experiments are needed to separate the effects of changes in model configuration and sensitivity on general circulation features, such as the position of the jet streams, from the effects of differences in boundary conditions, such as the improved realism of the ice sheet configuration.
The PMIP4-CMIP6 ensemble confirms that the models capture large-scale thermodynamic behaviour, such as land-sea contrast and polar amplification, well. Indeed, the match of both the PMIP3-CMIP5 and PMIP4-CMIP6 results to the new reconstructions of Tierney et al. (2019) and Cleator et al. (2019) is better than with the reconstructions used to evaluate these features previously (Braconnot et al., 2012; Izumi et al., 2013; Harrison et al., 2014, 2015). There is no obvious improvement in model performance at a regional scale between the PMIP3-CMIP5 and PMIP4-CMIP6 ensembles. In some cases (e.g. mean annual temperature change over the tropical Americas), the PMIP3-CMIP5 ensemble demonstrates a better ability to capture the changes depicted by the reconstructions; in others (e.g. winter and summer temperatures over extratropical Asia, summer temperatures over North America, winter temperatures over Europe), the PMIP4-CMIP6 ensemble is clearly better.
Our analyses present a first picture of the PMIP4-CMIP6 LGM experiments. Results from CMIP6 models with high climate sensitivity are not available yet, but will need to be considered in a full assessment of the PMIP4-CMIP6 simulations. Sensitivity experiments, for example to different ice sheet configurations, are needed to disentangle the impact of model improvements from those related to using more realistic boundary conditions. Additional planned simulations will also help to disentangle the impacts of changes in vegetation and aerosol loading on the LGM climate. A more systematic evaluation of the simulated climates, using a wider range of palaeoenvironmental data, will be helpful in understanding why there are persistent mismatches between the simulations and reconstructions at a regional scale. Nevertheless, this preliminary analysis demonstrates the utility of the PMIP4-CMIP6 simulations in addressing questions about the response of climate to large changes in forcing and illustrates the need to investigate the causes of inter-model differences in these responses.

Table 1, except the GISS-E2-p151 simulation which did not use the PMIP3 ice sheet for its boundary conditions.