Evaluation of Arctic warming in mid-Pliocene climate simulations

Palaeoclimate simulations improve our understanding of the climate, inform us about the performance of climate models in a different climate scenario, and help to identify robust features of the climate system. Here, we analyse Arctic warming in an ensemble of 16 simulations of the mid-Pliocene Warm Period (mPWP), derived from the Pliocene Model Intercomparison Project Phase 2 (PlioMIP2). The PlioMIP2 ensemble simulates Arctic (60–90 N) annual mean surface air temperature (SAT) increases of 3.7 to 11.6 C compared to the pre-industrial period, with a multimodel mean (MMM) increase of 7.2 C. The Arctic warming amplification ratio relative to global SAT anomalies in the ensemble ranges from 1.8 to 3.1 (MMM is 2.3). Sea ice extent anomalies range from −3.0 to −10.4× 106 km2, with a MMM anomaly of−5.6×106 km2, which constitutes a decrease of 53 % compared to the pre-industrial period. The majority (11 out of 16) of models simulate summer seaice-free conditions (≤ 1× 106 km2) in their mPWP simulation. The ensemble tends to underestimate SAT in the Arctic when compared to available reconstructions, although the degree of underestimation varies strongly between the simulations. The simulations with the highest Arctic SAT anomalies tend to match the proxy dataset in its current form better. The ensemble shows some agreement with reconstructions of sea ice, particularly with regard to seasonal sea ice. Large uncertainties limit the confidence that can be placed in the findings and the compatibility of the different proxy datasets. We show that while reducing uncertainties in the reconstructions could decrease the SAT data–model discord substantially, further improvements are likely to be found in enhanced boundary conditions or model physics. Lastly, we compare the Arctic warming in the mPWP to projections of future Arctic warming and find that the PlioMIP2 ensemble simulates greater Arctic amplification than CMIP5 future climate simulations and an increase instead of a decrease in Atlantic Meridional Overturning Circulation (AMOC) strength compared to pre-industrial period. The results highlight the importance of slow feedbacks in equilibrium climate simulations, and that caution must be taken when using simulations of the mPWP as an analogue for future climate change.


Introduction
The simulation of past climates improves our understanding of the climate system, and it provides an opportunity for the evaluation of the performance of climate models beyond the range of present and recent climate variability (Braconnot et al., 2012;Harrison et al., 2014Harrison et al., , 2015Masson-Delmotte et al., 2013;Schmidt et al., 2014). Comparisons of palaeoclimate simulations and palaeoenvironmental reconstructions have been carried out for several decades (Braconnot et al., 2007;Joussaume and Taylor, 1995) and show that while climate models can reproduce the direction and large-scale pat-terns of changes in climate, they tend to underestimate the magnitude of specific changes in regional climates (Braconnot et al., 2012;Harrison et al., 2015). The comparison of palaeoclimate simulations with future projections has aided in the identification of robust features of the climate system which can help constrain future projections (Harrison et al., 2015;Schmidt et al., 2014), including in the Arctic (Yoshimori and Suzuki, 2019).
One such robust feature is the Arctic amplification of global temperature anomalies (Serreze and Barry, 2011). Increased warming in the Arctic region compared to the global average is a common feature of both palaeoclimate and future climate simulations and is also present in the observational record (Collins et al., 2013;Masson-Delmotte et al., 2013). Arctic warming has a distinct seasonal character, with the largest sea surface temperature (SST) and the smallest surface air temperature (SAT) anomalies occurring in the summer due to enhanced ocean heat uptake following sea ice melt (Serreze et al., 2009;Zheng et al., 2019). It is critical to correctly simulate Arctic amplification as it is shown that projected Arctic warming affects ice sheet stability, global sea-level rise, and carbon cycle feedbacks (e.g. through permafrost melting; Masson-Delmotte et al., 2013). Several multi-model analyses that included palaeoclimate simulations and/or future projections found that changes in northern high-latitude temperatures scale (roughly) linearly with changes in global temperatures (Bracegirdle and Stephenson, 2013;Harrison et al., 2015;Izumi et al., 2013;Masson-Delmotte et al., 2006;Miller et al., 2010;Schmidt et al., 2014;Winton, 2008).
Underestimation of Arctic SAT has been reported for several climates in the Palaeoclimate Modelling Intercomparison Project Phase 3 (PMIP3), including the mid-Pliocene Warm Period (Dowsett et al., 2012;Haywood et al., 2013a;Salzmann et al., 2013), Last Interglacial (LIG; Lunt et al., 2013;, and Eocene (Lunt et al., 2012a). PMIP4 simulations, however, of the LIG showed good agreement with SAT reconstructions in the Canadian Arctic, Greenland, and Scandinavia, while showing overestimations in other regions . PMIP4 simulations of the Eocene were also able to capture the polar amplification indicated by SAT proxies .
In the present work, we analyse the simulated Arctic warming in a new ensemble of 16 simulations in the Pliocene Model Intercomparison Project Phase 2 (PlioMIP2) . PlioMIP2 is designed to represent a discrete time slice within the mid-Pliocene Warm Period (mPWP; 3.264-3.025 Ma; sometimes referred to as mid-Piacenzian Warm Period): Marine Isotope Stage (MIS) KM5c, Ma Haywood et al., 2013bHaywood et al., , 2016. The mPWP is the most recent period in geological history with atmospheric CO 2 concentrations similar to the present, therefore providing great potential to learn about warm climate states. Additionally, the KM5c time slice is characterized by a similar-to-modern orbital forcing (Haywood et al., 2013b;Prescott et al., 2014). These factors give lessons learned from the mPWP and the KM5c time slice in particular, with potential relevance for future climate change (Burke et al., 2018;Tierney et al., 2019), and this is one of the guiding principles of PlioMIP .
Palaeoenvironmental reconstructions show that the elevated CO 2 concentrations in the mPWP coincided with substantial warming, which was particularly prominent in the Arctic Dowsett et al., 2012;Panitz et al., 2016;Salzmann et al., 2013;Haywood et al., 2020) discuss the large-scale outcomes of PlioMIP2 and observe a global warming that is between the best estimates of predicted end-of-century global temperature change under the RCP6.0 (+2.2 ± 0.5 • C) and RCP8.5 (3.7 ± 0.5 • C; Collins et al., 2013) emission scenarios.
The dominant mechanism for global warming in mid-Pliocene simulations is through changes in radiative forcing following increases in greenhouse gas concentrations (Chandan and Peltier, 2017;Hill et al., 2014;Hunter et al., 2019;Kamae et al., 2016;Lunt et al., 2012b;Stepanek et al., 2020;. Polar warming is also dominated by changes in greenhouse gas emissivity (Hill et al., 2014;Tindall and Haywood, 2020). Apart from the changes in greenhouse gas concentrations, changes in boundary conditions that led to warming in previous simulations of the mPWP included the specified ice sheets, orography, and vegetation (Hill, 2015;Lunt et al., 2012b).
In PlioMIP1, the previous phase of this project, model simulations underestimated the strong Arctic warming that is inferred from proxy records was found (Dowsett et al., 2012;Haywood et al., 2013a;Salzmann et al., 2013). This data-model discord may have been caused by uncertainties in model physics, boundary conditions, or reconstructions (Haywood et al., 2013a).
Uncertainties in model physics include physical processes that are not incorporated in the models and uncertainties in model parameters. It was found that the inclusion of chemistry-climate feedbacks from vegetation and wildfire changes leads to substantial global warming (Unger and Yue, 2014), while excluding industrial pollutants, explicitly simulating aerosol-cloud interactions , and decreasing atmospheric dust loading (Sagoo and Storelvmo, 2017) leads to increased Arctic warming in mPWP simulations. Similarly, in simulations of the Eocene, two models that implemented modified aerosols had better skill than other models at representing polar amplification . Changes in model parameters, such as the sea ice albedo parameter (Howell et al., 2016b), may provide further opportunities for increasing data-model agreement in the Arctic.
Several studies found changes in boundary conditions that could help resolve some of the data-model discord in the Arctic for PlioMIP1 simulations. The studied changes in boundary conditions include changes in orbital forcing Prescott et al., 2014;Salzmann et al., 2013), atmospheric CO2 concentrations Howell et al., 2016b;Salzmann et al., 2013), and palaeogeography and bathymetry (Brierley and Fedorov, 2016;Feng et al., 2017;Hill, 2015;Otto-Bliesner et al., 2017;Robinson et al., 2011).
New in the experimental design of PlioMIP2 is a closed Bering Strait and Canadian Archipelago in the mPWP simulation. The closure of these Arctic Ocean gateways has been shown to alter oceanic heat transport into the North Atlantic (Brierley and Fedorov, 2016;Feng et al., 2017;Otto-Bliesner et al., 2017). Additionally, the focus on a specific time slice within the mPWP allows for reduced uncertainties in reconstructions and boundary conditions, in particular with regards to orbital forcing. These changes have led to an improved data-model agreement for reconstructions of SST, particularly in the North Atlantic Mc-Clymont et al., 2020;. Multi-model mean (MMM) SST anomalies in the North Atlantic deviate less than 3 • C from reconstructed temperatures .
In the following sections, we first evaluate the simulated Arctic (60-90 • N) temperatures and sea ice extents (SIEs) in the PlioMIP2 ensemble. We then perform a data-model comparison for SAT and an evaluation of how uncertainties in the reconstructions may affect the outcomes of the data-model comparison. We then compare the simulated sea ice to reconstructions. Lastly, we investigate two climatic features of the mPWP, namely Arctic amplification and the Atlantic Meridional Overturning Circulation (AMOC), and compare these analyses to findings of future climate studies to investigate the extent to which the mPWP can be used as an analogue for future Arctic climate change.

Participating models
The simulations of the mPWP by 16 models participating in PlioMIP2 were used in this study. The models included in this study are listed in Table 1. A more detailed description of each model's information and experiment setup can be found in Haywood et al. (2020). All model groups incorporated the standardized set of boundary conditions from the PlioMIP2 experimental design in their simulations .
For each simulation, the last 100 years of data are used for the analysis. Individual model results are calculated on the native grid of each model. MMM results are obtained after regridding each model's output to a 2 • × 2 • grid using bilinear interpolation. Using a non-weighted ensemble mean theoretically averages out biases in models, assuming models are independent, and errors are random (Knutti et al., 2010). Climate models can, however, generally not be assumed to be independent (Knutti et al., 2010;Tebaldi and Knutti, 2007),  Li et al. (2020) and this is especially true for the PlioMIP2 ensemble where many models have common origins (Table 1). The MMM results will therefore likely be biased towards specific common errors within the models comprising the ensemble.

Data-model comparisons
To evaluate the ability of climate models to simulate mPWP Arctic warming, we first perform a comparison to SAT estimates from palaeobotanical reconstructions. The datamodel comparison is performed using temperature anomalies, calculated by differencing the mPWP and the preindustrial simulation, to avoid overestimations of agreement due to strong latitudinal effects on temperature (Haywood and Valdes, 2004). Reconstructed mPWP SATs are taken from , who updated and combined an earlier compilation made by Salzmann et al. (2013) (Table S1). Qualitative estimates of confidence levels for each reconstruction were made by Feng et al. (2017) and Salzmann et al. (2013). Only reconstructions that are located at or northward of 60 • N and for which the temporal range covers the KM5c time slice are included in the data-model comparison. Three reconstructions from Ballantyne et al. (2010) at the same location (78.3 • N, −80.2 • E) were averaged to avoid oversampling that location. The uncertainties in the reconstructions were derived by Feng et al. (2017) and Salzmann et al. (2013) The data-model comparison will be a point-to-point comparison of modelled and reconstructed temperatures estimated from palaeobotanical proxies, which initially does not take the uncertainties of the reconstructions (Table S1) into account. The potential influence of the uncertainties in reconstructions on the outcomes of the data-model comparison will be investigated in a later section. The temporal range of the reconstructions is broad and certainly not resolved to the resolution of the KM5c time slice, unlike the dataset of SST estimates compiled by Foley and Dowsett (2019) used for PlioMIP2 SST data-model comparisons by  and McClymont et al. (2020). Prescott et al. (2014) found that peak warmth in the mPWP would be diachronous between different regions based on simulations with differ- ent configurations of orbital forcing. Orbital forcing is particularly important in the high latitudes and for proxies that may record seasonal signatures (e.g. due to recording growing season temperatures). As such, there may be significant biases in the dataset, as the temporal ranges of the proxies include periods with substantially different external forcing than during the KM5c time slice for which the simulations are run. Feng et al. (2017) investigated the effects of different orbital configurations, as well as elevated atmospheric CO 2 concentrations (+50 ppm) and closed Arctic gateways in PlioMIP1 simulations, and found that they may change the outcomes of data-model comparisons in the northern high latitudes by 1-2 • C.
Further uncertainties arise due to bioclimatic ranges of fossil assemblages, errors in pre-industrial temperatures from the observational record, potential seasonal biases, and additional unquantifiable factors. Ultimately, the uncertainties constrain our ability to evaluate the Arctic warming in the PlioMIP2 simulations substantially. A more detailed description of the uncertainties in the SAT estimates can be found in the work of Salzmann et al. (2013).
The reconstructed temperatures are differenced with temperatures from the observational record to obtain proxy temperature anomalies. Observational-record temperatures are obtained from the Berkeley Earth monthly land and ocean dataset (Rohde et al., 2013a(Rohde et al., , 2013b, and the average temperature in the 1870-1899 period was used. Furthermore, the simulation of mPWP SIE will be evaluated using three palaeoenvironmental reconstructions that indicate whether sea ice was perennial or seasonal at a specific location. Darby (2008) infers that perennial sea ice was present at Lomonosov Ridge (87.5 • N, 138.3 • W) throughout the last 14 Myr based on estimates of drift rates of sea ice combined with inferred circum-Arctic sources of detrital mineral grains in sediments at this location. Knies et al. (2014) infer seasonal sea ice cover based on the abundance of the IP 25 biomarker, a lipid that is produced by certain sea ice diatoms, which is similar to the modern summer minimum throughout the mid-Pliocene in sediments at two locations near the Fram Strait, of which one is chosen for this data-model comparison (80.2 • N, 6.4 • E). Similarly, Clotten et al. (2018) infer seasonal sea ice cover with occasional sea- ice-free conditions in the Iceland Sea (69.1 • N, −12.4 • E) between 3.5 and 3.0 Ma using a multiproxy approach. As the sediment record studied by Clotten et al. (2018) included a peak in the abundance of the IP 25 biomarker at 3.2 Ma, we infer seasonal sea ice cover during the KM5c time slice.

Annual mean warming
The PlioMIP2 experiments show substantial increases in global annual mean SAT (ranging from 1.7 to 5.2 • C, with a MMM of 3.2 • C; Fig. 1a; Table S2) and SST (ranging from 0.8 to 3.9 • C, with a MMM of 2.0 • C; Fig. 1c; Table S2) in the mPWP, compared to pre-industrial period.
All models show a clear Arctic amplification, with annual mean SAT in the Arctic (60-90 • N) increasing by 3.7 to 11.6 • C (MMM of 7.2 • C; Fig. 1a). The magnitude of Arctic amplification, defined as the ratio between the Arctic and global SAT anomaly, ranges from 1.8 to 3.1, and the MMM shows an Arctic amplification factor of 2.3 (Fig. 1b). There is a large variation in the magnitude of the simulated Arctic SAT anomalies, with 5 out of 16 models, namely CCSM4-Utrecht, CCSM4-UoT, CESM1.2, CESM2, and EC-Earth 3.3, all simulating much stronger anomalies than the rest of the ensemble. This subset of the ensemble raises the MMM substantially, and this has to be taken into account when interpreting the MMM results. The MMM SAT anomaly for the PlioMIP2 ensemble excluding this subset of five models is 5.8 • C.
Annual mean SST in the Arctic increased by 1.3 to 4.6 • C (MMM of 2.4 • C; Fig. 1c). Furthermore, the five models that simulated the largest Arctic SAT anomalies also simulate the largest Arctic SST anomalies. Temperature anomalies in the PlioMIP2 ensemble are similar but slightly higher than in the PlioMIP1 ensemble. A similar magnitude of Arctic amplification is simulated by the two ensemble means.
The greatest MMM SAT anomalies in the Arctic are found in the regions with reduced ice sheet extent on Greenland , which generally show warming of over 10 • C and even up to 20 • C. Additionally, temperature anomalies of over 10 • C are simulated around the Baffin Bay. SAT anomalies of around 6-9 • C are simulated over most of the Arctic Ocean regions. SST anomalies in the Arctic are strongest in the Baffin Bay and the Labrador Sea, reaching up to 7 • C (Fig. 2b).

Seasonal warming
The distinct seasonality of Arctic amplification (Serreze et al., 2009;Zheng et al., 2019) can be used to identify mechanisms causing Arctic amplification. Figure 3 depicts the seasonality of Arctic warming for each model, with monthly SAT and SST anomalies normalized by the annual mean anomaly for that specific model. The ensemble simulates a consistent peak in Arctic SST warming between July and September (Fig. 3b). This is consistent with the response that increased seasonal heat storage from incoming heat fluxes would have upon the reduction of SIE (Serreze et al., 2009;Zheng et al., 2019). Minimum SAT warming is expected in the summer because of the increased ocean heat uptake, while maximum SAT warming is expected in the autumn and winter following the release of this heat (Pithan and Mauritsen, 2014;Serreze et al., 2009;Yoshimori and Suzuki, 2019;Zheng et al., 2019). This is not simulated by all models, however (Fig. 3a). COSMOS, GISS-E2-1-G, IPSL-CM6A-LR, and MRI-CGCM2.3 all do  show this autumn and winter amplification of annual mean SAT anomalies and decreased warming in the summer. Decreased summer warming is simulated by CCSM4-Utrecht, EC-Earth 3.3, and IPSLCM5A in combination with autumn amplification and by CESM2 and NorESM1-F in combination with winter amplification. All other models in the ensemble do not show an autumn or winter amplification in combination with decreased summer warming, suggesting a more limited role of reductions in SIE underlying the seasonal cycle of Arctic SAT anomalies.

Annual mean sea ice extent
The MMM of Arctic annual SIE (sea ice concentration ≥ 0.15) is 11.9 × 10 6 km 2 for the pre-industrial simulations, and 5.6 × 10 6 km 2 (a 53 % decrease) for the mPWP simulations. The pre-industrial annual mean SIE ranges from 9.1 to 15.6 × 10 6 km 2 in the ensemble, while the mPWP SIE ranges from 2.3 to 10.4 × 10 6 km 2 . The decrease in SIE between individual simulations ranges from −3.0 × 10 6 km 2 to −10.4 × 10 6 km 2 (Table S2). Interestingly, the PlioMIP1 MMM shows larger SIEs in both the pre-industrial and the mPWP simulations than any individual model in the PlioMIP2 ensemble (Fig. 4). The 53 % MMM decrease in SIE simulated by the PlioMIP2 ensemble is substantially greater than the 33 % MMM decrease in SIE simulated by the PlioMIP1 ensemble (Howell et al., 2016a).

Monthly mean sea ice extent
The seasonal cycle of SIE anomalies is depicted in Fig. 5a. Reductions in SIE are slightly greater in the autumn (September-November) compared to other seasons for the MMM. There is, however, no consistent response in the seasonal character of SIE anomalies in the PlioMIP2 ensemble. CCSM4-UoT, CESM2, IPSLCM5A, and IPSLCM5A-2.1 simulate the largest reductions in SIE in winter (December-February), while GISS-E2-1-G and HadCM3 simulate the largest SIE reductions in spring. The remaining 10 models simulate the greatest SIE anomalies in autumn.
A more consistent response is observed when comparing monthly mean mPWP SIEs and pre-industrial SIEs. For each model, the largest reductions in SIE in terms of percentages occur between August and October (Fig. 5b). This may be explained by the lesser amount of energy that is needed to melt a given percentage of the smaller SIE that is present in the summer compared to winter. A total of 11 out of 16 models simulate sea-ice-free conditions (SIE < 1 × 10 6 km 2 ) in at least 1 month, while five models (GISS-E2-1-G, IP-SLCM5A, IPSLCM5A-2.1, MRI-CGCM2.3, and NorESM-L) do not (Fig. 5b). The NorESM1-F simulation simulates the smallest global mean warming (1.7 • C; Fig. 1a) resulting in Arctic sea-ice-free conditions.

Sea ice and Arctic warming
There is a strong anti-correlation between annual mean Arctic SAT and SIE anomalies (R = −0.79; Fig. 6a), as well as between SST and SIE anomalies (R = −0.79; Fig. 6b). These anti-correlations are stronger than those found for the PlioMIP1 ensemble (R = −0.76, R = −0.73, respectively; Howell et al., 2016a).

Results
To evaluate the ability of the PlioMIP2 ensemble to simulate Arctic warming, we perform a data-model comparison with the available SAT reconstructions for the mPWP. The datamodel comparison hints at a substantial mismatch between models and temperature reconstructions. Mean absolute deviations (MAD) range from 5.0 to 11.2 • C (Table S3), with a MAD of 7.3 • C for the MMM. The median bias ranges from −2.0 to −13.1 • C, with a median bias of −8.2 • C for the MMM (Table S3). The PlioMIP2 MMM shows slightly improved agreement with the SAT reconstructions compared to the PlioMIP1 MMM (MAD = 7.8 • C, median bias = −8.7 • C). Figure 7 depicts the deviation from reconstructions for the MMM. Underestimations range from −17 to −2.5 • C, while at two sites in the Canadian Archipelago (80 • N, 85 • W and 79.85 • N, 99.24 • W) the MMM overestimates the reconstructed temperatures (by 2.7 and 1.2 • C, respectively). It has to be noted, however, that SAT anomalies are underestimated at three other sites within the Canadian Archipelago. Given the resolution of global climate models and the close proximity of the sites, it may be impossible for simulations to match all five of these SAT estimates.
The deviation from reconstructions for each model and the PlioMIP2 and PlioMIP1 MMMs is represented by the box and whisker plots in Fig. 8. A consistent underestimation of the temperature estimates from SAT reconstructions is present in the PlioMIP2 ensemble. CESM2 simulates the smallest deviations from reconstructions in the ensemble, with a MAD of 5.0 • C and a median bias of −2 • C. The five models that simulated the highest Arctic SAT anomalies (CCSM4-Utrecht, CCSM4-UoT, CESM1.2, CESM2, and EC-Earth 3.3) simulate the lowest median biases, indicating that the upper end of the range of simulated Arctic SAT anomalies in the PlioMIP2 ensemble tends to better match the proxy dataset in its current form. Future research into the underlying mechanisms for the increased Arctic warming in these 5 simulations, compared to the remaining 11 simulations in the ensemble, may form a way to uncover factors that contribute to improved data-model agreement.

Uncertainties
Some of the data-model discord may be caused by uncertainties in the temperature estimates (Table S1; Salzmann et al.,   (Salzmann et al., 2013). Data markers for reconstructions in close proximity of each other have been slightly shifted for improved visibility. 2013). To investigate how these uncertainties may have affected the outcomes of the data-model comparison, we construct a maximum uncertainty range. This range spans from the highest possible temperature within uncertainty and the lowest possible temperature within uncertainty. The uncertainties for the temperature estimates were taken from the compilation of mPWP Arctic SAT estimates from   (Table S1).  Figure 9 depicts the locations for which at least one model in the ensemble simulates a temperature within the maximum available uncertainty range of a reconstruction. For 6 out of the 12 reconstructions that included an uncertainty estimate, the models in the PlioMIP2 ensemble simulate temperatures that are within the uncertainty range (Fig. 9). Additionally, both overestimations and underestimations are present for the Magadan District reconstruction for which no uncertainty estimate is available (60 • N, 150.65 • E, Table S1), implying that the reconstruction falls within the range of simulated temperatures in the PlioMIP2 ensemble. For the remaining six reconstructions, including several which are assessed at high or very high confidence (Fig. 9), no model simulates temperatures within the uncertainty range.
Ultimately, when considering the full uncertainty ranges of the reconstructions, it becomes evident that solely reducing potential errors in SAT estimates would not fully resolve the Figure 9. Blue circles highlight where at least one model in the ensemble simulates a temperature that falls within the uncertainty range of the reconstruction. The size of SAT reconstructions is scaled by qualitatively assessed confidence levels (Salzmann et al., 2013). Data markers for reconstructions in close proximity of each other have been slightly shifted for improved visibility. data-model discord for several locations in the Arctic. It is thus likely that other sources of error contribute to the datamodel discord, such as uncertainties in model physics (e.g. Feng et al., 2019;Howell et al., 2016b;Sagoo and Storelvmo, 2017;Unger and Yue, 2014) and boundary conditions (e.g. Brierley and Fedorov, 2016;Feng et al., 2017;Hill, 2015;Howell et al., 2016b;Otto-Bliesner et al., 2017;Prescott et al., 2014;Robinson et al., 2011;Salzmann et al., 2013). The focus on the KM5c time slice has helped resolve some of the data-model discord that was present in the North Atlantic for SST , and similar work for SAT reconstructions may thus be beneficial. However, this may not always be possible given the lack of precise dating and chronologies available. It is at this moment unclear whether the underestimation of Arctic SAT is specific to the mid-Pliocene, through uncertainties in reconstructions or boundary conditions, or an indicator of common errors in model physics.

Evaluation of sea ice
The limited availability of proxy evidence (three reconstructions) severely limits our ability to evaluate the simulation of mPWP sea ice in PlioMIP2 simulations. Nevertheless, a data-model comparison is still worthwhile, as the few reconstructions that are available may form an interesting outof-sample test for the simulation of sea ice in the PlioMIP2 models. Figure 10a depicts the number of models per grid box that simulate perennial sea ice. Six models simulate the inferred perennial sea ice (mean sea ice concentration ≥ 0.15 in each month) at Lomonosov Ridge (87.5 • N, 138.3 • W; Darby, 2008), while the remaining 10 simulate sea-ice-free conditions in at least 1 month per year at this site. The majority of the models simulate a maximum SIE that extends, or nearly extends, into the Fram Strait and Iceland Sea (Fig. 10b) in at least 1 month (in winter) per year (Fig. 10b), consistent with proxy evidence (Clotten et al., 2018;Knies et al., 2014).
The uncertainties in both the SAT and SIE reconstructions are large, and it may not be possible to match both datasets in their current forms. This would require increased Arctic annual terrestrial warming compared to the mean model (Sect. 5.1) as well as perennial sea in the summer and a large SIE in winter (extending at least into the Iceland Sea). Moreover, McClymont et al. (2020) found that the warmest model values in the PlioMIP2 ensemble tend to align best with North Atlantic SST reconstructions, further indicating that strong Arctic warming is required for data-model agreement. If there was no perennial sea ice in the mPWP like most models in the PlioMIP2 ensemble, the different proxy records may be more compatible, but this would be in disagreement with findings from Darby (2008). The CCSM4-Utrecht model, which simulated a relatively high Arctic SAT anomaly (10.5 • C; Fig. 1a) and low median bias (−4 • C) in the point-to-point SAT data-model comparison compared to the rest of the ensemble, simulates a maximum winter SIE that extends both into the Fram Strait and Iceland Sea. This highlights that models with higher Arctic SAT anomalies and better SAT data-model agreement can still match both seasonal sea ice proxies. Ultimately, more reconstructions of sea ice are needed for a more robust evaluation of mPWP sea ice and Arctic warming in general.

Comparison to future climates
Research into the mPWP is often motivated by a desire to understand future climate change (Burke et al., 2018;Haywood et al., 2016;Tierney et al., 2019). Here, we analyse how the mPWP may teach us about future Arctic warming by comparing two climatic features of the mPWP simulations to simulations of future climate. The climatic features include Arctic amplification and a feature for which there is some proxy evidence available that may also aid in model evaluation: the AMOC.

Arctic amplification
A linear relationship between global and Arctic temperature anomalies is present in the PlioMIP2 ensemble (R = 0.93, Fig. 11a). This is consistent with findings from multimodel analyses of other climates (Bracegirdle and Stephenson, 2013;Harrison et al., 2015;Izumi et al., 2013;Masson-Delmotte et al., 2006;Miller et al., 2010;Schmidt et al., 2014;Winton, 2008) and indicates that global temperature anomalies are a good index for Arctic SAT anomalies in mPWP simulations.
For four ensembles of future climate simulations, from the previous phase of the Coupled Model Intercomparison Project (CMIP), CMIP5, data for MMM Arctic (defined there as 67.5-90 • N) temperature anomalies are available (Masson-Delmotte et al., 2013; Table S4). The PlioMIP2 MMM shows global warming that falls between the RCP6.0 and RCP8.5 MMMs in terms of magnitude (Fig. 11b). Even though PlioMIP underestimates mPWP SAT reconstructions (Sect. 5.1), the simulations do simulate stronger Arctic temperature anomalies per degree of global warming compared to future climate ensembles (Fig. 11b). The future climate ensemble MMMs simulate end-of-century (2081-2100) average Arctic (67.5-90 • N) amplification ratios that range from 2.2 to 2.4, while PlioMIP2 and PlioMIP1 simulate mean ratios of 2.8 and 2.7, respectively (Table S4).
The increased Arctic warming per degree of global warming indicates that apart from warming through changes in atmospheric CO 2 concentrations, which is the dominant mech-anism for warming in both ensembles, different or additional mechanisms underly the simulated mPWP Arctic warming compared to the future climate simulations. The difference between the PlioMIP2 and future climate ensembles may be explained by slow responses to changes in forcings that fully manifest in equilibrium climate simulations, such as the response to reduced ice sheets, but not in transient, nearfuture climate simulations. Additional Arctic warming in the mPWP simulations may arise due to the changes in orography (Brierley and Fedorov, 2016;Feng et al., 2017;Haywood et al., 2016;Otto-Bliesner et al., 2017), ice sheets, and vegetation in the boundary conditions (Hill, 2015;Lunt et al., 2012b).
Using PlioMIP2 simulations for potential lessons about future warming may be improved by isolating the effects of the changes in orograph. Similar changes in ice sheets and vegetation may occur in future equilibrium warm climates, but the changes in orography are definitively non-analogous to future warming. Several groups isolated the effects of the changed orography on global warming in PlioMIP2 simulations and found that it contributes, respectively, around 23 % (IPSL6-CM6A-LR; Tan et al., 2020), 27 % (COS-MOS; Stepanek et al., 2020), and 41 % (CCSM4-UoT; Chandan and Peltier, 2018) to the annual mean global warming in the mPWP simulations. Furthermore, this warming was strongest in the high latitudes (Chandan and Peltier, 2018; indicating that the additional Arctic warming in PlioMIP2 simulations, as compared to future climate simulations, are likely partially caused by changes in orography that are non-analogous with the modern-day orography. These findings highlight the caution that has to be taken when using palaeoclimate simulations as analogues for future climate change.

Atlantic meridional overturning circulation
The AMOC, a major oceanic current transporting heat into the Arctic (Mahajan et al., 2011), is inferred to have been significantly stronger in the mPWP compared to pre-industrial values based on proxy evidence (Dowsett et al., 2009;Frank et al., 2002;Frenz et al., 2006;McKay et al., 2012;Ravelo and Andreasen, 2000;Raymo et al., 1996). An analysis of AMOC changes in PlioMIP2 simulations shows that, indeed, the maximum AMOC strength increases: by 4 % to 53 % ( Fig. 12; Table S2: Z. . The closure of the Arctic Ocean gateways, in particular the Bering Strait, likely contributed to the increase in AMOC strength (Brierley and Fedorov, 2016;Feng et al., 2017;Haywood et al., 2016;Otto-Bliesner et al., 2017).
Strengthening of the AMOC contrasts projections of future changes by CMIP5 models that predict a weakening of the AMOC over the 21st century, with best estimates ranging from 11 % to 34 % depending on the chosen future emission scenario (Collins et al., 2013). These opposing responses may help explain some of the additional Arctic warming that is observed in the PlioMIP2 ensemble compared to the future climate ensembles (Fig. 11b).
The strengthening of the AMOC in the PlioMIP2 ensemble is consistent with the additional 0.4 • C increase in SST warming in the Arctic (Fig. 1c) and the better datamodel agreement in the North Atlantic that is observed for the PlioMIP2 MMM Haywood et al., 2020;McClymont et al., 2020) compared to the PlioMIP1 MMM (Fig. 1c), which did not show any substantial changes in AMOC strength compared to pre-industrial values .

Conclusions
The PlioMIP2 ensemble simulates substantial Arctic warming and 11 out of 16 models simulate summer sea-ice-free conditions. Comparisons to reconstructions show, however, that the ensemble tends to underestimate the available reconstructions of SAT in the Arctic, although large differences in the degree of underestimation exist between the simulations. The models that simulate the largest Arctic SAT anomalies tend to match the reconstructions better, and investigation into the mechanisms underlying the increased Arctic warming in these simulations may help uncover factors that could contribute to improved data-model agreement. We find that, while some of the SAT data-model discord may be resolved by reducing uncertainties in proxies, additional improvements are likely to be found in reducing uncertainties in boundary conditions or model physics. Furthermore, there is some agreement with reconstructions of sea ice in the ensemble, especially for seasonal sea ice. The limited availability of proxy evidence and the uncertainties associated with them severely constrain the compatibility of the different proxy datasets and our ability to evaluate the Arctic warming in PlioMIP2. Increased proxy evidence of different climatic variables and additional sensitivity experiments, among other goals, are needed for a more robust evaluation of Arctic warming in the mPWP. Lastly, we find differences in Arctic climate features between the PlioMIP2 ensemble and future climate ensembles that include the magnitude of Arctic amplification and changes in AMOC strength. These differences highlight that caution has to be taken when attempting to use simulations of the mPWP to learn about future climate change.
Data availability. The reconstructions used in this study are available in the Supplement. The model data can be downloaded from PlioMIP2 data server located at the School of Earth and Environment of the University of Leeds, an email can be sent to Alan Haywood (a.m.haywood@leeds.ac.uk) for access. At the time of publication, the data from CESM2, EC-EARTH3.3, GISS-E2-1-G, IPSL-CM6A-LR, and NorESM1-F can be downloaded through CMIP Search Interface at https://esgf-node.llnl.gov/search/cmip6, last access: 17 November 2020; Department of Energy, 2020.
Author contributions. QZ and WdN designed the work. WdN did the analyses and wrote the manuscript under supervision from QZ. QL and QZ performed the simulations with EC-Earth3. XL and ZZ provided input on AMOC analysis. HJD provided the input on reconstructions. All the other co-authors provided the PlioMIP2 model data and commented on the manuscript.
Competing interests. The authors declare that they have no conflict of interest.

Special issue statement.
This article is part of the special issue "PlioMIP Phase 2: experimental design, implementation and scientific results". It is not associated with a conference.
Acknowledgements. The EC-Earth3 simulations are performed on the Swedish National Infrastructure for Computing (SNIC) at the National Supercomputer Centre (NSC). COSMOS PlioMIP2 simulations have been conducted at the Computing and Data Center of the Alfred-Wegener-Institut Helmholtz-Zentrum für Polar und Meeresforschung on a NEC SX-ACE high-performance vector computer. Gerrit Lohmann and Christian Stepanek acknowledge funding via the Alfred Wegener Institute's research programme PACES2. Christian Stepanek acknowledges funding by the Helmholtz Climate Initiative REKLIM. Camille Contoux and Gilles Ramstein thank ANR HADOC ANR-17-CE31-0010; the authors were granted access to the HPC resources of TGCC under the allocations 2016-A0030107732, 2017-R0040110492, 2018-R0040110492 (gencmip6), and 2019-A0050102212 (gen2212) provided by GENCI. The IPSL-CM6 team of the IPSL Climate Modelling Centre (https://cmc.ipsl.fr, last access: 17 November 2020) is acknowledged for having developed, tested, evaluated, and tuned the IPSL climate model, as well as having performed and published the CMIP6 experiments. Wing-Le Chan and Ayako Abe-Ouchi acknowledge funding from JSPS KAKENHI grant 17H06104 and MEXT KAKENHI grant 17H06323 and also acknowledge JAM-STEC for the use of the Earth Simulator supercomputer. The PRISM4 reconstruction and boundary conditions used in PlioMIP2 were funded by the U.S. Geological Survey Climate and Land Use Change Research and Development Program. Any use of trade, firm, or product names is for descriptive purposes only and does not imply endorsement by the U.S. Government.
The article processing charges for this open-access publication were covered by Stockholm University.
Review statement. This paper was edited by Alessio Rovere and reviewed by two anonymous referees.