Lessons from a high CO 2 world : an ocean view from ~ 3 million years ago

A range of future climate scenarios are projected for high atmospheric CO2 concentrations, given uncertainties over future human actions as well as potential environmental and climatic feedbacks. The geological record offers an opportunity to understand climate system response to a range of forcings and feedbacks which operate over multiple temporal and spatial scales. Here, we examine a single interglacial during the late Pliocene 55 (KM5c, ca. 3.205 +/0.01 Ma) when atmospheric CO2 concentrations were higher than pre-industrial, but similar to today and to the lowest emission scenarios for this century. As orbital forcing and continental configurations were almost identical to today, we are able to focus on equilibrium climate system response to modern and nearfuture CO2. Using proxy data from 32 sites, we demonstrate that global mean sea-surface temperatures were warmer than pre-industrial, by ~2.3 oC for the combined proxy data (foraminifera Mg/Ca and alkenones), or by 60 ~3.2-3.4oC (alkenones only). Compared to the pre-industrial, reduced meridional gradients and enhanced warming in the North Atlantic are consistently reconstructed. There is broad agreement between data and models at the global scale, with regional differences reflecting ocean circulation and/or proxy signals. An uneven distribution of proxy data in time and space does, however, add uncertainty to our anomaly calculations. The reconstructed global mean sea-surface temperature anomaly for KM5c is warmer than all but three of the PlioMIP2 model 65 outputs, and the reconstructed North Atlantic data tend to align with the warmest KM5c model values. Our results demonstrate that even under low CO2 emission scenarios, surface ocean warming may be expected to exceed model projections, and will be accentuated in the higher latitudes.

In this comment we were not seeking to explain the difference between the data and models in Figure 3 (since both are SST), but to flag that since our SST proxy reconstructions consistently gave values which were higher than the model SST data, then a larger warming in global annual surface air temperature might also be expected. The sentences which preceded this one were discussing other modelling studies which make comparisons between Pliocene and projected global annual surface air temperatures; we had wanted to flag here that we are showing SST anomalies (not surface air temperatures) and that the latter would likely be larger. However, we recognise that this became unclear in the text and have revised accordingly (lines 309-313 below): Our proxy-based mean global SST anomaly is larger than in most PlioMIP2 models when we use alkenones-only or alkenones and Mg/Ca combined (Figure 3), and hence our results suggest that the global annual surface air temperature anomaly for KM5c would exceed the PlioMIP2 multi-model surface air temperature mean of 2.8°C (Haywood et al., in review, 2020).

Introduction
By the end of this century, projected atmospheric CO2 concentrations range from 430 to >1000 ppmv depending 70 upon future emission scenarios (IPCC, 2013). At the current rate of emissions, global mean temperatures are projected to exceed 1.5ºC and 2ºC above pre-industrial values in 10 and 20 years, respectively, passing the targets set by the Paris Agreement (IPCC, 2018). The geological record affords an opportunity to explore key global and regional climate responses to different atmospheric CO2 concentrations, including those which extend beyond centennial timescales (Fischer et al., 2018). Palaeoclimate models indicate that climates last experienced during 75 the mid-Piacenzian stage of the Pliocene (3.1-3.3 Ma) will be surpassed by 2030 CE under high emission scenarios (Representative Concentration Pathway, RCP8.5), or will develop by 2040 CE and be sustained thereafter under more moderate emissions (RCP4.5, Burke et al., 2018).
The late Pliocene thus provides a geological analogue for climate response to moderate CO2 emissions. However, 80 the magnitude of tropical ocean warming differs between proxy reconstructions (e.g. Zhang et al., 2014;O'Brien et al., 2014;Ford and Ravelo, 2019;Tierney et al., 2019a), and stronger polar amplification has been consistently recorded in proxy data compared to models (Haywood et al., 2013;Haywood et al., 2016a). Some of the disagreements may reflect non-thermal influences on temperature proxies (e.g. secular evolution of seawater Mg/Ca, Medina-Elizalde et al., 2008;Evans et al., 2016) and/or seasonality in the recorded signals (e.g. Tierney

85
and Tingley, 2018). It has also been proposed that previous approaches to integrating Pliocene sea-surface temperature (SST) data may have introduced bias to data-model comparison (Haywood et al., 2013). For example, the Pliocene Research Interpretation and Synoptic Mapping (PRISM) project generated warm peak averages within specified time windows (Figure 1) (outlined in Dowsett et al., 2016 and references therein). However, by integrating multiple warm peaks within the 3.1-3.3 Ma mid-Piacenzian data synthesis windows (Figure 1), 90 regional and time-transgressive responses to orbital forcing (Prescott et al., 2014;Fischer et al., 2018;Hoffman et al., 2017;Feng et al., 2017) are potentially recorded in the proxy data, which may not align with the more narrowly-defined time interval being modelled (Haywood et al., 2013;Dowsett et al., 2016).
Here, we present a new, globally-distributed synthesis of SST data for the mid-Piacenzian stage, addressing two 95 concerns. First, we minimise the impact of orbital forcing on regional and global climate signals by synthesising data from a specific interglacial stage, a 20-kyr time slice centred on 3.205 Ma (KM5c; see Figure 1). At 3.205 Ma, both seasonal and regional distributions of incoming insolation are close to modern, making this time an important analogue for 21 st century climate (Haywood et al., 2013). The low variability in orbital forcing through KM5c minimises the potential for time-transgressive regional signals to be a feature of the geological data 100 (Haywood et al., 2013;Prescott et al., 2014). Second, we provide a range of estimates from different SST proxies, taking into consideration the uncertainties in proxy-to-temperature calibrations and/or secular processes that may bias proxy estimates. This synthesis is possible due to robust stratigraphic constraints placed on the datasets by the PAGES-PlioVAR working group (see Methods).

The KM5c Interglacial
KM5c (also referred to as KM5.3) is an interglacial centred on a ~100 kyr window of relatively depleted benthic 18 O values, that immediately follows a pronounced  18 O peak during the glacial stage M2 (3.3 Ma; Figure 1).
Minor changes to orbital forcing during KM5 enables a wider target zone (3.205 Ma +/-20 kyr) for data collection, because the potential for orbitally-forced regional and time-transgressive climate signals is minimised (Haywood 110 et al., 2013). A comparable approach has been adopted by the PRISM4 synthesis (3.190 to 3.220 Ma, Foley and Dowsett, 2019, see Figure 1). Here, we focus on a narrow time slice of 3.195-3.215 Ma, to span approximately one precession cycle. The reconstructed atmospheric CO2 concentrations from boron isotopes in KM5c are 360 ± 55 ppmv (for median boron-derived values (n=3), full range: 289-502 ppmv, Figure 1; Foster et al., 2017). A wider range of atmospheric CO2 concentrations has been reconstructed for the whole mid-Piacenzian stage (356 115 ± 65 ppmv for median values (n=36), full range: 185-592 ppmv, Foster et al., 2017).

Age models
The PAGES-PlioVAR working group agreed on a set of stratigraphic protocols to maximise confidence in the identification and analysis of orbital-scale variability within the mid-Piacenzian stage (McClymont et al., 2017).
Sites were only included in the synthesis if they had either (i) ≤10 kyr resolution benthic  18 O data which could 120 be (or had been) tied to the LR04 stack (Lisiecki and Raymo, 2005) or the HMM-Stack ; and/or (ii) the palaeomagnetic tie-points for Mammoth top (C2An.2n (b) at 3.22 Ma) and Mammoth bottom (C2An.3n (t) at 3.33 Ma). At one site (ODP Site 1090) these conditions were not met (see Supplement), but tuning to LR04 had been made using a record of dust concentrations under the assumption that higher dust flux occurred during glacials as observed during the Pleistocene . At ODP Site 806, uncertainty over age 125 control resulted from the absence of an agreed splice across the multiple holes drilled by ODP, and a new age model has been constructed (see Supplement). For some sites (see online summary at https://pliovar.github.io/km5c.html), revisions to the published age model were made, for example if the original data had been published prior to the LR04 stack (Lisiecki and Raymo, 2005) or prior to revisions to the palaeomagnetic timescale (Gradstein et al., 2012). In total, data from 32 sites was compiled, extending from 46°S 130 to 69°N (Figure 2).

Proxy sea-surface temperature (SST) data
A multi-proxy approach was taken, to maximise the information available on changing climates and environments during the KM5c interval. Two SST proxies were analysed: the alkenone-derived U K 37' index (Prahl and Wakeham, 1987) and foraminifera calcite Mg/Ca (Delaney et al., 1985). Both proxies have several calibrations to 135 modern SST: here we explore the impact of calibration choice on KM5c SST data, by comparing and contrasting outputs between proxies and between calibrations. Although the TEX86 proxy (Schouten et al., 2002) has also been used to generate mid-Piacenzian SSTs (e.g. O'Brien et al., 2014;Rommerskirchen et al., 2011), this data is not included here because it could not be confidently assigned to the KM5c interval either due to low sampling resolution and/or our age control protocol was not met.

Alkenone SSTs (the U K 37' index)
The majority of the 23 alkenone-derived sea-surface temperature (SST) datasets included in the PlioVAR synthesis used the U K 37' index, and applied the linear core-top calibration (60°S-60°N)   (hereafter Müller98; Tables S2 and S3). The Müller98 calibration applies the best fit between core-top U K 37' and 145 modern SSTs, recorded at the sea surface (0 m water depth) and consistent with haptophyte productivity in the photic zone. The sedimentary signal is proposed to record annual mean SST based on linear regression . Cultures of one of the dominant haptophytes, Emiliania huxleyi, generated only minor differences in the slope of the U K 37'-temperature relationship (Table S2), where growth temperature was used for calibration . Several PlioVAR datasets were originally published using the  calibration 150 (Table S3).
A recent expansion of the global core-top database (<70°N) was accompanied by Bayesian statistical analysis to assess the relationship(s) between predicted (from U K 37') and recorded ocean temperatures . The revised U K 37' calibration, BAYSPLINE, addresses non-linearity in the U K 37'-SST relationship at the 155 high end of the calibration, i.e. in the low-latitude oceans (Pelejero and Calvo, 2003;Sonzogni et al., 1997).
BAYSPLINE also highlights scatter between predicted and observed SSTs at the high latitudes, and explicitly reconstructs seasonal SSTs >45°N (Pacific) and >48°N (Atlantic), and in the Mediterranean Sea .

160
To test the impact of different alkenone temperature calibrations on the quantification of mid-Piacenzian SSTs, we converted all U K 37' data to SSTs using both the Müller98 calibration and the BAYSPLINE calibration. For most sites, BAYSPLINE was run with the recommended setting for the prior standard deviation scalar (pstd) of 10 . At high U K 37' values (above ~24°C) it is recommended to use the more restrictive value of 5, to minimise the possibility of generating unrealistic SSTs (e.g. >40°C) given that the slope of the U K 37'-165 temperature calibration becomes attenuated .

Foraminifera Mg/Ca
The magnesium-to-calcium ratio of foraminifera calcite can be used to reconstruct sea surface (surface dwelling), thermocline (subsurface dwelling) and deep (benthic) ocean temperatures (Delaney et al., 1985;Elderfield et al., 170 1996;Rosenthal et al., 1997). The PlioVAR dataset includes analysis from 12 sites, on surface-dwelling foraminifera Globigerinoides ruber, Trilobatus sacculifer, and Globigerina bulloides (Table S4). In the original publications, data were converted to SST using a range of calibrations as well as corrections for CaCO3 dissolution in the water column and sediments, which leads to preferential removal of Mg from the CaCO3 lattice (generating cooler SSTs than expected; Regenberg et al., 2009). Evolution of the 175 Mg/Ca of seawater (Mg/Caseawater) over geological timescales may also impact Mg/Ca-based palaeo-temperature reconstructions (Brennan et al., 2013;Coggon et al., 2010;Fantle and DePaolo, 2005;Gothmann et al., 2015;Horita et al., 2002;Lowenstein et al., 2001). Changes in Mg/Caseawater impacts the intercept and potentially the sensitivity of palaeotemperature equations (Evans and Müller, 2012;Medina-Elizalde and Lea, 2010)

185
given that it (i) fitted modern (regional) core-top values; (ii) accounted for known environmental impacts (e.g.
[CO3 2-] correction); (iii) was developed within a particular research group; and/or (iv) fitted conventional wisdom at the time. BAYMAG uses a Bayesian approach that incorporates laboratory culture and core-top information to generate probabilistic estimates of past temperatures. BAYMAG assumes a sensitivity of Mg/Ca to salinity, pH and saturation state at each core site, and also accounts for Mg/Caseawater evolution through a linear scaling (i.e.

190
there is no change in the sensitivity of the palaeo-temperature equation as Mg/Caseawater evolves) (Tierney et al., 2019b). For each site with Mg/Ca data, we computed SSTs using BAYMAG's species-specific hierarchical model. In the absence of knowledge concerning changes in salinity, pH, and saturation state in the Pliocene, we assumed that these values were the same as today. We drew seasonal sea-surface salinity from the World Ocean Atlas 2013 product (Boyer et al., 2013), and pH and bottom water saturation state from the GLODAPv2 product 195 (Lauvset et al., 2016;Olsen et al., 2016). We used a prior standard deviation of 6˚C for all sites.

Climate models
The model outputs used here were generated from the 15 models that contribute to the Pliocene modelling intercomparison project, Phase 2 (PlioMIP2) (Haywood et al., in review, 2020). The boundary conditions for the 200 experiments, and their large-scale results for Pliocene and pre-industrial climates are detailed elsewhere (Haywood et al., in review, 2020;Haywood et al., 2016b), so are briefly outlined here.
The Pliocene simulations are intended to represent KM5c (~3.205Ma) and were forced with PRISM4 boundary conditions (Haywood et al., 2016b). Atmospheric CO2 concentration was set at 400 ppmv (Haywood et al., in 205 review, 2020), in line with the upper estimates of atmospheric CO2 from boron isotope data ( Figure 1; Foster et al., 2017). Lower estimates from the alkenone carbon isotope proxy ( Figure 1) are likely to reflect an insensitivity of this proxy to atmospheric CO2 in the Pliocene (Badger et al., 2019). All other trace gases, orbital parameters and the solar constant were specified to be consistent with each model's preindustrial experiment. The Greenland Ice Sheet was confined to high elevations in the Eastern Greenland Mountains, covering an area approximately 210 25% of the present-day ice sheet. The Antarctic ice sheet has no ice over Western Antarctica. The reconstructed PRISM4 ice sheets have a total volume of 20.1 × 10 6 km 3 , equating to a sea-level increase relative to present day of less than ∼24 m (Dowsett et al., 2016).
Modelling groups had some choices regarding exact implementation of boundary conditions, however 14 of the 215 15 models used the 'enhanced' PRISM4 boundary conditions (Dowsett et al., 2016) which included all reconstructed changes to the land/sea mask and ocean bathymetry. Key ocean gateway changes relative to modern are closure of the Bering Strait and Canadian archipelago. and exposure of the Sunda and Sahul shelves (Dowsett et al., 2016). Initialisation of the experiments varied between models (Haywood et al., in review, 2020). Some models were initialised from a pre-industrial state while others were initialised from the end of a previous Pliocene 220 simulation or another warm state. The simulations reached equilibrium towards the end of the runs as per PlioMIP2 protocol.

Statistical analysis (calculating of global means / gradients)
For all anomaly calculations we obtain pre-industrial SST from the NOAA-ERSST5 dataset for years 1870-1899 225 CE (Huang et al., 2017), ensuring alignment between the KM5c proxy data and the KM5c model experiments (Haywood et al., in review, 2020). This pre-industrial time window excludes the largest cooling linked to the Little Ice Age and pre-dates the onset of 20 th century warming (Owens et al., 2017;PAGES2k Consortium, 2017). The global mean SST anomaly from the proxy data was obtained as follows: firstly, the SST anomaly between the proxy data and the NOAA-ERSST5 data was obtained for each location, and the data collated into bins of 15° of 230 latitude. It is assumed that the average of all the data in each bin represents the average SST anomaly for that latitude band. Next, the area of the ocean surface for each bin is obtained. The average SST anomaly is then the average of all the bins weighted by the ocean area in the relevant latitude band.
Meridional gradients were obtained in a similar way. A low-latitude SST anomaly was obtained as the weighted 235 average of all the bins containing low-latitude SSTs (for example the 4 × 15° bins containing latitudes of 30°S -30°N). A high-latitude SST anomaly was obtained as the weighted average of all bins containing high latitude SSTs (>60°N, because there were no proxy data points >60°S). As only the Atlantic Ocean contained data points poleward of 65°N, the high latitude region used in the gradient calculations for both proxies and models was focused on the longitudinal window from 70°W-5°E. The meridional gradient SST anomaly is then the low-240 latitude SST anomaly minus the high-latitude SST anomaly, relative to the pre-industrial.
There are some uncertainties in this calculation of the global mean SST, in particular, the fact that the proxy data is not evenly distributed throughout a latitude bin, and also that some bins contain very few data points. There is a higher density of data in the Atlantic Ocean, compared to the Indian Ocean and Pacific Ocean, and no high-latitude data is available to consider a Southern Ocean response ( Figure 2). Nevertheless, this method of calculating averages does attempt to account for unevenly distributed data and provides a SST anomaly (SSTA) that is comparable with model results. The impact of proxy choice was examined in the calculation of the global means and meridional SST gradients. As no Mg/Ca data were available >50°N or >30°S, we calculated global mean SST and the meridional SST gradients either including or excluding the Mg/Ca data; both results are outlined 250 below and shown in Table 1.

Results
Relative to the pre-industrial, the combined U K 37' and Mg/Ca proxy data, using the original calibrations, indicate a KM5c global mean SST anomaly of +2.3ºC and a meridional SST gradient reduced by 2.6ºC ( Figure 3). The amplitude of the global SST mean anomaly in the combined proxy data exceeds those indicated in 10 of the 255 PlioMIP2 models, but is lower than the global SST anomaly from six models. The meridional temperature gradient anomalies are more comparable (Figure 3). If only the U K 37' data is used, the global mean SST anomaly from proxies is higher than all but three of the PlioMIP2 models, and the U K 37' meridional gradient calculations are smaller than all models (BAYSPLINE) or one model (original calibration; Figure 3).

260
Overall, the proxy data show the lowest temperature anomalies in the low latitudes, regardless of proxy (from +3ºC to -4ºC for sites <30°N/S). A larger range of temperature anomalies is reconstructed in the mid-and highlatitudes (from +9ºC to -2ºC for sites >30°N/S) ( Figure 4). Thus, there is a broad, but complex, pattern of enhanced warming at the mid-and high-latitudes, reflecting a combination of regional influences on circulation patterns, and to some extent, proxy choice. This pattern is not explained by temporal variability nor sample density within 265 the KM5c time interval: regardless of sample number per site, the standard deviation at any site within the KM5c time bin is <1.5ºC ( Figure S4). We note that of the 32 sites examined here, 7 provided a single data point for the KM5c interval ( Figure S2, alkenones: ODP Sites 907, 1081, U1337, U1417; Figure S3, foraminifera Mg/Ca: DSDP Sites 214, 709, 763); the sites are geographically well distributed, however, and so unlikely to significantly impact our global mean / gradient calculations.

270
Calibration choice has a small impact over the reconstructed patterns of KM5c SST anomalies (Figure 4,and Figures S2 and S3). Below 24°C, absolute U K 37' SSTs using Müller98 are <1°C lower than those using BAYSPLINE. At high temperatures the non-linearity in the BAYSPLINE calibration means that BAYSPLINE-SSTs can be up to 1.67 °C ± 0.01°C higher than when using Müller98 ( Figure S2). The low latitude offset between

275
Müller98 and BAYSPLINE has two effects: it elevates the global mean SST ( Figure 3, Table 1) and increases the KM5c meridional SST gradient towards pre-industrial values (Figures 3 and 4, Table 1). The calibration offsets are less systematic for Mg/Ca. There is a wider range of offsets between BAYMAG and published SST values (from -4 to +5°C, Figure S3 Table S3), although the smallest KM5c SST anomalies continue to be reconstructed in the low-latitudes, regardless of which Mg/Ca calibration is applied (Figure 4).

280
Overall, the U K 37'-temperature anomalies lie within the range given by PlioMIP2 models (Figure 4). The Mg/Ca estimates are mainly from the low latitudes, and high-latitude (>60ºN/S) Mg/Ca SST data are not available to calculate meridional gradients using foraminifera data alone (Figure 4). Mg/Ca-SST anomalies are generally lower than for U K 37', and a cooler KM5c than pre-industrial is consistently (but not always) recorded in the low-latitudes 285 by Mg/Ca regardless of calibration choice ( Figure 4). As a result, combining U K 37' and Mg/Ca data leads to a cooler global mean SST (~2.3°C) than when using U K 37' alone (~3.2°C with Müller98, ~3.4°C with BAYSPLINE, Figure 3 and Table 1). At 8 sites, the negative KM5c SST anomalies in Mg/Ca disagree with both the U K 37' data and the PlioMIP2 model outputs (Figure 4) Figure 4).

300
KM5c is characterised by a surface ocean which is ~2.3°C (alkenones and Mg/Ca), ~3.2°C (alkenones-only, Müller98 calibration) or ~3.4°C (alkenones-only, BAYSPLINE calibration) warmer than pre-industrial, with a ~2.6°C reduction in the meridional SST gradient. The global mean SST anomaly is higher than the 1.7°C previously calculated for the wider mid-Piacenzian warm period (3.1-3.3 Ma), regardless of proxy choice (IPCC, 2014b). Previous analysis of a suite of models suggested that a climate state resembling the mid-Piacenzian was 305 likely to develop and be sustained under RCP4.5 (Burke et al., 2018). The PlioMIP2 ensemble (Haywood et al., in review, 2020) indicates that best estimates for mid-Piacenzian warming in surface air temperatures (1.7-5.2°) are comparable to projections for the RCP4.5 to 8.5 scenarios by 2100 CE(RCP4.5 = 1.8±0.5°, RCP8.5 = 3.7±0.7°C (IPCC 2013). Our proxy-based mean global SST anomaly is larger than in most PlioMIP2 models when we use alkenones-only or alkenones and Mg/Ca combined (Figure 3), and hence ; because air temperature 310 increases are larger over land than over the ocean in PlioMIP2 models (Haywood et al., in review, 2020), our results suggest that the global annual surface air temperature anomaly for KM5c would exceedlikely exceeds the PlioMIP2 multi-model surface air temperature mean of 2.8°C (Haywood et al., in review, 2020). The higher global SST mean recorded in the KM5c proxy data, compared to the PlioMIP2 models, occurs despite the available atmospheric CO2 reconstructions indicating values below the ~400 ppmv used in the PlioMIP2 models ( Figure   315 1). Our synthesis of SST data thus indicates that with atmospheric CO2 concentrations ≤400 ppmv (comparable to RCP4.5), the surface ocean warming response will likely be larger than indicated in models. Further work is required to increase the temporal resolution of the atmospheric CO2 reconstructions through KM5c, to improve our understanding of the reconstructed SST response to CO2 forcing, including whether (or by how much) the reconstructed atmospheric CO2 differ from model boundary conditions, and whether other changes in the model 320 boundary conditions also influence SST patterns (e.g. gateway changes outlined in section 2.4).
Proxy choice, calibration choice, and site selection, have all had an impact on the magnitude of the change in meridional SST gradient for KM5c compared to the pre-industrial (Table 1)

. Focussing only on a Northern
Hemisphere SST gradient leads to higher gradient anomalies than when all of the low-latitudes are included (30°S-325 30°N), because it excludes the high SST anomalies of the Benguela upwelling sites (20-25°S, discussed below, Figure 4). Smaller meridional SST gradient anomalies occur using BAYSPLINE (+0.03 to -1.66°C) than the Müller98 calibration for U K 37' (-1.18 to -3.00°C, Table 1), due to the increased low-latitude SST anomalies generated by BAYSPLINE (Figure 4). Due to several (but not all) low-latitude sites recording negative SST anomalies for KM5c using foraminifera Mg/Ca, the inclusion of Mg/Ca data leads to a larger difference in the 330 meridional SST gradient relative to the pre-industrial (-2.19 to -4.08°C). Further work is required to fully understand the negative KM5c SST anomalies in some of the low-latitude sites (discussed further below), given their impact on the meridional SST gradients. However, a robust pattern emerging from the data is that the KM5c proxy data detail smaller low-latitude SST anomalies than those of the mid-and high-latitudes SST anomalies ( Figure 4), leading to a reduction in the meridional SST gradient relative to the pre-industrial. Enhanced mid-and 335 high-latitude warming has been observed in other warm intervals of the geological past, including the last interglacial and the Eocene (Evans et al., 2018;Fischer et al., 2018), and is a feature of future climate under elevated CO2 concentrations (IPCC, 2014a).
There is complexity in the amplitude of the KM5c SST anomaly by latitude and basin, which may reflect patterns 340 of surface ocean circulation. In the Northern Hemisphere, relatively muted warming in the East Greenland Current (ODP Site 907, 69ºN) may reflect the presence of at least seasonal sea-ice cover from c.4.5 Ma (Clotten et al., 2018). In contrast, relatively high SST anomalies at ODP Sites 642 (67ºN) and 982 (58ºN) track northward flow of the North Atlantic Current, accounting for the enhanced warming relative to North-east Pacific IODP Site U1417 (57ºN; Figures 4 and 5). The large North Atlantic SST anomalies also contribute to an enhanced Northern

345
Hemisphere meridional SST gradient of up to 4ºC (>60ºN minus 0-30ºN; Table 1). For the Southern Hemisphere, a signal of polar amplification is less clearly identified than for the Northern Hemisphere (Figure 4)

355
Given that our proxy data meridional SST gradient calculations use only two sites to calculate the high-latitude SSTs (ODP Sites 907 and 642), which are also both from the Nordic Seas (Figure 2), we explored the impact of expanding our high-latitude band into the mid-latitudes. We also explored narrowing the low-latitude band so that it does not include the Benguela upwelling sites, which have a significant data-model offset ( Figure 4) and may 360 be influenced by localised circulation changes (see section 4.2). Previous calculations of Pliocene meridional SST gradients have also considered differences between the mid and low latitudes through time (Fedorov et al., 2015).
Despite adding 4 more sites by expanding the high-latitude band to 45°N/S, the meridional SST gradients are reduced by <0.4°C, from -1.18 to -1.56°C using the original U K 37' data (Table 1). However, it is clear from the distribution of sites (Figure 2) that our reconstructed KM5c SSTs (and thus the global mean and meridional 365 gradients) have a strong signal from the Atlantic Ocean. There is a relative scarcity of sites from the Indian Ocean, Pacific Ocean and Southern Ocean, but it is difficult to ascertain what impact this may have had on our global analysis. Further work is required to increase the spatial density of SST data for KM5c and the wider mid-Piacenzian stage, to better evaluate the magnitude of the warming and gradient changes outlined here.

Proxy data-model comparisons for mid-and high-latitude sites
For the mid-and high-latitudes, we find broad proxy data/model agreements for most sites. In the North Atlantic Ocean, reconstructed SST KM5c anomalies from U K 37' fall within the ranges provided by the PlioMIP2 models ( Figure 4) for all but one site (IODP Site U1387, 37°N). The overall U K 37'-model agreement for the North Atlantic Ocean suggests that, as proposed by Haywood et al. (2013), a focus on a specific interglacial within the mid-

375
Piacenzian provides an improved comparison to the climate being simulated by the PlioMIP2 models. Thus, some of the data-model mismatch in previous mid-Piacenzian syntheses (e.g. Dowsett et al., 2012) may have been due to the averaging of warm peaks which may not have been synchronous in time between sites and/or with the interval being modelled. Disagreements occur between proxies (North Atlantic Ocean) and between proxies and models (Benguela upwelling, Gulf of Cadiz, Mediterranean Sea) (Figure 4). Here, we explore the potential causes 380 for these offsets in turn.
The largely U K 37'-derived data from the North Atlantic Ocean tend to align with the warmest model outputs (Figure 4), and the U K 37'-SST anomalies also tend to be larger than those from Mg/Ca. A challenge for understanding the cause(s) of the U K 37'-Mg/Ca differences is that only two sites have data from both proxies, and 385 these do not show a consistent signal. There is good correspondence between U K 37' and the original published Mg/Ca SSTs for IODP Site U1313 (41°N), whereas at ODP Site 609 Mg/Ca SSTs (both calibrations) are between 4.6-6.1°C cooler than U K 37'. It has also been shown that the U K 37'-Mg/Ca SST offset at Site 609 is not constant with time for the late Pliocene . When using BAYMAG, warmer KM5c SSTs are reconstructed than the original published data at DSDP Site 603 and IODP Site U1313 (35°N and 41°N, Figure   390 4), but BAYMAG reconstructs SSTs only 0.8°C warmer than from the original published SSTs at Site 609. These mid-latitude North Atlantic Mg/Ca data are provided by G. bulloides, which may calcify at depth in the water column (e.g. Mortyn and Charles, 2003;Schiebel et al., 1997), and account for those sites where Mg/Ca reconstructions give lower reconstructed SSTs than from U K 37' (Bolton et al., 2018;. Alternatively, an offset between alkenones and Mg/Ca might be accounted for if there is a seasonal bias to the 395 U K 37' calibration (e.g. Conte et al., 2006;Schneider et al., 2010). Despite documented seasonality in alkenone production at high latitudes, it has been proposed that mean annual SSTs continue to be recorded by U K 37' in sediments (Rosell-Melé and Prahl, 2013), as indicated by the original U K 37' calibration . In contrast, BAYSPLINE explicitly assumes an autumn signal is recorded in Atlantic sites >45ºN . Despite these differences in interpretation, BAYSPLINE values for KM5c are <0.7ºC cooler than 400 the original published U K 37' data ( Figure S2). Although the North Atlantic U K 37' data align with range of mean annual SST anomalies generated by the PlioMIP2 models (Figure 4), three of the sites show alignment between U K 37'-SSTs and the July-November values from the multi-model means ( Figure 5). In contrast, Site 907 aligns with cool spring temperatures in the models, perhaps reflecting production after sea ice melt.

405
The large data-model discrepancy at 30ºS reflects 3 sites which today sit beneath the Benguela upwelling system in the South-East Atlantic (20-26°S, Figure 4). Part of the data-model discrepancy in the KM5c anomaly can be attributed to the models over-estimating pre-industrial SSTs in the northern Benguela sites (NOAA-ERSST5 SSTs are 2-5ºC below the pre-industrial model range), and suggests that models are not fully capturing the local dynamics of the coastal upwelling today (Small et al., 2015). Realistic representations of the Benguela upwelling 410 system today are proposed to require realistic wind stress curl and high-resolution atmosphere and ocean models (<1°, Small et al., 2015). Most of the PlioMIP2 simulations use lower resolution atmosphere and ocean models (Haywood et al., in review, 2020). An increased density of proxy data reconstructing KM5c atmospheric circulation, as well as application of high-resolution models, may help to understand the observed KM5c datamodel discrepancy. Furthermore, there was a deep thermocline during the Pliocene (as reconstructed in the 415 Equatorial Pacific (Ford and Ravelo, 2019;Ford et al., 2015;Steph et al., 2010) and theorized globally (Philander and Fedorov, 2003)), so that warmer subsurface waters than today were upwelled, enhancing local warming. However, warming of ~3.4ºC in subsurface waters (Ford and Ravelo, 2019) and ~2.5ºC in intermediate waters  for Pliocene interglacials suggest that Pliocene upwelling of warmer waters is unable to fully account for the 7-10ºC SST anomalies in Benguela sites for KM5c. Changes to the 420 distribution of export productivity and SSTs indicate that an overall poleward displacement of the Benguela Upwelling system occurred during the Pliocene, so that the main zone of upwelling likely sat close to ODP Site 1087 at 31ºS Petrick et al., 2018;. As the northern and southern Benguela regions are today marked by differences in the seasonality of the upwelling, a temporal shift in upwelling intensity may also account for some of the large SST anomaly (Haywood et al., in review, 2020). Thus, the data-425 model disagreement may be accounted for by a combination of displaced upwelling and warmer upwelled waters, giving large SST anomalies in Benguela proxy data, alongside the challenges of modelling both the pre-industrial and KM5c upwelling system and its associated SSTs.
Data-model disagreement also occurs at two northern hemisphere sites where U K 37'-SST anomalies exceed those 430 given by the range of model predictions (Figure 4). Punto Piccola (Sicily, 37°N) is located within the Mediterranean Sea, whereas IODP Site U1387 (37°N, Iberian margin) records the influence of the waters sourced from the Azores Current and the subtropical gyre. The data-model disagreement for KM5c reflects warmer SST estimates from the proxy data compared to the models, despite the good agreement for the pre-industrial suggesting that locally complex ocean circulation in these near-shore and marginal marine settings may have been 435 captured in the models. The data-model offset is also likely to be a minimum, because BAYSPLINE Mediterranean SSTs explicitly record November-May temperatures , and alkenone production below the sea surface has also been proposed (Ternois et al., 1997): both scenarios would act to raise mean annual SSTs further from those simulated in the PlioMIP2 models (Figure 4). Further multi-proxy investigation is required to identify whether the data-model disagreements in Benguela upwelling, Gulf of Cadiz, and Mediterranean Sea reflect challenges in modelling near-shore or complex oceanographic systems and/or biases in the temperature signal recorded by the proxy data.

Data-model comparisons for low-latitude sites
The low-latitude U K 37'-SST anomalies for KM5c align well with the PlioMIP2 models (Figure 4).  Figure S5).
However, the absolute SST values reconstructed for KM5c from Mg/Ca tend to align with the colder model outputs ( Figure S5).

460
Mg/Ca-SST calibration choice has no consistent impact on the KM5c anomalies (across all latitudes, Figure 4). Therefore, the corrections for secular seawater Mg/Ca change and/or non-thermal influences over Mg/Ca, which are accounted for in BAYMAG (Tierney et al., 2019b) do not account for these cold tropical KM5c anomalies.
For example, for ODP Site 806 in the Western Pacific warm pool, BAYMAG SST estimates for KM5c are ~1°C warmer than the published Mg/Ca record . For Site 999 in the Caribbean Sea, BAYMAG SST 465 estimates for KM5c are ~0.5°C cooler than the published Mg/Ca record . This also suggests the impact of Mg/Caseawater change on SST is small on warm pool sites. The Mg/Caseawater correction used in BAYMAG is conservative, drawing on multiple lines of physical evidence (corals, fluid inclusions, calcite veins, etc) (Tierney et al., 2019b). Given the variable directions of the offsets between published and BAYMAG SSTs shown here, the Mg/Caseawater correction is unable to account for the data-model offsets observed for the low 470 latitudes.
CaCO3 dissolution in the water column and sediments could lead to a cool bias on the Mg/Ca-SSTs Regenberg et al., 2009). However, the cool KM5c anomalies also occur if the forward-modelled core-top Mg/Ca SSTs from BAYMAG are used as the pre-industrial 'reference' (Figure S6).

475
The cold low-latitude anomalies for KM5c could reflect an increase in the calcification depth of the foraminifera, since the surface-dwelling foraminifera analysed here calcify at a range of depths, particularly in the tropics where the thermocline is deep in comparison to mid-to high-latitudes (Fairbanks et al., 1982;Curry et al., 1983). The negative anomalies are broadly smaller for G. ruber (-0.4 to -1.2°C) than for T. sacculifer (-0.6 to -3.5°C), consistent with a deeper depth-habitat for the latter (Curry et al., 1983), although at Site 959 the G. ruber anomaly using BAYMAG is -3.8°C. There is therefore a lack of consistency between sites, which is difficult to resolve when single species have been analysed for each of the sites through KM5c.
Where there are very large differences between BAYMAG and published Mg/Ca SST estimates, regardless of latitude (e.g. North Atlantic, Figure 4), we suggest that some combination of calibration difference, Mg/Caseawater 485 change and/or other environmental factors including seasonality and calcification depth may offer an explanation.
To fully investigate the cause(s) of offsets in Mg/Ca SST reconstructions requires future multi-species analysis for Mg/Ca for each site, and multi-proxy analysis for each site. Such an approach would enable exploration of a wider range of potential influences over both the Mg/Ca and U K 37' SST reconstructions, and a reduction in the uncertainties of the reconstructed SSTs and their anomalies. Alongside foraminifera Mg/Ca and U K 37' analyses, 490 additional proxies which are likely to add valuable information about water column structure and seasonality could include TEX86 (Schouten et al., 2002), long-chain diols (Rampen et al., 2012), and clumped isotopes (Tripati et al., 2010). Previous research has demonstrated that even within a single site there can be offsets between proxies which are not continuous through time (e.g. Petrick et al., 2018), so that high resolution and multi-proxy work is required to fully understand the offsets we have identified here. Resolving the 495 causes of the different proxy-proxy and proxy-model offsets is important, because they impact the calculation of the global mean SST anomaly relative to pre-industrial; however, even with inclusion of the overall cooler Mg/Ca data, the combined KM5c proxy data still indicate a global mean SST anomaly which is larger than most models from the PlioMIP2 experiments ( Figure 3).

5 Conclusions
This study has generated a new multi-proxy synthesis of SST data for an interglacial stage (KM5c) from the Pliocene. By selecting an individual interglacial, with orbital forcing similar to modern, we are able to focus on the SST response to atmospheric CO2 concentrations comparable to today and the near-future (~400 ppmv), but elevated relative to the pre-industrial. Using strict stratigraphic protocols we selected only those data which could 505 be confidently aligned to KM5c. By comparing different calibrations and two different proxy systems (U K 37' and Mg/Ca in planktonic foraminifera) we identified several robust signals which are proxy-independent. First, global mean SSTs during KM5c were warmer than pre-industrial. Second, there was a reduced meridional SST gradient which is the result of relatively small low-latitude SST anomalies and a larger range of warming anomalies for the mid-and high-latitudes. Overall, there is good data-model agreement for both the absolute SSTs and the 510 anomalies relative to the pre-industrial, although there are complexities in the results. Further work is required to generate multi-proxy SST data from single sites, accompanied by robust reconstructions of thermocline temperatures using multi-species foraminifera analysis, so that the range of factors explaining proxy-and calibration-offsets can be explored more fully.

515
The choice of proxy for SST reconstruction impacts the overall calculation of global mean SST and the meridional gradients. The negative anomalies in Mg/Ca-SSTs in six of the sixteen low-latitude sites lowers the global mean SST of KM5c from ~3.2-3.4ºC (U K 37'-only) to ~2.3ºC (combined U K 37' and Mg/Ca). The meridional SST gradient anomalies are decreased to -2.6ºC (combined U K 37' and Mg/Ca) relative to the pre-industrial, although a more muted reduction (up to -1.18°C) occurs with U K 37' alone. A number of factors may lead to a cool bias in the foraminifera Mg/Ca SSTs, which require further investigation through multi-proxy and multi-species analysis, particularly in low-latitude sites.
We identify the strongest warming across the North Atlantic region. The results are consistent with the PlioMIP2 models, although the largely U K 37' data sit at the high end of the calculated model anomalies. Although seasonality 525 may play a role in the proxy data signal, these results also suggest that many models may under-estimate highlatitude warming even with the moderate CO2 increases identified in KM5c relative to the pre-industrial. More data points are required to fully explore these patterns: for seven sites only one data point lay within KM5c, and more than half of the analysed sites (18/32) recorded Atlantic Ocean SSTs.

530
Both the PlioMIP2 models (Haywood et al., in review, 2020) and future projections (IPCC, 2018) indicate that warming is higher over land than in the oceans in response to higher atmospheric CO2 concentrations. Our synthesis of KM5c thus likely represents a minimum warming to be expected with atmospheric CO2 concentrations of ~400 ppmv. Even under low CO2 emission scenarios, our results demonstrate that surface ocean warming may be expected to exceed model projections, and will be accentuated in the higher latitudes.      Table S3 (U K 37') and     At ODP Site 1090, following initial publication of the SST data (Martínez-Garcia et al., 2010) an alternative orbitally-tuned age model was generated using n-alkane concentrations as a proxy for dust 900 inputs, and an anticipated continuation of the Pleistocene relationship of high dust with high 18 O i.e. during glacial stages (Martínez-Garcia et al., 2011). This n-alkane age model aligns KM5c with high n-alkane concentrations and low SSTs, whereas the reverse pattern might be expected ( Figure S1). If the cold interval is re-aligned to KM4, SSTs during KM5c at ODP 1090 are elevated by 0.5°C ( Figure  S1). Given current stratigraphic information for ODP 1090 it is not possible to determine which of 905 these scenarios is correct; thus, we present the SST anomalies according to the original age model, noting that there could be an additional increase in those anomalies of up to 0.5°C depending upon the choice of sample ages.
At ODP Site 806, uncertainty over age control resulted from the absence of an agreed splice across the multiple holes drilled by ODP. High-resolution benthic foraminifera 18 O records were generated 910 on Hole 806B (Bickert et al., 1993;Karas et al., 2009). Here we update the age model using the HMM-Stack Matlab code (Lin et al., 2014), which aligns to the Prob-stack . Additionally, we created a modified meters composite depth (mcd). Using the depth scale generated by Karas et al., (2009) to account for core expansion, we amend Holes 806A and 806C to this depth scale (Matlab code is provided as a supplement). The KM5c interval is muted in Prob-stack in 915 comparison to LR04 . Given the variability in the Site 806 benthic 18 O record (Figure 1), it is difficult to identify the KM5c interval and we rely on the probabilistic alignment of HMM-Match. If we tied the record to LR04 between M2 and KM2 and assumed a linear sedimentation rate, however, the age model in practice would be similar.

Alkenone calibrations
The majority of the alkenone-derived sea-surface temperature (SST) datasets included in the PlioVAR synthesis used the U K 37' index, and applied the core-top calibration (60°S-60°N) by Müller et al. (1998) (hereafter Müller98; Tables S2 and S3). Several PlioVAR datasets were originally published using the laboratory culture calibration of Emiliania huxleyi by   (Table S3); these 925 data were converted to Müller98 so that all sites used the same linear global calibration. The Bayesian U K 37' calibration (BAYSPLINE) was then applied to all sites. Whilst the Müller98 calibration indicates mean annual SSTs for high latitudes, at sites >45°N (Pacific) and >48°N (Atlantic), and in the Mediterranean Sea, BAYSPLINE explicitly reconstructs seasonal SST . 930 Table S3 and Figure S2 compare the reconstructed SST anomalies for KM5c (relative to preindustrial) for the 23 sites which provided alkenone data. In the mid-and high-latitudes, Müller98 tends to generate warmer SSTs compared to BAYSPLINE, with the difference ≤0.9 °C (Table S3). There is relatively little variability in the offset (± 0.15 °C) although that may reflect low sample numbers for some sites ( Figure S2). In the low latitudes, where SSTs exceed ~24.5°C (applying 935 Müller98), the non-linearity of the BAYSPLINE calibration has its biggest impact ( Figure S2). For most low-latitude sites SSTs are ~1°C warmer using BAYSPLINE, but the difference can be as high as 1.67 °C ± 0.01°C (ODP 1143). The warmer low-latitudes in BAYSPLINE reduce the meridional temperature gradient, but both Müller98 and BAYSPLINE are consistent in showing enhanced warming at mid-and high-latitudes. 940

Foraminifera Mg/Ca calibrations
A range of foraminifera species, Mg/Ca-SST calibrations, and corrections for non-thermal impacts on Mg/Ca had been employed for the original published data (Table S4). We present the data as published, recognising the choices made by the original researchers in identifying the best approach 945 for their site. The Bayesian calibration, BAYMAG, was then applied to all data following the settings detailed in the Methods. Table S4 and Figure S3 compared the reconstructed SST anomalies for KM5c (relative to preindustrial) for the 12 sites which provided foraminifera Mg/Ca data. A wide range of offsets is recorded, both positive and negative, and there is no clear pattern in terms of latitude or species. Table S1. Sites used in the PlioVAR synthesis, their age constraints and SST proxies, can be accessed at https://pliovar.github.io/km5c.html.