The Warm Winter Paradox in the Pliocene High Latitudes

Reconciling palaeodata with model simulations of the Plioc ene climate is essential for understanding a world with atmospheric CO2 concentration near 400 parts per million by volume. Both mod els and data indicate an amplified warming of the high latitudes during the Pliocene, however terrestria l data suggests Pliocene high latitude temperatures were mu ch higher than can be simulated by models. 5 Here we show that understanding the Pliocene high latitude t err strial temperatures is particularly difficult for the c oldest months, where the temperatures obtained from models and dif ferent proxies can vary by more than 20 ◦C. We refer to this mismatch as the ‘warm winter paradox’. Analysis suggests the warm winter paradox could be due to a nu mber of factors including: model structural uncertainty, 10 proxy data not being strongly constrained by winter tempera tures, uncertainties on data reconstruction methods and al so that the Pliocene high latitude climate does not have a modern ana logue. Refinements to model boundary conditions or proxy dat ing are unlikely to contribute significantly to the resolution o f the warm winter paradox. For the Pliocene, high latitude, terrestrial, summer tempe ratures, models and different proxies are in good agreement . Those 15 factors which cause uncertainty on winter temperatures are hown to be much less important for the summer. Until some of t he uncertainties on winter, high latitude, Pliocene temperat ures can be reduced, we suggest a data-model comparison shou ld focus on the summer. This is expected to give more meaningful and ac cur te results than a data-model comparison which focuses o n the annual mean.

For the second phase of PlioMIP (PlioMIP2) substantial effort has been made to improve DMC by reducing potential sources of uncertainty attributed to a) Model boundary conditions, b) Model structure and c) data. Model uncertainties were reduced by a) utilising an improved set of model boundary conditions (PRISM4; Dowsett et al., 2016), and b) increasing the size and complexity of the PlioMIP2 ensemble relative to PlioMIP1. Although there are many sources of data uncertainty, Haywood et al. (2013) highlighted temporal uncertainty as a particular issue. PlioMIP1 focussed on a > 200,000 year timeslab also show that uncertainties on summer temperatures are very different from those on winter temperatures and that a summer DMC is likely to lead to more accurate results.
The layout of the paper is as follows. Section 2 will describe the modelling and methods used. Section 3 will present a DMC focusing on both the annual mean and also on seasonal temperatures. Section 4 will discuss possible reasons for the 'warm 100 winter paradox'. A discussion of the results and conclusions will be presented in section 5.

Climate Modelling
This paper makes use of two sets of modelling simulations to represent the mPWP. The first set is the model results from 105 PlioMIP2, the second is a set of simulations run with the HadCM3 climate model to assess uncertainties caused by orbital forcing. These are described below.

PlioMIP2 core experiments
The PlioMIP2 ensemble , is the largest consistent set of mPWP model simulations to date. All mod-110 elling groups participating in PlioMIP2 were required to run a preindustrial experiment and a core mPWP experiment, which was intended to represent the KM5c timeslice (3.205 Ma). Boundary conditions for the core mPWP experiment included CO 2 of 400ppmv (which is within the range obtained by de la Vega et al., 2020) and a modern orbit. The land-sea mask, topography, bathymetry, vegetation, soils, lakes and land ice cover were obtained from the latest iteration of PRISM (PRISM4; Dowsett et al., 2016). It must be noted that the boundary conditions were not implemented identically in all of the PlioMIP2 115 models although there is substantial commonality. See papers referenced in table 1 for details of how each model implemented the boundary conditions.

HadCM3 Orbital sensitivity experiments
The KM5c timeslice had an orbit very close to modern (Haywood et al., 2013), hence all PlioMIP2 experiments were run with 120 a modern orbit. Over land, it is difficult to obtain data with orbital temporal precision, and in order that a DMC is even possible it is necessary to utilise data from outside the KM5c timeslice and sometimes even outside the mPWP. It is reasonable to utilise such data, provided that one is aware that this will lead to errors in the DMC. Close to the KM5c timeslice these errors are mainly due to orbital configuration, hence we include orbital uncertainties on the modelled climate when comparing with terrestrial palaeodata. Data that is further away from KM5c, such as from the early Pliocene, can also be compared with the 125 PlioMIP2 models, provided that one is aware of the low confidence in the results due to errors in other modelled boundary conditions (e.g. CO 2 , ice sheets) which are difficult to quantify.
The PlioMIP2 experimental design did not include orbital sensitivity experiments. We therefore assess orbital uncertainty by including a number of sensitivity experiments run with a single model, HadCM3 (Gordon et al., 2000). Table 2 shows the top of 130 the atmosphere (TOA) insolation for specified times within the period 2.9Ma to 3.3Ma. The first block shows the most extreme TOA insolation for January and July at 65 • N, and the second block shows the most extreme TOA insolation for January and July at 56 • N. The third block shows the HadCM3 modelling sensitivity experiments that we used in this paper along with their timeslice and TOA insolation. It is seen that the orbits we use here cover relatively extreme orbits for the latitudes of interest.
The orbits representing G17 (2.950Ma), K1 (3.060Ma) and KM3 (3.155Ma) have already been discussed by Prescott et al. 135 (2014). They all show high July TOA insolation, and K1 also shows high January TOA insolation at 65 • N. Here, we use an additional orbit, 3.037Ma, which maximizes January TOA insolation at 56 • N, and this orbit shows a smaller TOA insolation in July than the others used. We choose orbits which are designed to produce high TOA insolation, as these will produce warmer temperatures and would be expected to reduce the model-data disagreement seen in figure 1.

Vegetation modelling
We simulate mPWP vegetation, by using the PlioMIP2 climate to drive the BIOME4 mechanistic global vegetation model (Kaplan, 2001). BIOME4 has been used in many previous studies of the mPWP (e.g. Salzmann et al., 2008;Pound et al., 2014;Prescott et al., 2018), and it predicts the distribution of 28 global biomes based on the monthly means of temperature, precipitation, cloudiness and absolute minimum temperature.

145
There are two ways to run the BIOME4 model. These are a) absolute mode or b) anomaly mode. For the absolute mode, BIOME4 is driven by direct climate model outputs for the period of interest. The anomaly mode accounts for known climate model biases that occur in the model's modern simulation and are likely to propagate through to other time periods. In anomaly mode climate inputs to BIOME4 are obtained by calculating the modelled climate anomaly from the preindustrial and adding 150 this onto modern observations as follows: where x is one of the BIOME4 inputs (temperature, precipitation, cloudiness, absolute minimum temperature), input denotes a parameter input to BIOME4, model denotes a simulated value from the multimodel mean and obs is a modern dataset, which was based on observations and created for BIOME4, as described by Kaplan et al. (2003). In this paper we will use the anomaly mode, because this gives a more detailed representation of possible biomes, particularly at small spatial scales. However in the supplementary information we will also show results from the absolute mode to highlight that, for the mPWP, large scale features of biomes are not dependent on the methodology used.

mPWP Mean Annual Temperature
There are 8 palaeovegetation data sites that are compared with PlioMIP2 model results in figure 1b. Figure 2 shows the DMC for each of these sites with values reported in table S1.

165
In figure 2 the blue symbols show the difference between the multimodel mean (MMM) PI mean annual temperature (MAT) and the modern observed MAT at the datasite. The blue dotted line shows the anomaly between the modern observations and the CRU reanalysis data and is intended to represent the error bars due to comparing modern MAT at a site to a gridbox sized area for the preindustrial. It is seen that the PI MMM MAT is in good agreement with the data suggesting that there is no inherent model bias at these locations.

170
Red symbols on figure 2 show the difference between the mPWP PlioMIP2 simulations and the MAT obtained from the palaevegetation-based climate reconstruction. The red circle is the MMM and the small crosses show results from the 17 individual models. Error bars on the reconstructed temperatures due to the combined bioclimatic and temporal variability are shown by the red dotted lines (where available).
Figure 2 shows very good model-data agreement for the mPWP at the 5 sites between 47 • N and 30 • S. The higher latitude sites (at 64 • N, 56 • N and 53 • N) do not show good model-data agreement. Instead the mPWP temperature suggested by the data is substantially higher than the MMM. Although no definitive conclusions can be drawn from such a small number of datapoints, it appears that the models have more difficulty in reproducing the data at higher latitudes than at lower latitudes.

180
This was also shown by Salzmann et al. (2013).

Seasonal Temperatures
We now consider whether the mPWP model-data disagreement seen at high latitudes is uniform throughout the year or whether it occurs preferentially in certain seasons. Figure 3 and  additional Pliocene data because only two sites can be dated close to KM5c.
For the KM5c DMC, the PlioMIP2 MMM agrees very well with the warm month temperature at Lake El'gygytgyn (data: 190 15-16 • C, MMM: 16.2 • C), although the warm month temperature MMM is ∼ 6 • C warmer than the data at Lake Baikal (data: 15.2-17.5 • C, MMM: 22.8 • C). For the cold month temperature, there is a larger discrepancy between the MMM and the data.
The MMM cold month temperature is ∼ 6 • C warmer than that obtained from data at Lake El'gygytgyn and ∼ 23 • C colder than that obtained from data at Lake Baikal. At Lake Baikal even the warmest model (CESM2) simulated the cold month temperature ∼ 15 • C too cold. The data suggests that the KM5c cold month temperature at Lake El'gygytgyn was > 30 • C cooler than at 195 Lake Baikal, however none of the models show this: all models suggest that the two sites differ in temperature by less than 6 • C.
The second block in figure 3 shows how the PlioMIP2 models compare with other Late Pliocene data. This DMC has the caveat that the modelled data represents a different temporal slice to what has been reconstructed. Because of this temporal mismatch we would expect some model-data disagreement, however we would highlight large model-data discrepancies as 200 problematic. For example, at Lake Baikal we have two reconstructed temperatures: one near KM5c and the other dated as 'prior to 3.5Ma' (Demske et al., 2002). Although the reconstructed temperatures at these two dates differ, this difference is relatively small compared to the large model-data discord that occurs at this site. This suggests that accounting for dating uncertainties would not be sufficient to explain the very large model-data mismatches on the cold month temperature for the Late Pliocene Lake Baikal site.

205
Further late Pliocene climatic data from Russia was obtained by Popova et al. (2012) and is compared with model results in figure 3 (sites: Mirny, Merkutlinskiy, Kabinet, Delyankir, Chernoluche, Blizkiy and 42km). These sites show similar reconstructed temperatures to those at Lake Baikal, and corroborate a strong model-data discord for the coldest month. Additional data for the Early Pliocene (sites Tnekveem and Hydzhak) also show the same pattern, however we do note that confidence in 210 the DMC comparison is lower for the Early Pliocene sites.
North American sites are also included in figure 3 at Lost Chicken Mine and the Canadian Arctic sites of Meighen Island, Beaver Pond and Fyles Leaf Beds. The Canadian Arctic sites have temperatures reconstructed using two different methods: coexistence likelihood estimation (CRACLE; Harbert and Nixon, 2015) and an open-data method based on the coexistence 215 approach Mosbrugger and Utescher, 1997). Regardless of the exact dating, location or reconstruction method, the DMC over North America follows the same pattern as that seen over North Asia: the models agree reasonably well with the temperature for the warm month, but modelled temperatures are too cold for the coldest month. Differences in location, proxy age or reconstruction method can affect the temperature but are not large enough to affect the general conclusion of model-data discord.
Other proxies which provide summer temperature at the Beaver Pond site agree with warm season temperatures derived from the palaeovegetation and are close to the warm month temperature from the models. These are: (2010) derived annual mean temperatures using oxygen isotopes and annual tree ring width (-0.5 +/-1.9 • C) and bacterial tetraether composition in paleosols (-0.6 +/-5.0 • C). These temperatures are much warmer than suggested by the models (median temperature = -11.4 • C).

235
In addition to the above proxies, the literature contains reconstructions of both warm month and cold month Pliocene temperatures from beetle assemblages over the high latitudes of North America. However, the cold season temperatures are less well constrained than warm month temperatures (e.g. Elias et al., 1996;Elias and Matthews, 2002;Huppert and Solow, 2004).
This may mean that the cold season temperature derived from beetles might reflect the modern seasonal range of temperature in the calibration dataset rather than the Pliocene cold season (Fletcher et al., 2019a).
240 Figure 4 compares the PlioMIP2 models to the North American beetle assemblage data, and shows good model data agreement for the warm month temperature. Unlike the DMC for the palaeovegetation proxies, the MMM agrees reasonably well with the cold month temperature reconstructed from beetle data, particularly that derived using the Mutual Climate Range method. However, this model-data agreement may be due to large error bars on both models and beetle data and may be due 245 to neither of them being able to produce large enough anomalies from the modern climate (Fletcher et al., 2019a). It is unclear what causes this large disagreement on winter temperatures obtained from various sources, and we will refer to this contradiction as 'the warm winter paradox'. Resolving this 'warm winter paradox' is essential if we are to bring models into line with the data and understand the true nature of the Pliocene cold season climate.  Figure 5a shows the mPWP high northern latitude biome reconstruction (Salzmann et al., 2008). This can be compared with ticularly over North America), the reconstructed biome can be modelled at a nearby location (see figure 5b), suggesting that some discrepancies are due to small spatial errors. Over Western Europe, the warm mixed forest in the model extends too far to the east and the MMM does not reproduce the extent of the cool mixed forests seen in the data. However it is quite easy to simulate cool mixed forest in this region with only minor parameter changes to the BIOME4 model (not shown), suggesting model and data are 'close' in this region.

260
A notable region of data-model mismatch is in central Eurasia. Here, the reconstructed biome is 'temperate conifer forest' and the model simulates 'evergreen taiga'. BIOME4 can only simulate 'temperate conifer forest' when the cold month temperature is above -2 • C, a condition that is not provided by any of the PlioMIP2 models. The biome data-model mismatch in this region is not easily resolved and is due to the warm winter paradox (i.e. data suggesting warmer winters than can be modelled).

265
However, in North America and Western Europe the warm winter paradox does not prevent BIOME4 from simulating the correct vegetation biome.
4 What causes the warm winter paradox?

Could proxy dating uncertainties help resolve the warm winter paradox?
270 Haywood et al. (2013) suggested that the mismatch between models and data for PlioMIP1 might be caused by a comparison between model results representing a short timeslice and data that represented the ∼300,000 years of the mPWP. Moving to the KM5c timeslice for both models and data in PlioMIP2 has addressed this methodological error and there is an improvement in model data agreement for ocean proxies .

275
A terrestrial DMC for the KM5c timeslice is problematic because there is very little terrestrial data with suitable temporal precision, and it was necessary to incorporate some data from outside the timeslice. It is therefore important to check whether the warm winter paradox could be reduced (or even eliminated) by accounting for temporal model-data mismatches.
Proxy dating uncertainties have previously been explored with modelling sensitivity experiments using different orbital con-280 figurations. These were found to show better data-model agreement in the annual mean temperature at high latitudes (e.g. Feng et al., 2017;Hill, 2015;Howell et al., 2016). It is relatively easy to increase the annual mean temperature at high latitudes by changing the orbital configuration (e.g Prescott et al., 2014), and tempting to use this as a partial solution as to why models and data do not agree. However, since the model-data mismatch occurs in the winter season, any orbital solution must increase the cold month temperature and have a smaller effect on the warm month temperature.

285
Here we use the HadCM3 model to assess how different orbital configurations in the mPWP would change the warm month and cold month temperatures. The orbital configurations we include are shown in table 2. Although the list is not exhaustive, it includes enough of the extreme orbital configurations to allow an assessment of whether orbit is likely to prove important for resolving the warm winter paradox.
290 Figure 6a shows the difference between the model and data for the cold month temperature, for sites dated as 'KM5c' or 'Late Pliocene'. The HadCM3 simulation representing the KM5c timeslice is shown by the orange square, while the triangles show the HadCM3 simulations for other timeslices considered. For context, the red circles show the KM5c simulation for other PlioMIP2 models. As expected, the simulation, which had the largest January insolation (3.053Ma) produced the warmest cold 295 month temperatures. However, the cold month temperature is more sensitive to which model is used than the exact orbital configuration. This suggests that structural model uncertainties are a more likely contributor to the warm winter paradox than uncertainties on the exact timeslice that is to be compared with the data. Figure 6b shows that the orbital configuration chosen can strongly affect the warm month temperature. This is unsurprising 300 because the summer insolation is much more variable than the winter insolation (table 2). If we had a 'warm summer paradox', then dating errors could be an important part of the solution. Figure 6 highlights the major shortcoming of using 'warm' orbital configurations to improve model-data agreement for the annual mean temperature. In the annual mean both the K1 and KM3 simulations predict higher temperatures than KM5c, and show the best agreement with the annual mean temperature reconstructions. However neither K1 or KM3 produce a good representation of cold month or warm month temperatures. In 305 addition, neither of these simulations are able to simulate realistic Pliocene biomes (Prescott et al., 2018). This highlights that a DMC on annual mean temperatures is insufficient for determining the extent of model data agreement.
This subsection asked: "Could proxy dating uncertainties help resolve the warm winter paradox?" If we assume that dating uncertainties can be quantified by assessing the most extreme orbital configurations in the mPWP, then the answer is that proxy 310 dating uncertainties are unimportant for the winter season. However, the orbital configuration is not the only model boundary condition that would change as we progress through all the timeslices that make up the Pliocene. Other boundary conditions would include changes in trace gas, ice sheet extent, vegetation distribution, ocean gateways and associated feedbacks. Sensitivity tests using different values of CO 2 (not shown) suggest that changing CO 2 would not lead to preferential warming in a particular season. It remains to be explored whether changing other modelled boundary conditions (e.g. ice sheets) could have 315 a preferential effect on warming the winter season. However, the PlioMIP2 simulations only include a small ice sheet over Greenland, hence there is limited scope for reducing ice sheets further in the Northern Hemisphere.

Could local climate effects help explain the warm winter paradox? (A case study of Lake Baikal)
Figure 3 shows very different results for the two sites dated near KM5c, with the PlioMIP2 models better simulating the tem-320 perature at Lake El'gygytgyn than at Lake Baikal. Here we consider the DMC at Lake Baikal in more detail to assess why this It is known that large bodies of water retain heat longer than the land; hence the climate around Lake Baikal is much milder than the rest of southern Siberia. However, most models do not accurately simulate the climate stabilising effects of the lake 325 and their prediction of climate at this location is more representative of the wider region than the local site.
Meteorological observations for three sites near Lake Baikal are shown in table 4. Nizhneangarsk is on the northern edge of Lake Baikal while Zhigalovo is 4 • to the west and Kalakan is 7 • to the east. Even though Nizhneangarsk lies between the other two sites, the large heat capacity of Lake Baikal means that it has warmer annual mean temperature, warmer January 330 temperature and colder July temperature. To quantitatively estimate how much the lake will stabilise the temperature we compare observations at Nizhneangarsk with the temperature interpolated onto the Nizhneangarsk location from observations at Zhigalovo and Kalakan (see table 4). Comparing this interpolated temperature with that recorded suggests that the presence of the lake increases the annual mean temperature by 1.8 • C, it increases the January temperature by 7.8 • C and cools the July temperature by 2 • C. Assuming that Lake Baikal affected the mPWP climate in an analogous way, the model results can be 335 corrected by this amount. This correction reduces the mPWP annual mean data-model discrepancy at this site from 8.5 • C to 6.9 • C, the warm month temperature data-model discrepancy from 6 • C to 4 • C and the cold month data-model discrepancy from 23 • C to 15 • C. This correction is not sufficient to allow model-data agreement for the Pliocene winter. However it does improve model-data agreement and will be one of a number of factors that need consideration on the Pliocene DMC.

340
A small caveat to this approach is that some models already account for the climate stabilising effects of the lake. For example, CESM2 and GISS2.1G contain a lake component and both include realistic representation of lakes in their preindustrial and their mPWP simulation. These models do not need correcting to account for the climate stabilising effects of the lake, and applying such a correction would reduce agreement with observations for the modern. Ultimately it is a choice for individual modelling groups as to whether their model output needs correcting to account for microclimate effects at a specific location.

345
In our study we suggest that the MMM temperatures at Lake Baikal requires a 'lake' correction because it is required for the majority of the PlioMIP2 models.

Can other uncertainties on reconstructed temperatures help explain the warm winter paradox?
4.3.1 Vegetation proxies may not be strongly related to the cold month temperature 350 The cold month temperatures from the PlioMIP2 models are lower than reconstructed from data. Over north Asia this leads to a mismatch between reconstructed biomes and biomes simulated by BIOME4. However, both the cold month temperature reconstructions and the BIOME4 model assume that the cold month temperature is a strong constraint on the distribution of tree species, and that the limits on the range of trees can be determined using correlations from the modern climate. In fact extremes in winter (when the species were dormant) have very little relationship to range limits and that tree species could tolerate much cooler winter temperatures than those that are currently experienced. Spring temperature was found to be far more important for determining whether a species can survive and reproduce, and growing season temperature is also important.
This suggests that the uncertainties on winter temperatures may be much larger than is sometimes reported.

Possible errors on reconstruction methods
Palaeoclimate reconstruction methods can be used to reconstruct modern climate. These modern reconstructions can be compared to modern observations to provide error bars on the method. Following this approach, Harbert and Nixon (2015) found the average error on the MAT reconstructed using the CRACLE method 1 was 1.3 -1.4 • C, which compared well with errors of 1.8 • C for CLAMP and 2 • C for leaf margin analysis. None of these errors are large enough to notably contribute to the data-365 model mismatch found for the Pliocene. However these errors are global averages, and do not appear uniformly over the globe.
(Harbert and Baryiames, 2020, ; their figure 2) shows that the error in reconstructing the minimum temperature of the coldest month appears larger at cold temperatures than the average error over the globe. For example, sites with minimum temperature below -20 • C appear to have a clear warm bias, which also occurs on the mean annual temperature. No clear bias is apparent when reconstructing the maximum temperature of the warmest month. If this warm bias on the minimum temperature is robust, 370 and also occurs in the Pliocene temperature reconstruction, it could contribute to the model-data discord seen in figures 2 and 3.

Different proxies suggest different cold month temperatures
We note from figures 3 and 4 that there are differences in the temperature reconstructions from different proxies. It is beyond the scope of this paper to compare and contrast proxy methods and this subject is covered in other papers (e.g. Harbert and Nixon,375 2015; Fletcher et al., 2019a). However two things are of note. Firstly, the only two sites that are close to KM5c on figure 3, Lake Baikal and Lake El'gygytgyn, have similar warm month temperature reconstruction, but suggest cold month temperatures that differ by over 30 • C, a feature that does not occur in any of the models or in modern observational data. Could some of this difference between the two sites be related to the different methodologies used for temperature reconstruction (table 3)?
Secondly, differences between proxy reconstructed temperature for the Pliocene are often larger than published error bars (or 380 may result from some published ranges not including error bars). For example, the cold month mean temperature from the coexistence likelihood estimation provides a warmer temperature than the coexistence approach, which provides a warmer temperature than the Mutual Climatic Range method for beetle assemblages (figures 3 and 4). For the warm month mean temperature all approaches yield similar temperatures. Note that here we are not suggesting that one reconstruction of the cold month mean temperature is better than another, instead we are pointing out that the cold month temperature from proxy data 385 appears to be subject to greater uncertainty than the warm month temperatures.

Could modelling errors be responsible for the warm winter paradox?
Models are, by their nature, an imperfect representation of reality and all models have errors, even for the preindustrial where boundary conditions are well known and where some model parameters have been chosen based on their ability to produce 390 a realistic climate. Pliocene simulations use the same model parameters that have been optimised for the modern climate and also have less well constrained boundary conditions. Hence, the simulated Pliocene climate contains more uncertainties than the corresponding preindustrial simulations. Figure 3 shows that across the PlioMIP2 ensemble there is large variation in the simulated CMMT temperatures (up to ∼ 395 20 • C). This large range is from a suite of models that have been run with very similar boundary conditions (orbit, CO 2 , ice sheets), so the model spread is likely due to inherent model structure. The PlioMIP2 models have equilibrium climate sensitivities 2 , between 2.3 • C and 5.2 • C which covers the range suggested by IPCC, hence the ensembles response to CO 2 forcing is likely reasonable. However, the modelled response to the full Pliocene boundary condition changes (e.g. ice sheets and orography) is less constrained by other sources. There may also some important forcings (e.g. Methane; Hopcroft et al., 2020) that 400 have not been included in the PlioMIP2 simulations, and some important feedbacks, for example fire (Fletcher et al., 2019b) and chemistry (Unger and Yue, 2014), that are not included.
Clouds and convection feedbacks are subject to uncertainties and could lead to errors in Pliocene simulations. Abbot and Tziperman (2008) used a single column model to show that deep atmospheric convection might occur during winter in ice-free high lati-405 tude oceans, and could increase high latitude winter temperatures by ∼50 • C. However, this feedback did not occur in any of the PlioMIP2 models despite January Arctic sea ice extent being reduced by up to 76%.
Another potential source of model error might be that the PlioMIP2 models are not high enough resolution to fully resolve the processes occuring in the Pliocene. For example, Arnold et al. (2014) showed that modelling a high CO 2 world with a 410 cloud-permitting model led to greater Arctic cloud cover and sea ice loss than if convection were parameterized. However, these changes had relatively minor effects on Arctic temperatures. Pope et al. (2011) considered the uncertainty that could result from incorrect tuning of the model parameters in the HadCM3 model by running a Pliocene perturbed physics ensemble which varied uncertain model parameters (within reasonable bounds).

415
They showed that the effect of using model parameters designed to produce a high sensitivity climate could be substantial (approximately 2-5 • C of warming in the Pliocene over the high latitude continents). However they presented their results for the annual mean temperature only, so we currently do not know how this increase would manifest seasonally. They also noted that 2 Equilibrium Climate Sensitivity is defined as the global temperature response to a doubling of CO 2 once the energy balance has reached equilibrium if this 'high sensitivity' climate was used to drive BIOME4 then the biome distribution did not agree with reconstructions as well as the biome distribution simulated from the control climate.

420
It is likely that as models develop there will be future refinements to the Pliocene model simulations, and this could provide a part of the solution to the warm winter paradox. However, the current set of PlioMIP2 experiments provides good agreement with ocean data (McClymont et al., 2020;Haywood et al., 2020), so the potential for model refinements are subject to ocean data constraints and may not change as much as is needed to fully agree with the cold month terrestrial temperature data.  Figure 7 shows that in the Pliocene the high latitude forests were further poleward than they were in the preindustrial climate.

Could a geographical shift in biome boundaries explain the warm winter paradox?
This is logical, because in a warmer climate vegetation would be able to inhabit regions that are too cold today. However, this does not mean that the climate experienced by a biome in the Pliocene will be exactly the same as the climate experienced by 430 that biome today. We can be confident that if a plant species occupied a higher latitude niche at KM5c than it does today then the amount of incoming solar insolation it experienced would vary more throughout the year. As an example, Fletcher et al. (2019b) showed at a lower latitude. However, it is likely that this would not be the case for every location over the globe, and some mid-high latitude ecosystems in the mid-Pliocene could experience environmental conditions outside the modern sample. This would lead to uncertainties on climate reconstruction methods that utilise information from the modern distribution of plants to determine past climates. Any such uncertainties would increase error bars on winter temperatures because plant distributions are more strongly constrained by spring and summer temperatures (Korner et al., 2016). Furthermore the error bars would likely be 450 skewed towards colder temperatures because winter insolation becomes strongly reduced as we move to higher latitudes. We therefore highlight the geographical shift in biome boundaries and the potential for a non analogue climate as another possible contributor to the warm winter paradox.

455
The latest iteration of the Pliocene Modelling Intercomparison Project (PlioMIP2) produces temperatures that agree very well with proxy data over oceans the tropical land surface and the high latitude land surface WMMT. The high latitude CMMT, however, shows large model-data disagreement. The proxy data suggests very high temperatures that the models are unable to replicate. We term this the 'warm winter paradox'.

460
This cold month, high latitude, terrestrial data-model discord is not unique to the Pliocene. For the Holocene, Mauri et al.
(2015) noted that their "climate reconstruction suggests warming in Europe during the mid-Holocene was greater in winter than in summer, an apparent paradox that is not consistent with current climate model simulations and traditional interpretations of Milankovitch theory". For the LGM, Kageyama et al. (2021) showed that none of the models analysed could simulate the amplitude of the reconstructed winter cooling over Western Europe. For older, greenhouse climates in the Mesozoic and 465 early Cenozoic there has been a longstanding 'equable climate problem' (e.g. Greenwood and Wing, 1995;Huber and Sloan, 2001), where models typically predict temperatures 20 • C colder than data over the continental interiors. Huber and Caballero (2011) showed that modelling the Eocene with very large CO 2 values (16 × preindustrial) was able to simulate cold month temperatures in reasonable agreement with the data. However studies of the Eocene climate generally use much smaller CO 2 forcing (1 -9 × preindustrial; Lunt et al., 2021).

470
For the Pliocene we have investigated several possible contributors to the 'warm winter paradox'. It is likely that the 'warm winter paradox' cannot be solved by one single factor, and instead that it is due to a multitude of factors. The relevant factors we have considered, do lead to a potential warm bias on the data and a potential cold bias on the models, suggesting they could increase model-data agreement.

475
For the warm winter paradox, we find that structural model uncertainties are likely to be more important than uncertainties in the model boundary conditions. This is because the data-model discord does not seem to be largely dependent on the exact age of the proxy data, or simulated orbital boundary conditions yet the range of temperatures simulated by different models is relatively large. All models also share some aspects of structural uncertainty that could affect the simulated climate. For 480 example none are able to fully resolve convection or other high resolution processes.
From a data perspective we have noted that different data sources provide very different reconstructions of winter temperatures. Although, there are good reason to suggest that some reconstructions are better than others, the very different reconstructed temperatures lend some uncertainty to this winter temperatures. Additional uncertainty arises because, the proxies we have considered, (vegetation and beetle assemblage) may not be particularly sensitive to cold month temperatures.
The methodology of obtaining temperatures may contribute errors to the DMC. For example, a modern day test case showed that the CRACLE method had a warm bias on CMMT of 4.4 • C for very cold winter temperatures, this was more than 3 times that global average error. In the models, very local effects that the models do not resolve could bias results, as was evidenced 490 by a modelled CMMT cold bias of 7.8 • C at the Lake Baikal site. Removing these two potential methodological errors would bring the model and data 12.2 • C closer together at Lake Baikal.
Finally we considered the non-analogue nature of the Pliocene climate and how this might influence the DMC. If this were an issue it would affect temperatures reconstructed from data, because temperature reconstruction methods rely on modern 495 habitats of flora and fauna to determine range limits, which can then be used to determine Pliocene climate. However, there are likely to be Pliocene climates that are outside the modern range. At such places, the reconstructed temperatures will be subject to greater uncertainty. We have argued in this paper that the increase in uncertainty is likely to take the form of a warm (rather than a cold) bias and could provide a nudge towards greater model-data agreement.

500
Relative to the cold month temperature, there appear to be fewer uncertainties on the warm month temperature. Previous studies (e.g. Abbot and Tziperman, 2008) do not note as large a sensitivity on the warm month temperature to the changing climate. Proxies are more sensitive to the warm month temperature and can therefore be used to produce a more accurate reconstruction. In contrast to the cold month temperature, boundary conditions do appear important for simulating the warm month temperature, suggesting that modelling sensitivity experiments could be used to fine tune warm month temperature and 505 produce good model-data agreement. However, this potential to easily bring model and data into line for the warm month temperature is not needed. The PlioMIP2 models agree well with the warm month temperature from the data, and data from different sources concur.
The high latitude mPWP CMMT obtained from models and from data are so different that they cannot both be correct.

510
Indeed, given the large uncertainties on both model and data, it is plausible that the mean value obtained from both methods are wrong, although it is not yet possible to state how large the errors on either model and data are likely to be. Until this uncertainty is reduced it might be advisable to discuss mPWP high latitude climate in terms of more consistent parameters such as WMMT or vegetation biomes. This is not to say that winter temperatures should be ignored. However we want to avoid suggestions that one may take from such comparisons: for example that models cannot accurately simulate polar amplification.

515
A more accurate conclusion would be that, for the Pliocene, models are very good at simulating polar amplification for the summer months, and the uncertainties from both models and data on winter temperatures are currently too large to be able to provide reliable conclusions.
KM5c data at Lake Baikal. JCT prepared the paper with contributions from all coauthors.
Competing interests. The authors declare that they have no conflict of interest