Reduced El Niño variability in the mid-Pliocene according to the PlioMIP2 ensemble

The mid-Pliocene warm period (3.264–3.025 Ma) is the most recent geological period during which atmospheric CO2 levels were similar to recent historical values (∼ 400 ppm). Several proxy reconstructions for the midPliocene show highly reduced zonal sea surface temperature (SST) gradients in the tropical Pacific Ocean, indicating an El Niño-like mean state. However, past modelling studies do not show these highly reduced gradients. Efforts to understand mid-Pliocene climate dynamics have led to the Pliocene Model Intercomparison Project (PlioMIP). Results from the first phase (PlioMIP1) showed clear El Niño variability (albeit significantly reduced) and did not show the greatly reduced time-mean zonal SST gradient suggested by some of the proxies. In this work, we study El Niño–Southern Oscillation (ENSO) variability in the PlioMIP2 ensemble, which consists of additional global coupled climate models and updated boundary conditions compared to PlioMIP1. We quantify ENSO amplitude, period, spatial structure and “flavour”, as well as the tropical Pacific annual mean state in mid-Pliocene and pre-industrial simulations. Results show a reduced ENSO amplitude in the model-ensemble mean (−24 %) with respect to the pre-industrial, with 15 out of 17 individual models showing such a reduction. Furthermore, the spectral power of this variability considerably decreases in the 3–4year band. The spatial structure of the dominant empirical orthogonal function shows no particular change in the patterns of tropical Pacific variability in the model-ensemble mean, compared to the pre-industrial. Although the timemean zonal SST gradient in the equatorial Pacific decreases for 14 out of 17 models (0.2 C reduction in the ensemble mean), there does not seem to be a correlation with the decrease in ENSO amplitude. The models showing the most “El Niño-like” mean state changes show a similar ENSO amplitude to that in the pre-industrial reference, while models showing more “La Niña-like” mean state changes generally show a large reduction in ENSO variability. The PlioMIP2 results show a reasonable agreement with both time-mean proxies indicating a reduced zonal SST gradient and reconstructions indicating a reduced, or similar, ENSO variability.


Introduction
The mid-Piacenzian or mid-Pliocene warm period (mPWP, 3.264-3.025 Ma) was a recent geological interval of sustained warmth with global mean temperatures 2-5 • C higher than the pre-industrial Dowsett et al., 2010Dowsett et al., , 2016Haywood et al., 2020). Atmospheric CO 2 levels were ∼ 400 ppm (Badger et al., 2013;Fedorov et al., 2013;Haywood et al., 2016a;de la Vega et al., 2020), similar to values of the early 21st century. This makes this period an interesting case study for our near-future climate, also because the mid-Pliocene had a similar geography to the present (outside of ice-sheet regions). Efforts to under-stand the mPWP climate have been ongoing for more than 25 years and led to the coordination of the Pliocene Modelling Intercomparison Project (PlioMIP) phase 1 in 2010 . The PlioMIP1 ensemble shows a range of global mean surface temperature anomalies, even though the models have nearly identical boundary conditions. Furthermore, comparison with proxies highlights that most models underestimate polar amplification . The PlioMIP phase 2 was initiated to further understand the mPWP climate and more specifically designed to reduce uncertainties in model boundary conditions and in proxy data reconstruction (Haywood et al., 2016a. It employs boundary conditions from the Pliocene Research, Interpretation and Synoptic Mapping (PRISM) version 4, including updated reconstructions of ocean bathymetry and land-ice surface topography, as well as Pliocene soils and lakes (Dowsett et al., 2016;Haywood et al., 2016a). The PlioMIP2 simulations are specifically tuned to the KM5c interglacial (3.205 Ma), a time slice within the mPWP with orbital parameters close to the present-day configuration. Important changes in boundary conditions compared to the experimental design of PlioMIP1 include the closure of the Canadian Archipelago and the Bering Strait, and the shoaling of the Sahul and Sunda shelves. The PlioMIP2 global average, annual mean surface air temperature (SAT) increase is 1.7-5.2 • C (3.3 • C in the ensemble mean) compared to the pre-industrial, when implementing PRISM4 boundary conditions in PlioMIP2 Williams et al., 2021).
One of the more perplexing and still unanswered topics in the Pliocene research community is the behaviour of tropical Pacific variability in the mid-Pliocene, in particular of the El Niño-Southern Oscillation (ENSO). In the present-day climate, ENSO is the most prominent mode of variability on interannual timescales. It has its origin in the tropical Pacific, while having teleconnections to many regions in the world (Philander, 1990). The ENSO phenomenon can be explained as an internally generated mode of variability of the coupled equatorial ocean-atmosphere system -either self-sustained or excited by random noise (Fedorov et al., 2003). The background climate such as meridional and zonal sea surface temperature (SST) gradients, vertical temperature gradients, and the trade wind strength is thought to play an important role in the properties of this internal mode of variability. A recent review by Cai et al. (2021) shows that ENSO-related SST variability has increased in the past decades and is projected to increase further under future greenhouse warming. However, many of these findings are strongly influenced by internal variability and interactions with the mean state, and there is by no means a consensus on many aspects of past as well as future ENSO behaviour (IPCC, 2021). It is therefore an interesting issue to study ENSO variability in the warm conditions of the mid-Pliocene.
Early proxy reconstructions indicated that the mid-Pliocene tropical Pacific may have highly reduced zonal SST gradients (Molnar and Cane, 2002;Wara et al., 2005;Ravelo et al., 2006). This pointed in the direction of an "El Niñolike" mean state in the mid-Pliocene, and it was even suggested that there might be a "permanent El Niño" without any interannual variability around that state . The coarse temporal resolution and choice of calibration of ocean sediment proxy reconstructions make it challenging to say anything about such variability. Because of this, even a "La Niña-like" mean state has been proposed for the mPWP (Rickaby and Halloran, 2005). Scroxton et al. (2011) present evidence for clear ENSO variability in the Pliocene based on ocean sediment isotopes, possibly despite reduced zonal SST gradients. Watanabe et al. (2011) present coral skeleton data showing ENSO variability that is similar to that of the present day. Mg/Ca measurements indicating subsurface temperatures in the western equatorial Pacific again point towards a reduced zonal temperature gradient (Ford et al., 2015), while more recent alkenone proxy reconstructions show a moderate reduction in the tropical Pacific zonal SST gradient (Tierney et al., 2019). The latest proxy reconstructions by White and Ravelo (2020) show that the mid-Pliocene ENSO amplitude varied between reduced and similar to the present day, and they associate that with a weaker thermocline feedback.
The earliest modelling studies on the early Pliocene and mid-Pliocene ENSO are in favour of the "permanent El Niño", when using idealized experiments with ocean-only or forced atmosphere GCMs . However, later studies using a coupled atmosphere-ocean GCM (Haywood et al., 2007;Bonham et al., 2009) and using a Zebiak-Cane model (von der Heydt et al., 2011) clearly resolved ENSO-like interannual variability. While most modelling studies show (slightly) reduced zonal SST gradients, a study using the Zebiak-Cane model of the tropical Pacific suggests a westward shift in the position of the "cold tongue" (CT) under increased background temperatures, whereas a weaker zonal SST gradient would be mostly associated with weaker background trade winds (von der Heydt et al., 2011). Later, in the coordinated modelling efforts of PlioMIP1, all models show ENSO-like variability, and eight out of nine studies agree on a reduced ENSO variability compared to the pre-industrial (Brierley, 2015), although the magnitude of reduction varies considerably among different models. This robustly weaker ENSO is accompanied by a shift to lower frequencies (i.e. longer periods) in most models, while again the magnitude of the dominant frequency varies among the model ensemble. Moreover, the PlioMIP1 models do not show a consistent reduction in the mean zonal SST gradient, and a clear reason for weaker ENSO variability is not found. Research with the HadCM3 model has pointed out the importance of centennial-scale variability in ENSO behaviour, suggesting there could have existed periods with both weaker and stronger ENSO variability in the mid-Pliocene (Tindall et al., 2016).
Modelling efforts on past as well as future climates show different responses of ENSO variability to radiative and geographical forcings. Collins et al. (2010) show that from the CMIP3 ensemble, it is not possible to determine whether the amplitude and frequency of ENSO variability will change in the future under climate change. Also the more recent CMIP5 ensemble provides no clear consensus on whether ENSO amplitude would decrease or increase in the future, and what feedbacks might change (Kim et al., 2014). However, Cai et al. (2014) Fredriksen et al. (2020) show that, interestingly, the increase in El Niño SST variability is linked to a decrease in the zonal SST gradient in the tropical Pacific. Idealized warming experiments by Callahan et al. (2021) show a robust decrease in ENSO amplitude in the long term. Yeh et al. (2009) and Ashok and Yamagata (2009) show that the "flavour" of El Niño will change in the future, shifting from mainly cold tongue El Niño events to more "warm pool" (WP) El Niño events, implying that the largest temperature variations will shift more towards the central Pacific. However, recent work using CMIP5 and CMIP6 data does not necessarily agree with this, showing that changes in El Niño flavour are model-dependent (Freund et al., 2020). When reproducing the climate in recent decades, climate models suggest a reduced zonal SST gradient in the tropical Pacific due to rising greenhouse gas concentrations, while observations show a strengthened gradient (Coats and Karnauskas, 2017). This discrepancy between coupled models and observations is attributed to the cold bias in the equatorial cold tongue by Seager et al. (2019). However, Heede et al. (2020) show that the initial transient response to CO 2 forcing is characterized by a strengthened zonal SST gradient, while the equilibrium response shows a warmer cold tongue. Jiang et al. (2021) show that this cold bias reduces in the CMIP6 ensemble, when comparing with CMIP5, but still exists. Brown et al. (2020) investigate ENSO both in the mid-Holocene, last glacial maximum (lgm) and last interglacial (lig) simulations of PMIP3/4 and idealized warming experiments of CMIP5/6. They find a clear decrease in ENSO variability in the lig and mid-Holocene simulations, despite a stronger zonal SST gradient. Closer inspection demonstrates no clear correlation between the mean zonal SST gradient in the tropical Pacific and ENSO amplitude, when considering the PMIP3/4 and CMIP5/6 ensembles.
While the PlioMIP1 ensemble was able to adequately reproduce many of the spatial patterns in surface temperature as reconstructed from proxies, a number of uncertainties and model-data mismatches remained, in particular regarding the warming in high latitudes . High- latitude temperatures clearly also affect the tropical climate, for example through effects on the Hadley Cell and tradewind strength caused by an altered Equator-to-pole temperature gradient. In order to reduce uncertainties and modeldata mismatches, a completely new reconstruction of palaeogeography was used in PlioMIP2 as boundary condition for the models. Analysis of large-scale features of the ensemble by Haywood et al. (2020) shows a larger SAT anomaly compared to PlioMIP1, because of the inclusion of more models with a higher equilibrium climate sensitivity (ECS). It is also shown that the ensemble mean SSTs agree well with newly reconstructed SST proxies by Foley and Dowsett (2019). The PlioMIP2 ensemble also agrees well with reconstructions from a recent SST synthesis study focusing on the same mid-Pliocene time slice (McClymont et al., 2020).
In this work, we study changes in ENSO variability in the PlioMIP2 ensemble compared to the pre-industrial and relate this to differences in the mean background climate of the tropical Pacific. In Sect. 2, we briefly introduce the models that participate in the PlioMIP2 ensemble and describe the methods used to analyse ENSO. Following this, we investigate ENSO variability in PlioMIP2 in terms of amplitude, frequency, spatial structure and "flavour" and compare this to pre-industrial reference simulations in Sect. 3.1. Next, we investigate the relation between ENSO amplitude and the mean zonal SST gradient and study whether the tropical Pacific mean state is El Niño-like in the mid-Pliocene, in Sect. 3.2. In Sect. 4, the results are discussed by comparing them with observational data as well as highlighting intermodel differences within the ensemble. We conclude with a summary and outlook.

The PlioMIP2 ensemble
A total of 17 climate models form the PlioMIP2 ensemble, which is almost double the size of the PlioMIP1 ensemble. A list of the models, performing institute and reference to the work describing the individual models in more detail is presented in Table 1. All models have performed simulations following the PlioMIP2 experimental protocol (Haywood et al., 2016b), providing both the pre-industrial control experiment (E 280 ) and the mid-Pliocene experiment (Eoi 400 ) data. Pre-industrial simulations are forced with ∼ 280 ppmv atmospheric CO 2 concentrations. The mid-Pliocene simulations are forced with 400 ppmv CO 2 and all models apart from HadGEM3 use significantly different geographic boundary conditions, including closed Arctic Ocean gateways (Bering Strait, Canadian archipelago) and reduced land ice coverage (Greenland ice sheet, West Antarctic ice sheet). Results of several large-scale features, such as global mean surface air temperature (SAT), polar amplification factor and equilibrium climate sensitivity (ECS) of the PlioMIP2 ensemble, are presented in Haywood et al. (2020). Note that the results of the HadGEM3 model were not included in that paper, as the simulations finished after the time of writing. Details for HadGEM3 can be found in Williams et al. (2021). More details on the Eoi 400 simulations of the individual models can be found in the references listed in Table 1.
Each modelling group has provided (at least) 100 years of both the E 280 and Eoi 400 simulation for analysis. In this work, we consider the last 100 years of monthly SST data in order to quantify and investigate ENSO variability. In the Supplement, we discuss the robustness of our analysis methods using the results of two ensemble members, using 500 years of data. Data were regridded onto a regular 1 • × 1 • grid using a bilinear interpolation, in order to analyse the model results in the same way. Interpolating data on a common grid could smooth out spatial variations and remove local extremes, while it can also act to suppress certain unreliable grid-box-scale features (Räisänen and Ylhäisi, 2011). The majority of the models use a nominal horizontal ocean resolution that is close to 1 • . Some of the models employ a telescoping grid that results in a finer resolution near the Equator, such as IPSLCM6A-LR as well as CCSM4-Utr (∼ 0.33 • and ∼ 0.67 • , respectively). Since we are not interested in gridbox-scale features, and often considering spatial means as well as ensemble means, we do not expect the regridding to significantly impact the results.

Niño indices
The Niño 3.4 index, defined as the monthly SST anomaly in the Niño 3.4 region in the equatorial Pacific, is most commonly used in present-day ENSO analysis. It can be used to determine the amplitude and period of ENSO variability. The Niño 3.4 region is used since it shows the largest correlation with SST variability in the whole tropical Pacific. However, the question remains as to whether this is also true for the mid-Pliocene ENSO. Figure 1 shows the standard deviation (SD) of the SST anomalies in the tropical Pacific for (a) the pre-industrial E 280 ensemble mean and (b) the mid-Pliocene Eoi 400 ensemble mean. Indicated in the plot are the four commonly used regions to study ENSO variability: the Niño 4, Niño 3.4, Niño 3 and Niño 1+2 regions. The magnitude of SST variability decreases in the mid-Pliocene equatorial Pacific, but it seems that the region with the largest ENSO-related variability keeps its position.
To determine the SST anomaly pattern with the largest variance, we perform principal component analysis (PCA) on the detrended monthly SST anomalies. As the climatology is subtracted, we expect the leading principal component (PC1) in the tropical Pacific to capture ENSO variability. We correlate the PC1 with the four different Niño indices to check which region is representative for ENSO variability, both in Table 1. Details on the models contributing to the PlioMIP2 ensemble. The left-most column shows the model ID according the PlioMIP2 naming convention in bold, and the formal model ID in brackets, when different. More details (e.g. on treatment of sea-ice and vegetation) can be found in Haywood et al. (2020) and Williams et al. (2021 the pre-industrial simulations and in the mid-Pliocene. The results per model are presented in Table 2. In the pre-industrial simulations, 14 of the 17 models show the largest correlation in the Niño 3.4 region. The ensemble mean shows the largest correlation for the Niño 3.4 index and the results agree well with data obtained from 1920-2020 HadISST observations (Rayner et al., 2003). Of the mid-Pliocene simulations, 15 of the 17 models show the largest correlation with the Niño 3.4 index. The mid-Pliocene ensemble mean clearly shows the largest correlation in the Niño 3.4 region. Henceforth, we will be using the Niño 3.4 index to quantify ENSO variability. Furthermore, some of the analyses performed in this study have also been repeated with the Niño 3 index instead of the Niño 3.4 index, resulting in the same conclusions.

Quantifying ENSO variability
In order to quantify ENSO variability within the PlioMIP2 ensemble, we look at four main features of ENSO: (1) ampli-  tude, (2) period, (3) spatial structure and (4) El Niño flavour (see below for explanation and specific definitions used). To compute the amplitude and period we use the Niño 3.4 index. Anomalies are taken with respect to a mean seasonal cycle computed on the full 100 years of data that are available. No running mean is applied. Before any analysis, linear trends are removed from the Niño 3.4 time series. The analysis methods used here are similar to those in the analysis of ENSO in the PlioMIP1 ensemble by Brierley (2015). Results for individual models are included in the Supplement. We assess the properties of ENSO variability using the second, third and fourth statistical moments of the Niño 3.4 index, namely the standard deviation (SD), skewness and kurtosis. The first moment, the mean, is zero by definition. The ENSO amplitude is defined as the SD of the Niño 3.4 time series. Furthermore, the skewness and kurtosis of the Niño 3.4 index are computed (following the Fischer-Pearson definition; Zwillinger and Kokoska, 2000). Both are normalized on the variance, and a factor of 3.0 is subtracted from the kurtosis to give 0.0 kurtosis for a normal distribution. Positive or negative skewness provides information on whether there are more El Niño or more La Niña events, respectively. Positive kurtosis indicates that there are more values around the mean and the SD is mainly determined by extreme values, while negative kurtosis implies a more uniform distribution of values. The kurtosis thus provides information on the relative occurrence of extreme events (i.e. El Niño and La Niña events). The variation of the Niño 3.4 SD has been investigated using 500 years of data from CESM2 and MIROC4m and was found to not be large enough to impact the conclusions of the ensemble. Results of this are included in the Supplement.
To investigate the ENSO period(s), we perform a spectral analysis of the Niño 3.4 index. Before the power spectra are computed, the indices are normalized with their SD to obtain time series with zero mean and unit SD. It was chosen to use the multi-taper spectral method as explained by Ghil et al. (2002), with three tapers and a bandwidth parameter of two. This spectral method is preferred over the classic periodogram because of its statistical robustness and reduction of spectral leakage. All spectra are scaled with respect to their respective sum. We assess the significance of spectral peaks by performing a red noise test on each time series. The 90 %, 95 % and 99 % confidence levels are determined using a first-order autoregressive (AR(1)) model with 10 000 Monte Carlo-generated surrogates.
To quantify a change in ENSO period in the mid-Pliocene from the pre-industrial, we consider two measures: (1) the ensemble mean power spectra and (2) the periods of the ensemble sum of peaks that are above the 90 %, 95 % and 99 % confidence levels. The mean power spectrum gives information on what happens to the spectral power in the mid-Pliocene. Note that the mean is taken per frequency bin to create the mean spectrum for the Eoi 400 and E 280 runs. However, a mean spectrum does not contain any information on the significance of the spectral peaks. For this reason, we introduce a procedure where we count all the spectral peaks above the 90 %, 95 % and 99 % confidence levels in the 1.5-10-year period per simulation. The periods of these significant peaks are then binned for the whole ensemble. This histogram of the significant peaks provides information on the period or period range that represents the most ENSOrelated activity. The method has been tested for robustness by repeating the procedure using five instead of three tapers, using a bandwidth parameter of four instead of two, using the classic periodogram (fast Fourier transform) instead of the multi-taper method and using the Niño 3 instead of the Niño 3.4 index. The conclusions are found to be independent of these different settings. A summary of these results is included in the Supplement.
We use principal component analysis (PCA) on the monthly SST anomalies to investigate changes in the spatial structure of ENSO. The first empirical orthogonal function (EOF) is determined in the tropical Pacific (defined as 23 • S-23 • N and 140-280 • E). We follow the methodology of Power et al. (2013) in performing EOF analysis and normalizing the EOF with the spatial SD in order to remove the ENSO amplitude signal, as Brierley (2015) did for the PlioMIP1 ensemble. As the climatology and trend are removed, we expect the first EOF in the tropical Pacific to correlate with ENSO variability and not with the seasonal cycle. The spatial EOF pattern and the percentage of variance explanation will be compared for both simulations, to assess to which degree ENSO variability differs.
We can distinguish between two El Niño "flavours", being (1) the cold tongue or eastern Pacific El Niño and (2) the warm pool or central Pacific El Niño (also known as El Niño Modoki, Ashok and Yamagata, 2009;Yeh et al., 2009). These types are distinguishable based on the region of their largest ENSO amplitude (hence the naming). Here we use the methodology proposed by Ren and Jin (2011), using two new indices for the cold tongue (N CT ) and warm pool (N WP ) El Niño, combining the Niño 3 (N 3 ) and Niño 4 (N 4 ) indices: where α = 2/5 if N 3 N 4 > 0 and α = 0 otherwise. We will quantify whether there is a change in El Niño flavour in the mid-Pliocene by computing and comparing the SD of the N CT and N WP indices.
Lastly, we will look at the tropical Pacific mean state, specifically the zonal SST gradient in the equatorial Pacific Ocean, to find explanations for a change in ENSO behaviour. For this, we compute the annual mean SSTs in the tropical Pacific Ocean (23 • S-23 • N and 140-280 • E) from the 100-year monthly data for each model and simulation. We define the zonal SST gradient as the SST difference between the warm pool region in the equatorial western Pacific Ocean (5 • S-5 • N, 150-170 • E) and the cold tongue region in the equatorial eastern Pacific Ocean (5 • S-5 • N, 240-260 • E). Both the tropical Pacific as well as the warm pool and cold tongue region bounds are indicated in Fig. 7, for reference. Furthermore, to assess whether the mean state of the mid-Pliocene becomes more El Niño-like, we spatially correlate the annual mean SST change (Eoi 400 -E 280 ) to the pre-industrial (E 280 ) pattern of the leading EOF.

Comparison with observations
We compare pre-industrial simulation results to those obtained via HadISST observational data, taken from 1920-2020 (Rayner et al., 2003, latest data available through https://www.metoffice.gov.uk/hadobs/hadisst/ data/download.html, last access: 11 May 2021). The observational data can be used to assess how well individual models as well as the pre-industrial ensemble mean perform with respect to historical ENSO records. Note that there will be impacts of anthropogenic forcing such as greenhouse gas emissions in the HadISST observational data that will not be present in the pre-industrial simulation results.
The mid-Pliocene mean climate simulation results will be compared with reconstructed SST proxies by Foley and Dowsett (2019)

ENSO variability
3.1.1 Statistical moments Figure 2 shows the SD, skewness and kurtosis for all models with respect to their pre-industrial values. The ensemble mean is also shown in the plots, and the HadISST observational data are included as reference. Individual Niño 3.4 index time series for all ensemble members are included in Fig. S1 in the Supplement. Figure 2a shows a clear reduction of the Niño 3.4 SD in the Eoi 400 ensemble mean compared to the E 280 ensemble mean, with 15 out of 17 individual models agreeing. The mid-Pliocene ensemble mean shows a 24 % reduction in ENSO amplitude (compared to 20 % in PlioMIP1, Brierley, 2015), with a standard deviation of ±18 %. The spread of the individual values is large; the difference between the smallest and largest SD is around 1.2 • C for both the preindustrial and mid-Pliocene simulations. Still, the ensemble mean pre-industrial value is close to the HadISST observations (0.91±0.28 • C and 0.77 • C, respectively). Notable individual models are COSMOS for having large SD in both runs and CCSM4-Utr for showing the largest absolute decrease in SD. The models showing almost no difference between E 280 and Eoi 400 are COSMOS, GISS2.1G and MRI2.3. The same model version of MRI was included in the PlioMIP1 ensemble, where it also showed no change (Brierley, 2015).
A less coherent pattern arises from the skewness of the Niño 3.4 index, shown in Fig. 2b. Again, the spread of the values is large. Most individual models (14 out of 17), as well as the ensemble mean, show a smaller skewness in the pre-industrial compared to the HadISST observations. The ensemble mean shows no significant change of the skewness in the mid-Pliocene simulations. Around half of the models (9 out of 17) show a positive skewness in both simulations, indicating generally more El Niño than La Niña events. However, the individual models do not show agreement on what happens to the skewness in the Eoi 400 runs, compared to the E 280 reference. GISS2.1G shows the most negative skewness in the mid-Pliocene. Both NorESM-L and EC-Earth3.3 show the most negative skewness in the pre-industrial. Notable again is CCSM4-Utr for having the largest skewness in the pre-industrial and the biggest absolute decrease in the mid-Pliocene. CESM1.2 shows the opposite behaviour with the largest increase in the skewness in the mid-Pliocene, although the pre-industrial result shows a good agreement with the HadISST data.
The kurtosis of the Niño 3.4 index is shown in Fig. 2c. The pre-industrial ensemble mean kurtosis is close to zero, which does not agree with results from the HadISST observations (albeit within standard deviation range), which show a positive kurtosis. The ensemble mean shows an increase in kurtosis in the mid-Pliocene simulations, with 11 out of 17 individual models agreeing. This indicates that the distribution becomes more "heavy-tailed", implying that there is more activity around the mean (neutral state) and the variance is mainly determined by some extreme tail values (i.e. "extreme" El Niño or La Niña events). The notable deviations from the ensemble are GISS2.1G (large negative kurtosis in both runs), CCSM4-Utr (largest decrease) and CESM1.2 (largest increase).

Spectral analysis
We investigate the ENSO period by looking at the Niño 3.4 power spectra. The power spectra for all the individual simulations are shown in Fig. S2, including the HadISST spectrum. We find that 12 out of the 17 models show an increase in the period of maximum spectral power in the mid-Pliocene, compared to the pre-industrial. However, ENSO does not have one isolated period in the power spectrum. While there are models that show peaks in spectral power in the mid-Pliocene at higher periods than in their pre-industrial counterparts, in many cases these spectral peaks do not exceed the threshold for statistical significance. Therefore, we only consider the ensemble mean power spectrum as well as the histogram of significant spectral peaks here.
The ensemble mean power spectra are shown in Fig. 3. The HadISST spectrum is also included. The shading represents the second-to-smallest and second-to-largest power value per frequency. The pre-industrial spectrum resembles the HadISST spectrum, except for a less clear separation of the spectral peaks. The pre-industrial mean spectrum shows the largest power in the 3-7-year period, which is generally associated with ENSO. The mid-Pliocene mean spectrum shows a peak at a period of 5 years. The clearest change in the mid-Pliocene is the reduction of power in the 3-4-year period. However, the range in modelled values is large for both simulations, and the individual power spectra show a lot of differences in power and peak locations (see Fig. S2). Some of the models show little difference between the pre-industrial and mid-Pliocene simulations (for example CCSM4). Many show a decrease in spectral intensity in the mid-Pliocene. However, GISS2.1G and MRI2.3 also show significant peaks (> 99 % CI) in the mid-Pliocene simulation that are not present in the pre-industrial reference. The clear shift to longer periods or lower frequencies found in the PlioMIP1 ensemble mean spectrum is not reproduced by the PlioMIP2 ensemble mean (Brierley, 2015).
A way to assess the change in ENSO period in the mid-Pliocene is through counting the significant peaks in the spectra, as explained in Sect. 2.2. Figure 4 shows (a) a histogram of the number of peaks that are above a certain (90 %, 95 % and 99 %) confidence level per period bin and (b) the difference between the mid-Pliocene and pre-industrial. It can be seen from Fig. 4a that the majority of the significant peaks in the pre-industrial are found in the 2-5-year period, with a clear maximum around the 3-4-year period. The total number of spectral peaks that can be called significant de-  creases in the mid-Pliocene for all the confidence levels. The number of peaks that are above the 99 % confidence level reduces by one-third.
The change in number of significant peaks is shown in Fig. 4b. It clearly shows the decrease in the number of peaks for all the confidence levels, mainly in the 2-5-year period. The reduction is strongest in the 3-4-year period. In the 1.5-2-year period, there is a slight increase in the number of peaks above the 90 % and 95 % confidence levels. The change in number of peaks mainly shows that the spectral power in the 3-4-year El Niño period significantly decreases and that this power does not shift to different periods.

Spatial structure
To study the changes in spatial structure of ENSO, we compute the EOFs of the tropical Pacific SST anomalies. The first EOFs for all the individual simulations are shown in Fig. S3. The EOFs are defined to be positive in the Niño 3.4 region. The percentage of the variance that is explained by the first EOF is also computed and shown in the bottom left of the plots.
To compare the mid-Pliocene to the pre-industrial EOFs, the ensemble means are shown in Fig. 5b-d, with the first EOF of the HadISST anomalies in 1920-2020 also included in (a). The stippling is included in regions where less than 12  Fig. 5a, which is a known model bias (Jiang et al., 2021). It can be seen from Fig. 5b and c that the spatial structure looks qualitatively the same in both simulations. Also, the majority of the individual models agree with the ensemble mean pattern, indicated by the spread of the stippling. The majority of the individual models show little change, with only CCSM4-Utr, HadCM3 and MIROC4m showing spatial differences (see also Fig. S3). The percentage of variance in the tropical Pacific, which the first EOF explains, decreases from 50 % in the pre-industrial to 42 % in the mid-Pliocene within the ensemble mean. In total 14 out of 17 models agree with this decrease, with only MRI2.3 showing a slight increase in the mid-Pliocene run and COSMOS and GISS2.1G showing no change. The percentage of variance explained is very similar in the pre-industrial ensemble mean and the HadISST result.
The difference between the two ensemble mean EOFs is shown in Fig. 5d and shows a decrease in the central equatorial Pacific region. This is in agreement with the ensemble mean reduction in ENSO amplitude (or Niño 3.4 SD). The EOF difference shows an increase in the areas slightly to the north and south of this region. This suggests that there is a shift of the El Niño warmth expanding further across the tropical Pacific. This is also concluded in the PlioMIP1 ensemble, where the spatial pattern of the EOF difference looks qualitatively very similar (Brierley, 2015). Accompanying this meridional expansion of the warmth of an El Niño event is the southward migration of the South Pacific Convergence Zone (SPCZ), as was also shown by Pontes et al. (2020). It can be seen from the stippling that the majority of the models agree with a decrease in the EOF signal in the Niño 3.4 region. The rest of the tropical Pacific shows large areas with stippling, indicating that there is a lot of model disagreement on the sign of the change in EOF.
The correlation coefficient of the leading principal component with the different Niño indices is shown in Table 2. As was concluded before, the majority of the models have the largest correlation in the Niño 3.4 region, both in the E 280 as well as in the Eoi 400 simulations. This result agrees with the EOF pattern results showing that the area in the tropical Pacific with the largest ENSO variability is not significantly different in the mid-Pliocene.

El Niño flavour
A change in El Niño "flavour" in the pre-industrial and mid-Pliocene simulations will be investigated using the methodology of Ren and Jin (2011). The amplitude of the cold tongue or eastern Pacific El Niño is defined as the standard deviation (SD) of the N CT index. Likewise, for the warm pool or central Pacific El Niño the SD of the N WP index is used. The change in magnitude of both amplitudes follows the changes in the Niño 3.4 SD results. We are interested in the difference in the ratio between the two types. The ratio of the warm pool El Niño amplitude to the cold tongue El Niño amplitude N WP /N CT is shown in Fig. 6. For the majority of the models (13 out of 17), the ratio is smaller than 1 in both of the simulations, implying there are generally more cold tongue El Niño events. This is in agreement with the result from the HadISST observations, although the HadISST ratio is smaller than most of the model results. The ensemble mean shows no change in the ratio of warm pool to cold tongue El Niño events, and most individual models also show little change. Notable is HadCM3 for showing a majority and large relative increase in the number of warm pool El Niño events in the mid-Pliocene (very similar results to in PlioMIP1; see Brierley, 2015), and NorESM1-F and GISS2.1G for showing a small relative amount of warm pool El Niño events in both simulations. The PlioMIP2 results are somewhat similar to results from CMIP5 and CMIP6 future scenarios, which show a large model spread and no clear shift in El Niño flavour (Freund et al., 2020).

Tropical Pacific mean climate
In this section, we will analyse changes in the tropical Pacific mean state and investigate whether this relates to the changes in ENSO variability in the mid-Pliocene simulations. We look at the tropical Pacific annual mean SSTs and specifically the zonal SST gradient along the equatorial Pacific and relate these to changes in the Niño 3.4 SD and the pre-industrial leading EOF. It can be seen that the E 280 ensemble mean pattern looks qualitatively similar to the HadISST observations. Consistent with other studies using coupled GCMs (Collins et al., 2010;Coats and Karnauskas, 2017;Brown et al., 2020), the E 280 means shows colder temperatures in the central and western Pacific and slightly warmer temperatures in the eastern Pacific compared to observations. The "cold bias" in the eastern Pacific can be expected since, firstly, the pre-industrial simulations are compared with historical observations and, secondly, the models have insufficient resolution to reproduce the cold conditions of coastal upwelling systems, such as the Benguela upwelling system (McClymont et al., 2020).

Annual mean pattern
The Eoi 400 ensemble mean is warmer but qualitatively similar to the pre-industrial result. The markers in Fig. 7c are SST proxies reconstructed by Foley and Dowsett (2019) (hereafter referred to as PRISM4) and by McClymont et al. (2020), including two U k 37 and two Mg/Ca reconstructions (hereafter referred to as MC-UK37 and MC-Mg/Ca, respectively). There is good agreement between the PlioMIP2 ensemble mean and the PRISM4 SSTs in the eastern Pacific upwelling region. The two points in the warm pool region show slightly lower SSTs compared to the Eoi 400 ensemble mean. The MC-UK37 point around 240 • E shows a higher value compared to the ensemble mean, while the MC-Mg/Ca around 273 • E shows a lower value. Figure 7d shows the ensemble mean Eoi 400 − E 280 difference. The colour bar of the plot highlights deviations from the tropical Pacific mean SST difference (∼ +1.9 • C), meaning that the red and blue areas show warmer and cooler parts with respect to the tropical Pacific mean difference, respectively. It shows specific warmer parts along the Equator and in the upwelling region in the eastern Pacific. It also shows a clear cooler part in the southern tropical Pacific, implying a shift of the SPCZ as is also seen from the change in EOF (Fig. 5d). This result agrees with findings by Pontes et al. (2020) that investigated shifts in precipitation patterns in the PlioMIP1 and PlioMIP2 ensembles.
The six points A-F in Fig. 7d represent differences between proxies and HadISST; the respective eight values are presented in Table 3. Points A and B in the warm pool region show similar temperatures to those in the pre-industrial, not agreeing with the increase in temperature seen in the PlioMIP2 ensemble mean. Points C-F in the central and eastern Pacific show a better agreement with the ensemble mean difference, albeit with a range of ±1 • C. The only clear outlier is the MC-Mg/Ca proxy at 273 • E at location E, indicating a significant SST decrease compared to the pre-industrial. This temperature decrease is not captured by the PlioMIP2 ensemble mean difference nor by any individual model.  Table 3.

Zonal SST gradient
Early observational work suggested a highly reduced zonal SST gradient in the tropical Pacific in the mid-Pliocene (Wara et al., 2005;Fedorov et al., 2013), and it has been argued that ENSO properties might be related to this mean state feature. The PlioMIP1 ensemble did not show a clear agreement on the zonal SST gradient, and it was concluded that a reduction in zonal SST gradient could not explain a reduction in ENSO variability (Brierley, 2015). Here, we will investigate whether there is any correlation between the zonal SST gradient and the change in Niño 3.4 SD in PlioMIP2. Figure 8 shows the ensemble mean and meridional mean (5 • S-5 • N) SST in the Pacific Ocean. The individual model  results are included in Fig. S4. The HadISST 1920-2020 result is included for reference. The shading shows the range of modelled values. The HadISST values fall within the range of pre-industrial modelled values but are on average 0.7 • C warmer. This "warm bias" can largely be attributed to anthropogenic warming trends that are present in the HadISST data (∼ +0.5 • C in the equatorial Pacific, from 1870 to 2020) but absent in the pre-industrial simulation results. The mid-Pliocene ensemble mean follows the same zonal dependence as the pre-industrial but is consistently 2 • C warmer, as can be seen from the Eoi 400 -E 280 difference. However, the model spread is large. The range of values is especially large in the eastern Pacific cold tongue region, where the modelled values range from 0 to 4 • C of warming. We quantify the zonal SST gradient in the tropical Pacific as the SST difference between the warm pool region in the equatorial western Pacific Ocean (5 • S-5 • N, 150-170 • E) and the cold tongue region in the equatorial eastern Pacific Ocean (5 • S-5 • N, 240-260 • E). These regions are also indicated in Figs. 7 and 8. Figure 9a shows the zonal SST gradients for the pre-industrial and mid-Pliocene simulations. The ensemble mean shows little change in the mid-Pliocene, and a reasonable agreement with the HadISST observations. This result is in line with what is shown in Fig. 8. However, a majority of the models (13 out of 17) actually shows a slightly reduced zonal SST gradient in the mid-Pliocene simulations. The ensemble mean is very much affected by two outliers, namely MRI2.3 and COSMOS, that show a greatly increased zonal SST gradient. It should be noted that the preindustrial simulation of MRI2.3 has its cold tongue centred around 220 • E instead of 250 • E, and this is the main reason for the large difference in the zonal SST gradient (see the individual model result in Fig. S4).
To investigate the relation between the zonal SST gradient and ENSO amplitude, we present a scatter plot of the change in zonal SST gradient versus the change in Niño 3.4 SD in Fig. 9b. The ensemble mean shows a ∼ 24 % decrease in Niño 3.4 SD despite little change in zonal SST gradient (−0.17 ± 0.77 • C). The "cluster" of models around the ensemble mean does show a slightly more reduced zonal SST gradient together with a robust reduction in Niño 3.4 SD. On the one hand, CESM2 shows the largest reduction in zonal SST gradient but a small reduction in Niño 3.4 SD. On the other hand, both COSMOS and MRI2.3 show a great increase in zonal SST gradient but a similar Niño 3.4 SD, although for MRI2.3 the increase in zonal SST gradient is mainly due to the overly westward extent of the cold tongue in the MRI2.3 model discussed above. CCSM4-Utr, however, shows the largest reduction in Niño 3.4 SD despite a similar zonal SST gradient. Ultimately, there does not seem to be a strong correlation between the change in zonal SST gradient and the change in Niño 3.4 SD in the PlioMIP2 ensemble. This conclusion is consistent with a recent study by Brown A. M. Oldeman et al.: Pliocene ENSO et al. (2020) considering a large ensemble of CMIP5/6 and PMIP3/4 simulations, of which 10 out of the 32 models included have also contributed to the PlioMIP2. Fredriksen et al. (2020) analyse results of future scenarios with 11 CMIP6 models and link a reduced zonal SST gradient with an increase in ENSO amplitude, which would disagree with our findings. However, Beobide-Arsuaga et al. (2021) perform a similar analysis on 56 CMIP5 and CMIP6 contributions and also conclude that the relationship between ENSO amplitude change and zonal SST gradient is weak.

The El Niño-like mean state
To investigate whether the mean SST pattern in the tropical Pacific is El Niño-like, we project the changes in the mean state (i.e. Eoi 400 SSTs − E 280 SSTs) onto the leading EOF of the pre-industrial E 280 simulation for each model and perform a spatial correlation. If this correlation is positive, it implies that the mid-Pliocene mean SSTs are more similar to the pre-industrial El Niño than the pre-industrial mean SSTs. Likewise, if the correlation is negative, the mid-Pliocene mean can be said to be more similar to a La Niña. Results are shown in Fig. 10a. The ensemble mean shows no correlation of the mean state changes to either an El Niño or a La Niña, and about half of the individual models (8 out of 17) also show little correlation (between −0.3 and 0.3). Interestingly, there is a "cluster" of models ("A": GISS2.1G, COSMOS, IPSLCM6A and CESM2) that show a clear positive correlation (El Niño-like) but little to no change in the Niño 3.4 SD. On the other side, there is cluster -albeit less clearly grouped -showing a negative correlation (La Niñalike) and the largest reduction in Niño 3.4 SD ("B": CCSM4-Utr, EC-Earth3.3 and CCSM4). This contrasting result between CCSM4 and CESM2 was already observed by . Taking into account all the models, there seems to be an increasing trend: the more El Niño-like, the smaller the reduction in ENSO amplitude is. A similar trend is found by Pontes et al. (2021), when considering changes in the thermocline slope in the PlioMIP2 simulations. However, we should consider here the results from the PlioMIP1 ensemble (Brierley, 2015), shown in Fig. 10b, that suggest the opposite trend. Although some of the individual models that were also a part of PlioMIP1 show very different results in PlioMIP2, the ensemble mean is actually very similar, despite the fact that different boundary conditions are used in both MIP protocols.
To analyse the El Niño-like mean state in a more detailed fashion, we focus on the two "clusters" that were identified in Fig. 10a. We show the cluster model-mean pattern of the annual mean SST changes and the pre-industrial E 280 EOF in the tropical Pacific in Fig. 11. The first group of models is GISS2.1G, COSMOS, IPSLCM6A and CESM2 (group A, Fig. 11a-b), showing El Niño-like mean state changes and strong ENSO variability. The mean state changes show a clear warming pattern along the Equator, very much like the pre-industrial EOF pattern. The warming is also very uniform, implying no changes in the zonal SST differences compared to the pre-industrial. The second group of models is CCSM4-Utr, EC-Earth3.3 and CCSM4 (group B, Fig. 11cd) and show La Niña-like mean state changes with a strongly reduced ENSO amplitude. The mean state changes show a relatively weaker warming along the Equator compared to the tropical Pacific mean SST changes. Large variations can be seen in the western Pacific, where the equatorial warm pool region warms the least and the subtropical regions show a higher than average warming. The mean state changes show a clear anticorrelation with the pre-industrial EOF pattern.
Interestingly, this group also shows a large warming in the upwelling region along the South American Pacific coast. This implies a reduction in the zonal SST gradient between the upwelling region and the warm pool, in comparison to the pre-industrial.

Discussion
We have investigated ENSO variability in the PlioMIP2 ensemble using a set of different metrics. Some results are quite robust (reduced Niño 3.4 SD, similar spatial structure), but the ensemble often shows a large spread in values. In this section, we will discuss the results in light of observations (both present-day and palaeo) and inter-model differences, and we will suggest possible physical mechanisms of a consistently reduced ENSO variability in the mid-Pliocene simulations.

Pre-industrial observations
We have included the results from the HadISST 1920-2020 dataset as a reference for the pre-industrial simulation results. The time range of the observational data does not cover the pre-industrial period and includes anthropogenic forcing trends (although a linear trend is removed). An argument to use the most recent data is the observational uncertainty that is present in the instrumental record, especially before 1920, as shown by Ilyas et al. (2017) using HadCRUT4 data. It was decided to use a 100-year time series, as with the PlioMIP2 data, and use the most recent data because of a higher spatial resolution. Chelton and Risien (2016) mention zonal and meridional discontinuities in the HadISST dataset, specifically in the Pacific. They advise caution for high-resolution studies but deem the data adequate for largescale SST variability studies. Haywood et al. (2020) prefer to use the NOAA ERSST v5 dataset  over the HadISST data because of consistency with other observational datasets on a global scale.
The Niño 3.4 moments of the E 280 simulations match reasonably well with those from the HadISST data (see Fig. 2). The ensemble mean shows a slightly higher Niño 3.4 SD (0.90) compared to the HadISST results (0.77). The   (Brierley, 2015). Some models were a part of both ensembles, but note that a different protocol was used in the both MIPs.
PlioMIP1 ensemble mean reported a slightly lower SD (0.78, Brierley, 2015) and was compared to the Niño 3.4 SD of the ERSST data (0.69). The overestimation of the model Niño 3.4 SD compared to observational data is also reported in the CMIP5 ensemble (Kim et al., 2014). However, there is a good match with the HadISST tropical Pacific leading EOF and that of the E 280 ensemble mean; both in the percentage of variance explained (both 50 %) and correlation of the Niño 3.4 index with the first PC (both 0.96). This shows that the PlioMIP2 ensemble does a reasonable job in representing pre-industrial ENSO variability.
There is a slight mismatch considering the warm pool (or Central Pacific) to cold tongue (or eastern Pacific) El Niño occurrence ratio (Fig. 6). The E 280 ensemble mean shows a considerably higher ratio (0.80) compared to the HadISST ratio (0.60). The ensemble mean results do agree with a present-day model ensemble presented in Yeh et al. (2009) (approximately 0.6-0.9, although a slightly different method was used). Figure 7 shows a reasonable agreement of the annual mean SSTs with the proxy reconstructions by Foley and Dowsett (2019) and McClymont et al. (2020). The close agreement with most proxies in the eastern Pacific upwelling region is promising, as the mid-Pliocene ensemble mean shows the greatest warming in this region compared to the pre-industrial ensemble mean. One proxy around 240 • E (point C) suggests a higher temperature than the ensemble mean. Considering the two proxy locations in the western equatorial Pacific, the proxies suggest a flat zonal SST profile, possibly indicating an El Niño-like mean state. However, the ensemble mean shows more warming in the western equatorial Pacific and shows a clear zonal SST gradient (see also regarding the McClymont et al. (2020) reconstructions is the reported error of 1.5 • C on their data. This is approximately the same as the range of modelled values on the proxy locations. Taking into account these uncertainties, the model differences fall within the range of the reported error around the observational differences (proxy minus HadISST) as reported in Table 3, with the exception of the 273 • E Mg/Ca proxy. Another comment is that some proxies are located close to coastal areas, where sharp contrasts and large spatial variations in SSTs can be present due to boundary currents and coastal upwelling systems. The PlioMIP2 models generally have insufficient resolution to reproduce upwelling systems (McClymont et al., 2020), and a significantly higher resolution (∼ 0.1 • ) is needed to accurately resolve these (Small et al., 2014). When putting the proxies in a global perspective, there is generally a good agreement with proxies and the PlioMIP2 ensemble mean in the tropics, but discrepancies occur in the middle and higher latitudes, specifically in the Northern Hemisphere. This holds both for the PRISM4 reconstructions  and for the McClymont et al. (2020) reconstructions.

Mid-Pliocene proxies
The PlioMIP2 ensemble shows that it is likely that there was a slight reduction of the zonal SST gradient in the mid-Pliocene equatorial Pacific (Fig. 8), but not as large as some earlier proxy reconstructions suggested. Furthermore, the mean SST changes are not specifically El Niño-like, as the majority of the ensemble members do not show a positive correlation with the E 280 leading EOF of tropical Pacific SSTs (Fig. 10). The PlioMIP2 ensemble also shows clear ENSO variability (although weaker than in the preindustrial), agreeing with the proxy-based reconstructions by Scroxton et al. (2011), Watanabe et al. (2011 and White and Ravelo (2020). More specifically, White and Ravelo (2020) suggest that mid-Pliocene ENSO variability varied between reduced and similar to the present day, which is also captured by the PlioMIP2 ensemble. Watanabe et al. (2011), however, suggest that ENSO variability is similar to the present day, something most PlioMIP2 models do not agree with. Still, the PlioMIP2 ensemble includes four models (of which three participate in CMIP6) that show both ENSO variability similar to the present day and a mean state that is more El Niñolike (Fig. 10a).

Model differences
The PlioMIP2 ensemble shows robust signals when comparing mid-Pliocene and pre-industrial ENSO: the Eoi 400 simulations show a reduced ENSO variability (15 out of 17 models, Fig. 2), a robust decrease in spectral peaks in the 3-4year period (Fig. 4), a similar spatial structure of ENSO compared to the pre-industrial ( Fig. 5 and Table 2) and a (slight) reduction of the tropical Pacific zonal SST gradient (14 out of 17 models, Fig. 9). Still, the spread of the individual model results is often large, and not all findings are unambiguous. Below we will discuss the possible causes of differences between models.  MRI2.3 did not use the provided "enhanced" boundary conditions, but the "standard" version (Haywood et al., 2016b). This is the best possible realization of the mid-Pliocene conditions (such as vegetation and land ice cover) using a modern land-sea mask with closed Arctic ocean gateways. This could explain why MRI2.3 shows an unchanged Niño 3.4 index, as well as no change in the WP / CT El Niño ratio. Next to this, its Eoi 400 mean state changes show no particular correlation with the E 280 EOF, neither in PlioMIP2 nor in MRI2.3's contribution to PlioMIP1. However, HadGEM3 also used the pre-industrial land-sea mask for its mid-Pliocene simulations (with closed Canadian Archipelago, open Bering strait, mid-Pliocene vegetation and land ice cover) and clearly shows different ENSO variability in both its simulations. The explanation for MRI2.3's large increase in zonal SST gradient is its anomalous pre-industrial equatorial Pacific zonal SSTs, showing a cold tongue around 220 • E, disagreeing with the HadISST observations and all other ensemble members.

Models deviating from the ensemble mean
CCSM4-Utr started the Eoi 400 spin-up with ocean temperatures obtained from PlioMIP1 results, whereas most other models used pre-industrial conditions as ocean temperature initialization (Baatsen et al., 2021). It shows one of the highest SAT differences, even though it has a very modest ECS compared to the rest of the ensemble .
Next to this, it shows a large increase in deep-ocean temperatures, namely 4 • C warmer (Baatsen et al., 2021). This is high considering the 2.8 • C SAT increase that the PlioMIP2 ensemble mean shows over oceans . This could explain why CCSM4-Utr proves to be quite an outlier, especially showing a huge reduction in Niño 3.4 SD and the most La Niña-like mean state in the mid-Pliocene. However, also CESM1.2 and CESM2 have used a "warm" ocean initialization  and do not show the same extreme behaviour.
COSMOS also stands out and has one of the coarsest ocean resolutions (∼ 3 • ) of the ensemble. NorESM-L has a similar ocean resolution, however, but usually behaves quite close to the ensemble mean. COSMOS is furthermore the only model that included a dynamic vegetation model, but it seems unlikely that this could have such a big effect on tropical Pacific variability. COSMOS is the only non-CMIP6 model that has a high ECS (4.7 • C), although it is not the highest of the PlioMIP2 ensemble .
Other models with a high ECS are HadGEM3 (5.5 • C), CESM2 (5.3 • C) and IPSLCM6A (4.8 • C) and are all CMIP6 models Williams et al., 2021). Where HadGEM3 and IPSLCM6A perform reasonably close to the ensemble mean, CESM2 values are often outside of the standard deviation range around the ensemble mean. It shows a Niño 3.4 SD that is large in both the pre-industrial and the mid-Pliocene, and a change that is small compared to the ensemble mean (−9 % to −24 %). Furthermore, its mid-Pliocene mean state shows the largest reduction in zonal SSTs and the most El Niño-like mean state. A high ECS alone cannot explain this. CESM1.2 generally performs close to the ensemble mean, and differences between CESM1.2 and CESM2 include an updated atmospheric model (CAM5 to CAM6), land model, sea ice model and a new mixing scheme in the ocean model ) that could all explain the different performance.

Model generation and family
PlioMIP2 includes CCSM4, CCSM4-UoT, CCSM4-Utr, CESM1.2 and CESM2 (the "CESM" family) that share a lot of similarities, especially in their ocean models. The five models show a very similar pre-industrial Niño 3.4 SD (∼ 0.8-1.2 • C), but no agreement in the mid-Pliocene Niño 3.4 SD (∼ 0.3-1.1 • C). They do show a great similarity in the ratio of WP and CT El Niño events between both simulations. However, the five models show little agreement in both the change in zonal SST gradient and the El Niño-like mean state results; CESM2 shows a strongly reduced zonal SST gradient and an El Niño-like mean state, whereas CCSM4 shows almost no change in zonal SST gradient and a more La Niña-like mean state. The results of CCSM4, CESM1.2 and CESM2 are discussed in more detail in . They find a weakened Walker circulation and reduced upwelling in CESM1.2 and CESM2, but not in CCSM4, and this could explain the aforementioned differences.
The two NorESM models (NorESM-L and NorESM1-F) as well as the three IPSL models (IPSLCM5A, IPSLCM5A2 and IPSLCM6A) generally do show a great similarity between their results. HadCM3 and HadGEM3 both belong to the "Hadley Centre" model family but show very different results. Generally HadGEM3 performs close to both the pre-industrial and mid-Pliocene ensemble means. The HadGEM3 pre-industrial results furthermore show a great match with the HadISST observations. The HadCM3 results are overall less close to the ensemble mean. More information on the difference in large-scale features between HadCM3 and HadGEM3 can be found in Williams et al. (2021). Brown et al. (2020) as well as Jiang et al. (2021) argue that the simulation of ENSO may be improved in CMIP6 (compared to CMIP5), especially due to improvements in the sim-ulation of the mean state in the tropical Pacific, such as a reduced double intertropical convergence zone (ITCZ) bias and cold tongue bias. When only the results of CMIP6 models (CESM2, EC-Earth3.3, GISS2.1G, HadGEM3, IPSLCM6A, NorESM1-F) are considered, the ensemble mean results and the conclusions on mid-Pliocene ENSO variability are similar. However, the CMIP6 models do show the strongest reduction in zonal SST gradient in the mid-Pliocene (Fig. 9). Also, it is mainly the CMIP6 models that show El Niño-like mean state changes (Fig. 10), although EC-Earth3.3 exhibits a La Niña-like mean state.

PlioMIP2 versus PlioMIP1
In total, 7 of the 17 models that participate in PlioMIP2 also participated in PlioMIP1, namely CCSM4, COSMOS, HadCM3, IPSLCM5A, MIROC4m, MRI2.3 and NorESM-L. Some of the PlioMIP1 conclusions presented in Brierley (2015) are similar: the Niño 3.4 SD decreases (24 % (±18 %) in PlioMIP2 versus 20 % in PlioMIP1), the spatial structure of El Niño is similar and the El Niño flavour shows no robust change. However, there are also some clear differences. PlioMIP2 shows a significant reduction of the spectral power in the 3-4-year period, where PlioMIP1 mainly shows a shift of power to longer periods; the PlioMIP1 results showed a clear "low-frequency" peak in the pre-industrial around 5-6 years that seemed to shift to 7-8 years in the mid-Pliocene ensemble mean. The significance of the spectral peaks was not assessed, however. Another difference is that the PlioMIP2 models show a clear but small reduction in zonal SST gradient where PlioMIP1 shows "no preferential warming in the eastern equatorial Pacific". An explanation for the difference between the ensemble results is that PlioMIP2 encompasses more and newer models (most CMIP6) that have a higher resolution and include the newest ocean-atmosphere models and parametrizations. Next to this, the boundary conditions of PlioMIP2 are more specifically tuned to a specific time slice in the mid-Pliocene (Dowsett et al., 2016). It should also be noted that Brierley (2015) used more than 100 years of data (up to 200 years) for analysis of some of the models, sometimes including more years of pre-industrial data than mid-Pliocene data for one model. This can impact the statistical confidence of spectral power density, particularly for the longer periods. We chose to use 100 years of data for all the simulations for the sake of consistency, as some models only had 100 years of data available. The different simulation protocol and ensemble composition means that differences between PlioMIP1 and PlioMIP2 can be expected. However, the similarities between both ensembles make the reduction of the ENSO amplitude in the mid-Pliocene more robust.

Robustness of ENSO amplitude and significant-peak analysis
Previous studies from Wittenberg (2009) and Tindall et al. (2016) show that ENSO variability can exhibit significant changes on the interdecadal and centennial scale. While a time-series length of 100 years as used here is clearly sufficient to study properties such as amplitude and spectra of ENSO on its typical interannual timescales, the presence of centennial-scale modulations on ENSO properties could imply that the choice of the 100-year segment affects our conclusions on interannual variability differences between preindustrial and mid-Pliocene. In order to address this issue, we repeated the calculation of the ENSO amplitude with 500 years of data. These discrepancies cannot be explained by centennial-scale variations of ENSO amplitude and are thus likely caused by the differences in the MIP protocols. Next to this, the spectral "peak-counting" method is tested for robustness by repeating the procedure using different spectral settings and the Niño 3 instead of Niño 3.4 index. The conclusions are found to be independent of these different settings. A summary of these results is included in the Supplement. In conclusion, both the reduction of ENSO amplitude as well as the changes in significant spectral peaks in the PlioMIP2 ensemble are robust.

Physical mechanisms
What the PlioMIP2 simulations have in common is the use of identical forcing and (nearly) identical boundary conditions. The changes in ENSO response and mean state in the mid-Pliocene should therefore be a response to the CO 2 forcing and the PRISM4 boundary conditions. Brierley (2015) discusses the impact of the PRISM3 boundary conditions on ENSO and cannot find a clear explanation when considering geography. Differences in boundary conditions from PlioMIP1 to PlioMIP2 mainly include changes to lakes, soils and vegetation, land ice cover, and Arctic gateways such as the Bering strait and Canadian archipelago (Dowsett et al., 2016;Haywood et al., 2016b). We do not expect these differences to directly impact ENSO properties. Brierley and Fedorov (2016) find that the closing of the Bering strait affects ENSO properties but mention it is hard to estimate the statistical significance of these changes. Other changes to the land-sea mask include the shoaling of the Sahul and Sunda shelves. This can affect the Indonesian throughflow, which can affect ENSO properties as shown by Jochum et al. (2009) as well as Brierley and Fedorov (2016). We did not investigate the changes in the Indonesian throughflow in the PlioMIP2 ensemble in this study.
Proxy records suggest the mid-Pliocene mean state sustained El Niño-like conditions Wara et al., 2005) or at least a weak zonal SST gradient . Manucharyan and Fedorov (2014) state that the weak zonal SST gradient was less favourable for ENSO occurrence. The PlioMIP2 results do show a reduced zonal SST gradient, but only slightly, together with a reduced ENSO amplitude. However, a consistent relationship between the two is absent, as can be seen in Fig. 9. Brown et al. (2020) investigate the relationship between zonal SST gradient and Niño 3.4 SD for PMIP3/4 simulations of the mid-Holocene, last glacial maximum and last interglacial, as well as 1 % CO 2 and abrupt 4× CO 2 experiments, and also find no clear relationship. Hu et al. (2013) suggest the relationship between zonal SST gradient and ENSO amplitude might be nonlinear.
Feedback analysis shows that reduction of ENSO variability in the mid-Holocene is likely due to the increase in the negative feedback (mean current thermal advection, An and Bong, 2018), the reduction of major positive feedback processes (thermocline, zonal advection and Ekman feedbacks, Chen et al., 2019) or an enhanced seasonal cycle (Iwakiri and Watanabe, 2019). However, PlioMIP1 (Brierley, 2015) as well as PMIP3 and PMIP4 models  do not show any consistent relationship between changes in the seasonal amplitude and changes in ENSO variability. More recently it has been suggested that the mid-Pliocene saw a generally deeper thermocline and thus a weaker thermocline feedback (White and Ravelo, 2020). Brown et al. (2020) acknowledge that the changes in feedback processes in PMIP3/4 simulations are model-dependent. They also point out that ENSO feedbacks are still inaccurately simulated in most current GCMs (Bayr et al., 2019;Bellenger et al., 2014;Kim et al., 2014;Kim and Jin, 2011).
Recent work by Pontes et al. (2021), analysing both PlioMIP1 and PlioMIP2 results, find that models with a steeper thermocline (deemed La Niña-like in that paper) are related to major ENSO reduction, while a flatter thermocline (El Niño-like) is associated with a similar ENSO amplitude compared to the pre-industrial. This finding is in agreement with the results we find on ENSO amplitude and the El Niñolike mean state (Fig. 10), although a different methodology is used. Pontes et al. (2021) furthermore find that a combina-tion of off-equatorial mean state changes in the mid-Pliocene is not favourable for ENSO. They find a northward displacement of the ITCZ, increased south-eastern trade winds in the western Pacific and an intensified South Pacific Subtropical High in the mid-Pliocene simulations, all associated with suppression of ENSO variability.

Conclusions
The Pliocene Model Intercomparison Project phase 2 (PlioMIP2) coordinated 17 global coupled climate models to simulate the mid-Pliocene and pre-industrial climate. In this work, we have studied the changes in ENSO variability in the mid-Pliocene using the PlioMIP2 ensemble and related those to SST changes in the tropical Pacific mean climate. Even with (nearly) identical forcing and geographical boundaries, the PlioMIP2 ensemble includes a range of model resolutions, parametrizations and initializations, all which contribute to a considerable model spread in the PlioMIP2 ensemble.
The PlioMIP2 ensemble shows a robust decrease in ENSO amplitude. The Niño 3.4 SD decreases for 15 of the 17 models with an ensemble mean decrease of 24 %, compared to 20 % in PlioMIP1 (Brierley, 2015). Changes in Niño 3.4 skewness and kurtosis are less coherent. Spectral analysis shows a clear reduction of spectral power in the 3-4-year period in the mid-Pliocene. Furthermore, the total number of spectral peaks that are above the 99 % confidence level decreases by one-third in the PlioMIP2 ensemble. The shift towards longer periods that is suggested by the PlioMIP1 models does not occur in PlioMIP2. The mid-Pliocene ENSO is similar in spatial structure in comparison to the pre-industrial. EOF analysis shows that the percentage of variance explained by the first EOF decreases from 50 % to 42 % and suggests that El Niño warmth expands meridionally across the tropical Pacific. Correlation with the principal component time series shows that the Niño 3.4 index best captures the ENSO-related variability in the Pacific in both the pre-industrial and mid-Pliocene, also suggesting a similar spatial structure. Analysis of El Niño flavour suggests no particular change in the mid-Pliocene, although the model spread is large.
The tropical Pacific mean SSTs show a clear warming in the eastern Pacific upwelling region, in agreement with proxy data. The PlioMIP2 models do not show a unanimous weakening of the mean zonal SST gradient, with an ensemble mean change of −0.17 ± 0.77 • C. However, 14 out of the 17 models show a (slight) reduction of the zonal SST gradient. A clear connection with the reduced ENSO amplitude is not found, agreeing with recent work by Brown et al. (2020). Mean SST changes are correlated with the preindustrial EOF to study whether the mid-Pliocene is El Niñolike, as suggested by certain proxy studies (Wara et al., 2005;Fedorov et al., 2006). The PlioMIP2 ensemble does not show a unanimous response, and a correlation with the ENSO amplitude and a more or less El Niño-like mean state is not found. Interestingly, there is a "cluster" of four models (of which three are CMIP6 models) that suggest an El Niñolike mean state but a similar ENSO amplitude in the mid-Pliocene. Proxy data suggest that the mid-Pliocene ENSO was reduced (Scroxton et al., 2011), similar to the present day (Watanabe et al., 2011) or possibly varying between both (White and Ravelo, 2020). The PlioMIP2 ensemble shows a range of ENSO amplitudes, varying between strongly reduced and similar to pre-industrial, possibly even despite a slightly reduced zonal SST gradient. As such, it seems to generally agree with most proxy studies on ENSO variability as well as tropical Pacific SSTs so far.
Although it becomes more and more clear that the mid-Pliocene saw generally reduced ENSO variability, the reasons for this are still not clear. White and Ravelo (2020) suggest a weaker thermocline feedback due to a generally deeper thermocline. Pontes et al. (2021) suggest a combination of off-equatorial mean state changes that are unfavourable for ENSO activity. Future research should aim to find an answer to the question of why exactly ENSO variability was reduced in the mid-Pliocene. These answers can help us to understand the physical behaviour of ENSO in warm climates and will help us to understand the behaviour of ENSO in the future.
Code and data availability. A selection of the data presented in the figures in this paper is available in the Supplement. The 100-year regridded SST data are available upon request from Alan M. Haywood (a.m.haywood@leeds.ac.uk), as well as more complete data for PlioMIP2 (with exception of IPSLCM6A and GISS2.1G). PlioMIP2 data from CESM2, EC-Earth3.3, NorESM1-F, IPSLCM6A and GISS2.1G can be obtained through the Earth System Grid Federation (ESGF) (https://esgf-node.llnl.gov/search/ cmip6/, last access: 25 February 2021, ESGF, 2020). A selection of the data from the scatter plots presented in the paper is available in the Supplement.
Author contributions. AMO, MLJB, ASvdH and HAD conceived the presented ideas in this study. AMO performed the analysis and wrote the draft of the paper. JCT contributed data and advice in data analysis. The remaining authors provided the PlioMIP2 experiments and contributed to the writing of the paper. grant no. 17H06323). Their simulations with MIROC4m were performed on the Earth Simulator at JAMSTEC, Yokohama, Japan.
W. Richard Peltier and Deepak Chandan wish to acknowledge that data they have contributed from the CCSM4-UoT model was produced with the support of Canadian NSERC Discovery Grant A9627 t WRP, and they wish to acknowledge the support of the SciNet HPC Consortium for providing computing facilities. SciNet is funded by the Canada Foundation for Innovation under the auspices of Compute Canada, the Government of Ontario, the Ontario Research Fund -Research Excellence and the University of Toronto.
Zhongshi The authors would like to thank Chris Brierley and an anonymous reviewer for helpful feedback and comments on an earlier version of the manuscript.
Financial support. This research has been supported by the Netherlands Earth System Science Centre (OCW (grant no. 024.002.001)).
Review statement. This paper was edited by Steven Phipps and reviewed by Chris Brierley and one anonymous referee.