Investigating oxygen and carbon isotopic relationships in speleothem records over the last millennium using multiple isotope-enabled climate models

The incorporation of water isotopologues into the hydrology of general circulation models (GCMs) facilitates the comparison between modelled and measured proxy data in paleoclimate archives. However, the variability and drivers of measured and modelled water isotopologues, and indeed the diversity of their representation in different models are not well constrained. Improving our understanding of this variability in past and present climates will help to better constrain future climate change projections and decrease their range of uncertainty. Speleothems are a precisely datable paleoclimate 5 archive and provide well preserved (semi-)continuous multivariate isotope time series in the lower and mid-latitudes, and are, therefore, well suited to assess climate and isotope variability on decadal and longer timescales. However, the relationship between speleothem oxygen and carbon isotopes to climate variables also depends on site-specific parameters, and their comparison to GCMs is not always straightforward. Here we compare speleothem oxygen and carbon isotopic signatures from the Speleothem Isotopes Synthesis and AnaLy10 sis database version 2 (SISALv2) to the output of five different water-isotope-enabled GCMs (ECHAM5-wiso, GISS-E2-R, iCESM, iHadCM3, and isoGSM) over the last millennium (850–1850 common era, CE). We systematically evaluate differences and commonalities between the standardized model simulation outputs. The goal is to distinguish climatic drivers of variability for both modelled and measured isotopes. We find strong regional differences in the oxygen isotope signatures between models that can partly be attributed to differ15 ences in modelled temperatures. At low latitudes, precipitation amount is the dominant driver for water isotope variability, however, at cave locations the agreement between modelled temperature variability is higher than for precipitation variability. While modelled isotopic signatures at cave locations exhibited extreme events coinciding with changes in volcanic and solar forcing, such fingerprints are not apparent in the speleothem isotopes, and may be attributed to the lower temporal resolution 1 https://doi.org/10.5194/cp-2021-152 Preprint. Discussion started: 24 November 2021 c © Author(s) 2021. CC BY 4.0 License.

Incorporating SWI within the Earth's hydrological cycle in atmospheric general circulation models (AGCMs), general circulation models (GCMs), and the most complex Earth system models (ESMs) is usually done by adding an additional 5 water cycle to the hydrology of the model under explicit consideration of isotopic fractionation processes through water phase changes (e.g. Tindall et al., 2009;Yoshimura et al., 2008;Werner et al., 2016;Brady et al., 2019;Lewis and Legrande, 2015). This opens new possibilities to study and analyze past and present climates and to compare modelled climate to the archived isotopic signatures (for example, Werner, 2010;Sturm et al., 2010;Xi, 2014).
The Speleothem Isotope Synthesis and Analyses (SISAL) working group has collected a large number of speleothem 10 records globally and compiled the database SISALv2. It has been employed for model-data comparisons of the last glacial maximum, the Mid-Holocene, the last millennium, and the historical period using different models (iCESM: Midhun et al. (2021), iHadCM3: Bühler et al. (2021), ECHAM5-wiso: Comas-Bru et al. (2019); Parker et al. (2021) and GISS-E1-R: Parker et al. (2021)), supporting the usage of the database to evaluate modelled δ 18 O across different time periods, as the method reproduces first-order spatial patterns of isotopic variability . 15 Extensive multi-model comparisons exist for past, present and future as the Paleoclimate Model Intercomparison Project (PMIP3/ PMIP4 Jungclaus et al., 2010;Kageyama et al., 2018) under the overarching Climate Model Intercomparison Project (CMIP5/CMIP6 Taylor et al., 2012;Eyring et al., 2016) to better understand the causes of model spreads in future projections.
Comparisons between models are abundant (Shi et al., 2016;Ba et al., 2014), especially for temperature and precipitation (Parsons et al., 2020;Seftigen et al., 2017), and the impact of external forcing has been studied intensively (Atwood et al., 20 2016;PAGESHydro2k-Consortium, 2017). Simulations of the historical period (1850-2014CE as in Eyring et al. (2016)), or the last millennium (850-1850CE as in Eyring et al. (2016)) that will be the focus of this study, as well as idealized experiments under a range of natural and external forcings are evaluated under different variables. Water isotopes, however, have not been included in the CMIP5/CMIP6 assessments (Taylor et al., 2012;Eyring et al., 2016).
Due to this lack of systematic intercomparison and assessments with and between SWI model simulations, the Stable 25 Water Isotope Intercomparison Group (SWING) was formed. SWING compares isotope-enabled model simulations with observations over the historical period and provides a large dataset to the scientific community (Risi et al., 2012b). The second evaluation in the SWING2-intercomparison of isotope-enabled AGCMs in 2012 showed that model differences most likely arise from differences in processes that control atmospheric humidity (Risi et al., 2012a). Conroy et al. (2013) found that models which realistically capture precipitation patterns in the tropics are not necessarily successful in simulating the 30 isotopic composition of precipitation compared to measured data and vice versa, cautioning on always using multiple models when comparing to paleoclimate proxy records.
All models that are used in this study have been part of the SWING2 assessment for the historical period in their current, previous, or atmosphere-only version. Therefore, this multi-model comparison complements previous work (Jungclaus et al., 2017;Midhun and Ramesh, 2016;Conroy et al., 2013), through its focus on the representation of SWI in different models 35 over the entire last millennium. We aim to identify common model biases (Kageyama et al., 2018) globally and in different regions, as well as distinguish specific climate drivers for modelled isotope variability on decadal and longer timescales.
Variability in models can be either internal resulting from internal interactions and processes, or external as a consequence of changes in radiative forcings (e.g. GHG, volcanoes, and solar irradiance as in Fig. 1). Variability in the speleothem isotopic signal can also be a consequence of external climate-related variability as reflected in climate modes (e.g. El-Niño temperature or precipitation (Warken et al., 2021;Lone et al., 2014;Cosford et al., 2008;Neff et al., 2001). While modelled variability commonly underestimates measured variability in paleoclimate archives with increasing discrepancies on longer timescales (Laepple and Huybers, 2014a), internal variability in speleothems may also overlay the archived signal. Especially on subdecadal to decadal timescales, lag time between the surface rainfall and the cave drip water as well as the usually slow response of the cave micro-climate to the surface climate dampens the signal. 5 The last millennium is a suitable time period for model-data comparisons, as it provides an opportunity to study variability on decadal and longer timescales and to decipher internal variability from externally forced variability (Kageyama et al., 2018). Boundary conditions such as orbital forcing, sea level, and ice sheets are close to present-day, and external variability is mostly driven by variations in volcanic eruptions (Schurer et al., 2014;Neukom et al., 2019;Legrande and Anchukaitis, 2015). It is a key-paleoclimate period for the CMIP5 and CMIP6 experiments (Taylor et al., 2012;Eyring et al., 2016) and 10 speleothem records are abundant in this period (Bühler et al., 2021).
With this study, we aim to contribute to the understanding of both model and data: how do different simulations model oxygen isotopes in the hydrological cycle and how do they compare to archived speleothem data as well as what processes influence speleothem isotope composition and what effects of variability can be captured and later analyzed. We first compare their similarities and differences in the isotopic signatures of precipitation globally (Sect. 4.1) as well as particularly at the 20 cave site locations of a large number of high-resolution speleothems from the SISALv2 database (Comas-Bru et al., 2020b).
Through spatial testing of climate variables, we analyze the relationship between the measured stable isotopes of oxygen and carbon and different modelled climate variables (Sect. 4.2). In a second step, we investigate if models can reproduce the variability as measured in speleothems on annual to centennial timescales (Sect. 4.3). Finally, we test what timescales of events, e.g. volcanic eruptions or variations in the solar irradiance, speleothems archives are able to resolve through either 25 isotope (Sect. 4.4).

Data
In this study, we collected and standardized the output from five different isotope-enabled model simulations over the last millennium, as well as oxygen and carbon isotopes in speleothems from the SISALv2 database.
2.1 Isotope-enabled general circulation models 30 A major advantage given by modelling SWI is its ability to both temporally and spatially resolve the variability of isotopes in precipitation by adding H 18 2 O and HDO to the part which already simulates and traces the most abundant water isotope, H 16 2 O. Simulated δ 18 O will further be denoted as δ 18 O sim . In the atmospheric advection scheme of the model, which is generally a part of the model's dynamical core, all three water isotopologues behave identically. Whenever there is a phase change (such as melting, condensing, evaporating, and freezing), additional fractionation effects are applied to the two less abundant 35 water isotopologues. These phase changes typically occur in evaporation from land and ocean surfaces, condensation during formation of clouds, rain, snow, and re-evaporation of precipitation below cloud (for example, Werner, 2010;Sturm et al., 2010).
The models used in this study range from AGCMs forced with SST and sea-ice distribution to AOGCMs. Their basic characteristics and boundary conditions are listed in Tab. 1. They are both used individually in the analysis, as well as by the 40 4 https://doi.org /10.5194/cp-2021-152 Preprint.  (green) and isoGSM (orange)) and external forcings over the last millennium: a) global mean surface temperature anomaly as represented by the model (in model colors), as well as the reconstructed temperature anomaly (PAGES1k, red, PAGES2k-Consortium (2019)), and observed temperatures (HadCRUT4, black Morice et al. (2012)). b) isotopic composition of precipitation in the different models (in model colors) at the cave site location of Bunker cave (Germany) including the δ 18 Ospeleo of entity ID 240 (red), all at the temporal resolution of entity ID 240 (Comas-Bru et al., 2020b;Fohlmeister et al., 2012). c) Atmospheric CO2 concentration (SMT: Schmidt et al. (2012), MFM: MacFarling Meure et al. (2006)), d) volcanic forcing in units of aerosol optical depth (AOD) (CRO: Crowley et al. (2008); Crowley and Unterman (2013), GAO: Gao et al. (2008)), where the AOD for the Gao et al. (2008) reconstruction was estimated by dividing the sulfate loading by 150 Tg (following Atwood et al., 2016), e) total solar irradiance (TSI) (STH: Steinhilber et al. (2009), MSL: Muscheler et al. (2016, VR: Vieira, L. E. A. et al. (2011)). The grey bars mark particular periods of high volcanic forcing. ensemble mean of all models. Fig. 1 shows the climate as represented by the different models and external forcings used in the simulations.

ECHAM5/MPI-OM
We use the isotope-enabled version of the fully coupled Earth System Model ECHAM5/MPI-OM (hereafter, ECHAM5wiso) (Werner et al., 2016;Jungclaus et al., 2006). The model consists of the atmospheric model ECHAM5 (Roeckner  Schmidt et al. (2012) land surface model assumes no fractionation in most of the physical processes (Haese et al., 2013). Water tracers are fully mixed and advected in the ocean model, and its total mass is conserved (Werner et al., 2016).
ECHAM5-wiso has been used extensively within the paleoclimate field, as well as for present time (for example , Werner et al., 2016;Langebroek et al., 2011;Goursaud et al., 2018). The fully coupled version of the model ECHAM5/MPI-OM ESM has very good agreement with both present-day isotope observations from the GNIP database, as well as with ice core 5 and speleothem proxies during mid-Holocene (MH, Comas-Bru et al., 2019), last glacial maximum (LGM Werner et al., 2016;Comas-Bru et al., 2019), and for last interglacial (Parker et al., 2021). Both in the ESM and with the atmospheric component (ECHAM5-wiso), a warm bias in the model is found over high-latitudinal regions, especially over Greenland and Antarctica (Werner et al., 2011(Werner et al., , 2016 and has been attributed to the coarse spatial resolution in the atmospheric component of the model (Werner et al., 2016) resulting in an underestimation of isotope depletion in these regions. The last millennium

GISS ModelE2-R
The isotope-enabled AOGCM GISS ModelE2-R (hereafter, GISS-E2-R) (Schmidt et al., 2006(Schmidt et al., , 2014 is used with the same physics as in the CMIP5 experiments (Miller et al., 2014;Schmidt et al., 2014). Water tracers and isotopes are incorporated into the atmosphere, land surface, ocean, and sea-ice components of the model (Schmidt et al., 2005). Several experiments have been set up for the last millennium with GISS-E2-R, due to uncertainties in past forcings and its effects, with different 5 combinations of solar, volcanic, and land-use/vegetation forcings but all with the same greenhouse gas and orbital change (Colose et al., 2016a, b;Lewis and Legrande, 2015).
GISS-E2-R has been shown to simulate modern isotopic observations well, except over Antarctica, in terms of changes in convection, clouds, and isotope kinetics (Schmidt et al., 2005). For the last millennium, GISS-E2-R has also explored the isotopic responses to volcanic eruptions in South America (Colose et al., 2016a), volcanic forcing in relation to the position 10 of the intertropical convergence zone (Colose et al., 2016b), and to ENSO (Lewis and Legrande, 2015). The model has been shown to have a warm SST bias and issues in sea ice concentration around Antarctica, related to the transport of the Antarctic Circumpolar Current. In the tropics, a warm bias is also found over land, together with cooler northern midlatitudes (Schmidt et al., 2014). 15 We use the last millennium run of the isotope-enabled iCESM1 version 1. The isotope-enabled version captures general global isotopic signatures well over ocean areas but shows small discrep- 25 ancies across the land surface . This effect has been explained by the model showing a slight negative isotopic bias due to overestimated modelled convection in mid-latitude oceans. Consequently, the transport of SWI-mass poleward and landward has been deemed insufficient . Footprints associated with major climatic modes such as ENSO and PDO are found to be well represented also in isotopic signatures (Midhun et al., 2021). 30 We use the last millennium run from the fully coupled atmosphere-ocean isotope-enabled GCM iHadCM3 (Bühler et al., 2021). iHadCM3 has been widely used to simulate present (Dalaiden et al., 2020) and future climate (Sime et al., 2008;Tindall et al., 2009;IPCC, 2013), as well as for past climates (Tindall et al., 2010;Sime et al., 2013;Holloway et al., 2018).

iHadCM3
The model consists of several components: the atmosphere model HadAM3 (Pope et al., 2000), the ocean model HadOM3 (Gordon et al., 2000), a sea ice model (as described in Valdes et al., 2017) and a dynamic land surface and vegetation model 35 (Cox, 2001).
For the isotope-enabled version, SWI were added as two separate water species in the atmospheric model, and as tracers in the ocean model. Fixed isotope fractions are added to a fixed volume gridbox of the ocean and experience changes due to evaporation, precipitation, and runoff through a virtual isotope flux, altering the δ 18 O sim ratio in the top level of the ocean accordingly (Tindall et al., 2009). Compared to instrumental observations, the model represents sea surface temperature, ice sheet, and ocean heat content well (Gordon et al., 2000). The freshwater hydrological cycle in the model shows only a slight overestimation in the local evaporation (Pardaens et al., 2003). The model simulates the major isotopic fractionation effects defined by Dansgaard (1964) (e.g. the latitude effect, the amount effect, and the continental effect) appropriately compared to GNIP data (Zhang et al., 2012). Additionally, a broad agreement in isotopic output with GNIP data in the general spatial distribution can be observed 5 (Tindall et al., 2009).

isoGSM
IsoGSM is the isotope-enabled version of the Scripps Experimental Climate Prediction Center's (ECPC) GSM (hereafter, isoGSM) (Yoshimura et al., 2008). The model is based on the previous medium-range forecast model used at NCEP, making it well documented in its performance as an operational weather forecast model (Kanamitsu et al., 2002;Caplan et al., 1997). 10 The IsoGSM is a stand-alone atmospheric model. Here, it has been forced with SST and sea-ice distributions from CCSM4 last millennium simulation (Landrum et al., 2013). Land surface processes are modelled through NOAH model, but isotopic fractionation is not considered in these processes (Yoshimura et al., 2008).
isoGSM has been shown to represent isotope and precipitation observations globally using a spectral nudging technique captured by the NCEP/DOE Reanalysis dataset (Yoshimura et al., 2008). Its last millennium simulation has not been eval- 15 uated in previous studies, but isoGSM captures large-scale isotope and climate patterns in present times compared to other models with implemented isotopes (Risi et al., 2012b). The model has shown to also reproduce observed isotopic and precipitation variability well over the regions of Indian Summer Monsoon (Berkelhammer et al., 2012a), western North America (Berkelhammer et al., 2012b), and NW Scotland . Recently, IsoGSM showed good consistency with speleothem oxygen isotopes from East Asia (Chiang et al., 2020) and from South Asia (Kathayat et al., 2021). 20 isoGSM tends to underestimate the depletion of δ 18 O sim in dry regions such as the continent of Antarctica (Yoshimura et al., 2008). A decreasing summer temperature increases the precipitation δ 18 O sim , and is caused by the moisture transport scheme of the model associated with areas of extremely dry conditions (Yoshimura et al., 2008;Risi et al., 2012b). Figure 2. Site locations of the SISALv2 database on a global karst map (brown shading Williams and Ford (2006). We only consider entities that cover a minimum of 600 year period within the last millennium and that include at least 2 dates and 36 isotopic measurements in this period. Shown are all sites within the last millennium meeting these selection criteria that have both oxygen and carbon isotope measurements (red), and only oxygen isotopes measurements (blue).

SISALv2 database
In this study, we use speleothem data from the Speleothem Isotope Synthesis and Analysis version 2 (SISALv2) database (Comas-Bru et al., 2020a, b). The database includes 691 speleothem records from 294 caves across the globe, from all continents except Antarctica.
The last millennium has abundant records globally with sufficient resolution and reasonable dating uncertainties (Bühler et al., 2021). We filtered the database for records that cover at least a 600 yr period within the last millennium (850-1850CE), 5 exhibit at least two dates within the time period, as well as 36 stable isotope measurements, to guarantee a minimum resolution of 30 yr. We obtain 89 records from 75 different sites for δ 18 O speleo of which 58 (65 %) from 50 sites also have δ 13 C speleo measurements (Fig. 2).

Methods
To compare climate simulation outputs and speleothem data, both need some preparation. Modelled output comes in regular 10 monthly resolution, while time series of speleothem proxies are irregularly sampled. Different speleothem mineralogies, as well as different isotopic standards between modelled and recorded data, need to be accounted for before statistical similarity measures are applied.

Speleothem drip water conversion
The database includes calcite, aragonite, and mixed mineralogy speleothem records. Following Comas-Bru et al. (2020b), 15 we only use pure calcite or aragonite speleothems. To be able to compare precipitation δ 18 O sim values to those measured in calcite or aragonite, the δ 18 O speleo is converted to its drip water equivalent (δ 18 O dweq ) relative to the V-SMOW standard as in Comas-Bru et al. (2019). This conversion is temperature dependent but to different extents for both minerals. Tremaine et al. (2011) provide an empirically based fractionation formula for speleothems of calcite mineralogy: 20 where T is the temperature in K and δ 18 O are given in units of ‰.
Aragonite speleothems form under different conditions, e.g. higher Mg content of the dripping solution or very slow driprate , resulting in a different fractionation factor compared to calcite as described by Grossman and Ku (1986): 25 For both conversions, the cave temperature at the time of the fractionation is needed. As these are often not available, especially in a palaeoclimate setting, we use annual mean modelled surface temperatures as a surrogate for cave temperatures . For caves in very cold conditions, the annual mean surface temperature may underestimate the mean cave temperature by some degrees due to long-lasting snow-packs. This underestimation, however, only corresponds to an overestimation of 1 ‰ in δ 18 O dweq and is within the range of the simulation ensemble. Additionally, cave-dependent time 30 lags between the surface and the cave temperature are not accounted for, as they have a negligible effect on the time-averaged mean isotopic value. The conversion is done for each entity and each simulation individually, where we use the simulated annual mean surface temperature, down-sampled to the record's resolution.
In a last step, the V-PDB values are converted to V-SMOW, using the conversion from Coplen et al. (1983): we convert the aragonite δ 13 C values to corresponding calcite values using a fractionation offset of 1.9 ± 0.3 ‰ (δ 13 C c = δ 13 C calcite = δ 13 C arag − 1.9 ± 0.3 ‰). This offset accounts for the different enrichment-factors of the two polymorphs as established in laboratory studies (Romanek et al., 1992) and confirmed in a speleothem study .
From the drip water conversion, we obtain a matrix for each of the isotopes with one row per measurement and six 5 columns, where one represents the observations and the other five the modelled estimates.

Data processing
Although all simulation output from the five different climate models is available at monthly resolution, the time coverage differs. All simulation runs are cut to cover the time period from 850CE to 1850CE, an interval that is similar to PMIP's interval in the last millennium experiment. All simulation runs provide different sets of post-processed output variables. We 10 focus on surface temperature, precipitation, precipitation-δ 18 O sim , and evaporation. For the simulations lacking evaporation as a diagnostic (iCESM, isoGSM, and iHadCM3), we convert latent heat to potential evaporation and use these variables within the simulations. Outliers in the simulation are removed by comparing each modelled value in the 3D-output data matrix with its eight neighbors in time and space. If the value deviates from the mean of these eight values by more than five standard deviations, the value is set to NA. On average 0.001% of values are set to NA through this method. 15 Monthly mean δ 18 O sim are used with weighting by precipitation minus evaporation amount (infiltration adjusted precipitation weighting, iw) to obtain annual values. The simulated isotope mean is calculated as: where δ 18 O iw is the monthly weighted annual annually weighted composition of isotopes, δ 18 O k refers to monthly simulated δ 18 O, and iw k is the corresponding monthly amount of iw. As isotopic fractionation occurs during evaporation from 20 the soil, models, where δ 18 O sim is also available for soil layers, would be more realistic to compare to speleothem data.
However, these were only available for a few simulations. Using infiltration-weighted δ 18 O sim , therefore, offered a more equal handling of the data and enabled a better comparison of results.
Where simulation data are compared on a gridbox-level, we block-average all simulations to the same spatial resolution as that of the ECHAM5-wiso-run, which has the lowest spatial resolution. Data at the speleothem cave sites are extracted via 25 bi-linear interpolation as in Bühler et al. (2021). Here, a gridbox of the size of the simulation's original resolution with the cave's location at its center is resampled from the original gridboxes that it overlaps with.
The simulated monthly temperature, precipitation, and evaporation at the speleothem cave locations are averaged to annual mean values. All speleothems in our last millennium subset, however, come as irregularly sampled time series with a median resolution of 5.19 yr per entity, 90% CI: (4.13, 6.99). Considering only the speleothems with measurements of 30 both isotopes yields a median resolution of 6.08 yr (4.07, 7.85). All simulated variables of all models are block-averaged to the irregular temporal resolution of the individual speleothem. We also include speleothems in our analysis where no δ 13 C measurements but only δ 18 O are available. In direct comparisons between carbon and oxygen isotopes, we only consider those 58 speleothems that provide samples for both isotopes.
The relationship between δ 18 O speleo and simulated climate variables are determined following three different latitude 35 bands to guarantee enough data points within each zone, the tropics, the subtropics (poleward of the subtropical jet stream), and the extratropics (Holden, 2012 Correlation estimates and p-values for regular time series i.e. the annual resolution output of the simulation, are calculated via the Pearson's product moment correlation (via the function cor.test (R Core Team, 2020b)). We use a significance level of α = 0.1.

10
Correlation estimates for irregular time series are calculated via Pearson correlation as adapted by Rehfeld and Kurths (2014) and tested for last millennium speleothem records in Bühler et al. (2021). Here, we also choose a significance level of α = 0.1. This level is appropriate for both the regular and irregular time series, considering the number of samples N compared to the strictness and expected level of false positives. Whenever calculating correlation estimates where speleothem data is involved, we use the raw δ 18 O speleo or δ 13 C speleo instead of the drip-water converted values to decrease any potential 15 biases.
With all simulated variables down-sampled to the irregular resolution of each speleothem record, the use of power spectral analysis of the time series can describe the variation of common signals on a frequency spectrum of all time series (Chatfield, 2003). The power spectral density (PSD) of a time series describes the power distribution versus frequency over a finite 20 interval of time (Chatfield, 2003).
To compute spectra of irregularly sampled time series, these are first interpolated to their mean resolution by a double interpolation and filtering process (following Laepple and Huybers, 2014a;Rehfeld et al., 2018;Dolman et al., 2020). This interpolation is performed to reduce high-frequency artifacts. The robustness of this spectral estimation process was recently confirmed by Hébert et al. (2021). 25

Synchronous events in speleothem isotopes
An alternative similarity measure to correlation estimates, especially given the irregularity of the available time series, is checking for synchronous events within two time series. After distinguishing extreme events, strength and direction of synchronous extreme events are only based on their relative timing (Rehfeld and Kurths, 2014). In this study we focus only on this relative timing. 30 Within the modelled or measured isotope time series, we distinguish the 5% extreme values as the values above/below the 97.5% / 2.5% quantile of the time series distribution. Two extreme values occurring in a row are treated as two separate extreme events. Extreme values for time series of solar irradiance are determined in the same manner. For the volcanic forcing time series, we distinguish those events above the 95% quantile of the distribution.
Two events are considered synchronous, if they both occur within a time period around the events, limited by a local 35 threshold τ . This local threshold is calculated for each possible pair of extreme events and is chosen as half the minimum time between either extreme event and its preceding or succeeding extreme. The median τ is 4.62 yr (90% CI: 4.37, 5.28), which is of the same order of magnitude as the median resolution of all records with both carbon and oxygen measurements of 6.08 yr, (4.07, 7.85). We set a hard threshold limit of 50 yr, corresponding to the median age uncertainty considering the original chronologies as well as the SISALv2 ensemble chronologies (Comas-Bru et al., 2020b). When comparing synchronous For comparable extreme event signatures between the modelled and measured isotopes to volcanic or solar forcings, each modelled time series is checked for synchronicity against their respective forcing time series. The analysis is repeated for the speleothems for the same number of forcings and averaged.

5
When looking at the temporal distribution of global synchronous extreme events, they are sorted into bins of 10 yr, which is approximately twice the median local τ . If a pair shows several extreme events within one bin, it is only counted once. We determine significance by randomly permuting one of the time series of a pair and repeating the analysis 2000 times. Within one bin, all counts that are larger than the 95% quantile of this 'mean background noise' can be regarded as significant. We first compare the mean δ 18 O iw signal of the five different last millennium simulations, to see potential model biases and large differences between the simulations (Fig. 3). This general offset between the global mean δ 18 O iw is also visible when comparing the spread of mean values on a gridbox-level (Fig. 3f), where isotopic signatures differ the strongest in the Antarctic. Restricting the view to low-to midlatitudes, the largest model data difference is in the area of the Sahara desert, the Arabian peninsula, and the Indian peninsula, where in particular the iHadCM3 simulation deviates strongly from the other four. This deviation could be due to the higher than multi-model mean temperature in these areas in the iHadCM3 simulation (SFig. 3). Areas of high precipitation dif-  Analyzing the offsets between simulations and speleothems can indicate how well the model data matches the proxy signal.

Results
Here we investigate the offset in δ 18 O between simulated last millennium mean (δ 18 O iw ) and speleothems (∆δ 18 O= δ 18 O sim δ 18 O dweq ) on a global scale. The general distribution and offsets between each model and speleothem data are shown as 15 kernel density estimates (Fig. 4). The speleothem dataset has a median general distribution of -6.21‰ globally. Of the simulations, ECHAM5-wiso has the closest distribution median with -6.82‰, followed by isoGSM (-7.72‰), iHadCM3 (-8.20‰), GISS-E2-R (-8.25‰), and iCESM (-9.79‰) (Fig. 4a). The offset distributions between simulations and speleothem   The general patterns of the isotope climatology are similar between the models (Fig. 3). Larger differences in the modelled isotopic signatures appear particularly in regions where modelled temperature spreads widely as well. On average, the offsets to the speleothem records (Fig. 4) appear small and are consistent with the general difference to precipitation δ 18 O sim . Differences between δ 18 O sim signatures between the models may arise from different simulation drivers for the oxygen isotopes, e.g temperature and precipitation, and can hint at different processes that govern the isotopic water cycle at a certain 10 region within the simulations. The mean of the correlation to these main climatic drivers to δ 18 O sim shows high agreement between the simulations (Fig. 5, individual correlation fields in SFig. 5). For the correlation to temperature (Fig. 5a), two main domains can be distinguished: there is mainly a positive correlation to temperature in the mid-to high-latitudes and on the continents, and negative correlation in the low-latitude ocean. Large-scale agreement between the simulations is, however, limited to the higher latitudes and the tropical ocean. 15 Two domains are also apparent in the correlation to precipitation (Fig. 5b), which are even more clearly separated than for temperature. We find areas of negative correlation to δ 18 O sim in the low to mid-latitudes and areas of positive correlation only in the very high latitudes. The agreement between the simulations, indicated by the crosses, is higher in the low-to mid-latitudes.
The inter-model comparison shows more agreement in the correlation fields to temperature than to precipitation, when 20 focusing only on cave locations: the sign of correlation between δ 18 O sim and simulated temperature agree for three and more simulations at 60% of locations, and for four and more simulations even at 26% of locations. For precipitation on the other hand, only 11 % of locations agree in sign for three and more simulations, while it is only 1.1 % with agreement in four or more simulations.
The model-data comparison shows more significant temporal correlation estimates between simulated temperature to 25 δ 18 O speleo and also more sign agreement between these correlation estimates than to simulated precipitation. For precipita- The data suggests that two main drivers for δ 18 O can be distinguished in specific regions -temperature is dominant in the high latitudes, while precipitation appears to be the main driver in the low latitudes. δ 18 O in precipitation shows global signatures depending on latitude or altitude amongst other variables (Dansgaard, 1964).
We assess this by looking at the relationships between speleothem δ 18 O dweq and δ 13 C c with the site-specific variables of latitude and altitude.
In Fig. 6a) we see a decrease in δ 18 O dweq as more northern speleothems are considered (globally: R 2 = 0.22, p < 0.00, 10 STab. 1). The large spread in the mid-latitude Northern Hemisphere is mostly due to the high number of speleothem records available in both Europe and China, implying a high longitudinal spread within the database. Fig. 6b) shows a global negative relationship of δ 18 O dweq to altitude, even though records get scarcer as the altitude increases. No clear pattern is visible for δ 13 C c , and the dataset has a large mean spread (Fig. 6c). A statistically significant relationship between altitude and δ 13 C c cannot be established. However, the spread in δ 13 C c appears to decrease with increasing altitude (Fig. 6d) 15 We further explore the simulated meteorological variables and investigate spatial relationships between speleothem mean δ 18 O speleo and model variable ensemble mean (Fig. 7, STab. 1). We find a significant relationship between δ 18 O iw and speleothem δ 18 O dweq across all latitude bands (Fig. 7 a-c), with the strongest correlation in the extratropics. Furthermore, we find a global dependency of δ 18 O dweq to mean simulated temperature and precipitation at the cave sites. For both temperature and precipitation, we find the strongest relationships to δ 18 O dweq in the subtropical regions. In all three regions, 20 the relationship to temperature always exceeds that of precipitation. In Fig. 7 (d-f) cave site altitude information is applied by color codes, showing that higher altitude coincides with more negative δ 18 O dweq in the tropics (R 2 = 0.45, p < 0.01) and the extratropics (R 2 = 0.50, p < 0.01). A higher altitude also coincides with lower temperature in the same latitudinal  bands. For the subtropics, the cave site altitudes have less range, and an altitude pattern cannot be distinguished. A significant relationship to annual evaporation can only be distinguished in the extratropical regions.
We further compare the same meteorological variables to the speleothem δ 13 C c data (Fig. 8, STab. 1). A significant relationship is only found with temperature in the extratropical region (Fig. 8c), with increasing temperatures corresponding to more depleted δ 13 C c . δ 13 C c is also found to be enriched with altitude (R 2 = 0.23, p < 0.02, results not shown) in the 5 extratropics. This δ 13 C-altitude relationship is not found in the other latitudinal bands.
The spatial testing shows globally strong relationships between δ 18 O dweq to environmental factors, in particular to altitude, temperature, precipitation, and evaporation. The spatial relationships between speleothem entity mean δ 13 C c and

Variability on different time scales
We compare the variance distribution in oxygen and carbon isotopes over all speleothems. This is a useful measure of how climatic and environmental factors influence the proxies to a different extent. Additionally, different simulations have very 5 different representations of variability across different timescales. This behaviour can be explored by calculating power spectral densities (PSD) of the simulated and recorded isotopes averaged globally. Fig. 9a) provides the spectral ratio of the two isotopes after detrending the irregular time series. A flat spectral ratio at ∼ 1 would indicate same levels of variability for both isotopes on all timescales. The spectral ratio here shows higher variability of δ 13 C speleo on all timescales, however, for periods smaller than 10 years, the variability of both isotopes is more similar. This 10 is supported by the total variance of the isotopes over the complete period. δ 13 C speleo shows a much higher total variance with a median of 0.46‰ 2 (0.38, 0.6) compared to δ 18 O speleo with a median variance of 0.11‰ 2 (0.08, 0.12).  variability in δ 18 O speleo compared to all models on decadal and shorter timescales although to different extents (Fig.9b).
When considering the simulations down-sampled to record resolution, the result is similar but there is much lower variability in δ 18 O speleo at decadal and shorter timescales (Fig.9c). By down-sampling, the simulated spectra lose power in frequency on the decadal and shorter timescales, which is then reflected in higher spectral ratios. On decadal to centennial timescales, however, δ 18 O speleo shows much higher variability than the modelled δ 18 O sim , unaffected by the down-sampling. 5 Variability of δ 18 O iw is modelled differently in the simulations, as represented by the different levels of spectral power in the ratios. This difference is supported by the 5 area-weighted global variances of different magnitude, as well as the simulated variance in annual and down-sampled resolution as listed in Tab. 2. Small deviations between calculated variance and variance as area under the PSD can arise from the interpolation before the calculation of the spectra. The general order of isoGSM having the highest and iHadCM3 the lowest power on shorter frequencies remains throughout, as can also be seen 10 in the table with small deviations to the order. The global variance, however, is only partly represented at the cave locations.
While isoGSM shows the highest variance globally and at cave locations, iCESM is of medium variance globally but has the smallest variance at cave locations. For unweighted isotopic composition, the order of simulations changes (results not shown).
The analysis suggests that variability in the simulated δ 18 O iw is represented differently in the simulations and that the 15 order is not frequency-dependent. The recorded δ 18 O speleo shows more variability on centennial and less variability on decadal and smaller frequencies than the simulated, although to a different extent depending on the simulation.

Analysis of extreme events
To examine if there are factors that influence both δ 18 O speleo and δ 13 C speleo simultaneously, we analyze the similarity of   Figure 10. Synchronous events: a) the synchronous extreme events between δ 18 Ospeleo and δ 13 Cspeleo (red), b) the synchronous events between the speleothem isotopes (oxygen purple and carbon green), and volcanic eruptions as reconstructed by Crowley and Unterman (2013) or Gao et al. (2008) (depicted in Fig. 1d)), c) and d) the synchronous extreme events between simulated δ 18 O values at the cave locations of all simulations in down-sampled to record resolution and annual resolution respectively. Where occurrence of synchronous extreme events is significant with α = 0.05, the bars are shown in dark colors, non-significant in transparent colors. The four light grey bars in the background of each plot show areas of high volcanic activity. Fig. 10 shows the temporal distribution of extreme events globally. In Fig. 10a) we test for synchronous extreme events between the two isotopes. Despite the high number of significantly correlated oxygen and carbon isotopes within one record, global patterns are not visible with a maximum of 10% of speleothems showing synchronous extreme events between δ 18 O speleo and δ 13 C speleo at the same time over the last millennium. Synchronous events are also not higher in time periods of strong volcanic eruptions (indicated by grey bars). 5 By analyzing synchronous events between volcanic eruptions as reconstructed by Unterman (2013) andGao et al. (2008) (Fig. 1d) and δ 18 O speleo as in Fig. 10b), up to 20% of speleothems exhibit extreme events at the same time as extreme volcanic eruptions. This share is higher than for the carbon isotopes, where up to 15% exhibit extreme events synchronous to volcanic eruptions. Both isotopes also show pronounced peaks occurring with different extreme volcano events (indicated by grey background) but also for minor volcanic events. 10 To check if the speleothems used here can resolve global extreme events of short duration, we compare δ 18 O sim at the cave locations of each simulation to the volcanic forcing of the simulations (see Tab. 1). While up to 50% of δ 18 O sim at cave locations and annual resolution exhibit an extreme event at the same time as an extreme volcanic eruption (Fig. 10d), this 19 https://doi.org /10.5194/cp-2021-152 Preprint. Discussion started: 24 November 2021 c Author(s) 2021. CC BY 4.0 License. number largely decreases to less than half when the resolution of δ 18 O sim is decreased to that of the speleothems (10c). The number of pseudo-speleothems experiencing synchronous events to volcanic forcing (Fig. 10c) is more similar to that of the speleothems (Fig. 10b). Summarizing, we see no global temporal pattern of synchronous extreme events in both δ 18 O speleo and δ 13 C speleo (Fig. 10a). A lower temporal resolution strongly decreases the ability of the modelled archive to resolve global events like volcanic eruptions. We found that the mean δ 18 O sim fields show global differences of 2.12 ‰ between the models, that could mostly be attributed to the global mean temperature differences 1.8 K between the models. Similarly, most of the strong regional differences in 10 δ 18 O sim between models could be explained by regional differences in simulated temperature (SFig. 3), as temperature was shown to be a major driver of δ 18 O sim (Fig. 5a). Specifically, we found less depletion of δ 18 O sim in isoGSM over Antarctica as an artifact of its numerical scheme used for moisture transport, and linked to extremely dry regions (Yoshimura et al., 2008). The warm bias in high latitudes for ECHAM5-wiso results is an underestimation of isotope depletion (Werner et al., 2011(Werner et al., , 2016. Overestimation of fractionation processes in iCESM during re-evaporation processes resulted in generally 15 stronger depletion in δ 18 O sim . The cool bias in northern mid-latitudes in the GISS-E2-R model as found in Schmidt et al. (2014) resulted in more depleted δ 18 O sim in this region. Colder temperatures over Antarctica in iHadCM3 explained partly why isotopic signatures are a lot more depleted than in the other simulations. Compared to historical ice core data, iHadCM3 mean isotopic signatures above Antarctica indicated realistic values (Tindall et al., 2009) suggesting that the colder Antarctic conditions modelled by iHadCM3 may be more consistent with reality than the multi-model mean. 20 Even though this study mostly focused on a terrestrial mid-to-low latitude archive, local differences of modelled isotopic signatures in the Antarctic may have an influence on isotopic representation in the general circulation of the models.
At the cave locations, the spread between the simulations yielded 4.51 ‰ (90% CI: 3.96, 4.79) in median (Fig. 3), while the median offset between simulated and speleothem δ 18 O dweq was around −0.38 ‰ (−0.8, −0.23) (Fig. 4. This means that even though simulations differed strongly in some regions, using multiple models can be sufficient to average out the offset 25 at cave locations. The offsets to the speleothems were in agreement with those found by Bühler et al. (2021) who compared the SISALv2 database to the iHadCM3 last millennium simulation. Median offsets to the speleothem records were small for all models, where differences spatially around the globe (Fig. 3) reflected both the internal variability of models, as well as global differences between the models. For example, ECHAM5-wiso showed the highest values for δ 18 O sim in the global mean and also a more positive offset distribution than the other models. 30 Our analysis suggests that a multi-model approach is advisable whenever comparing mean modelled values to data. Even though global mean δ 18 O sim values may be comparable, local and regional temperature estimates, and, therefore, modelled δ 18 O sim values can vary strongly and deviate between models. Even though isoGSM displayed the lowest offsets between δ 18 O sim and δ 18 O speleo (Fig. 4), additional processes between meteoric water above the cave and drip water may again influence this mean offset. Still, the multi-model offset comparison justified the use of the multi-model mean at cave locations 35 in the following spatial analysis.
We found significant relationships between all considered climatic variables (temperature, precipitation, and evaporation), and simulated δ 18 O sim and mean δ 18 O dweq (Fig. 7). These relationships were more distinct over latitude bands, which is in line with the effects described by Dansgaard (1964). However, the relationships between speleothem mean δ 13 C c and meteorological variables from model ensemble mean (Fig 8) were less clear. Global studies that have evaluated δ 18 O speleo isotopic signatures using climate models already exist (Bühler et al., 2021;Comas-Bru et al., 2019;Midhun et al., 2021), where regions with shared climatic features showed stronger relationships. For example, Baker et al. (2019) focused on specific climate zones by comparing drip-water measurements to precipitation δ 18 O measurements and was able to identify temperature zones for which mean measured δ 18 O or δ 18 O iw was most similar to δ 18 O speleo . In our study, we found stronger relationships to climate variables in the latitude bands than compared to a global 5 assessment as in Bühler et al. (2021). Analyzing regions with high data density and similar climate patterns, we found even stronger temperature relationships (SFig 7 and SFig. 8). Still, local particularities, such as large elevation difference over short distances, could not be resolved properly by the simulations and explained many of the very strong outliers, especially in the tropics.
Comparing a last century subset of the SISALv1 database by Fohlmeister et al. (2020) with our last millennium SISALv2-10 subset yielded similar results for precipitation and altitude relationships to δ 13 C speleo (SFig. 6). In contrast to our latitudinal approach, they compared their δ 13 C speleo dataset to temperature on a global scale. However, they neglected clusters of speleothems, where known carbon isotope governing factors other than temperature play an important role for the carbon isotope composition (e.g. high amounts of precipitation, known cave specific particularities and processes, or temperatures close to the natural limit of vegetation (< 5 • C)). The remaining records with temperatures between ∼ 7 to 27 • C showed 15 a positive trend between δ 13 C speleo and temperature. This trend is in contrast to our observation based on clustering the records according to latitudinal bands. With this approach, we find no relationship between δ 13 C speleo and temperature for the tropics and subtropics, but a clear inverse relationship is observed for the extratropical records.
Higher cave site elevation coincided significantly with more depleted δ 18 O dweq . For δ 13 C speleo , local studies exist that predict an increase in δ 13 C speleo with higher altitude (Johnston et al., 2013). However, for the global last millennium subset 20 of the SISALv2 database more entities with carbon measurements in higher altitudes are needed to see a potential global relationship, in addition to the relationship we find in the extratropical latitude bands.

Can models reproduce variability archived in speleothems?
For all simulations, temperature variability was the dominant driver in δ 18 O sim at high latitudes and precipitation variability at low latitudes ( Fig. 5). At the cave sites, model-internal regional variability as well as the records' age uncertainties 25 substantially decreased correlation estimates. We observed that the sign of the correlation estimated between simulated temperature and δ 18 O sim agreed at 60% of cave locations for 3 and more simulations, and at 26% for 4 and more simulations.
For correlation estimates to precipitation, this was true only at 11% or 1% of locations, respectively. When compared to measured δ 18 O speleo , we found more significant temporal correlation estimates to modelled temperature than to modelled precipitation. This could in part be explained by global temperature responses to e.g. volcanic forcing being more uniform 30 between model ensemble runs compared to precipitation responses, which depend strongly on regional particularities. Regions with high inter-model climate variable spread (as can be seen in SFig. 2) also coincided with regions of the least significant correlation estimates to simulated temperature and the least agreement between the climatic drivers of δ 18 O sim .
When looking at variability specifically at the cave site locations, we saw that for all speleothems where both δ 18 O speleo and δ 13 C speleo are available, δ 13 C speleo appeared to be more variable on average on all timescales (Fig 9d) with an in-35 creasingly higher variability compared to δ 18 O speleo towards centennial timescales and longer (Fig 9a). Within 86% of all speleothems, where δ 18 O speleo and δ 13 C speleo are provided, carbon and oxygen showed significant correlations. Jointly explained variance in the isotopic signal could point to common climatic drivers. However, the amplified variance on long timescales in the carbon isotope ratio could not only hint at changes in the water cycle but also land surface processes such as soil formation or vegetation changes. Considering more terrestrial archives as well as trace elements stored within the 40 speleothem may help to better disentangle the climatic and environmental signals. On decadal and shorter timescales, where 21 https://doi.org /10.5194/cp-2021-152 Preprint. Discussion started: 24 November 2021 c Author(s) 2021. CC BY 4.0 License. the meteoric and seasonal vegetation isotopic signal is mostly smoothed by the karst system, higher variability in δ 13 C speleo may result from the stronger isotopic fractionation for carbon compared to oxygen in precipitated calcite (Polag et al., 2010;Hansen et al., 2019).
Climate models reflected δ 18 O sim variability at cave locations to different extents. A clear offset between the models could be found on all timescales. We found no relationship between spatial resolution of the model and the variability of isotopic 5 composition of precipitation. The higher-resolution run of iCESM and the lower-resolution run of iHadCM3 seem to show similar variability globally and the lower-resolution ECHAM5-wiso shows even higher variability at cave locations. Due to the strong impact of temperature and precipitation on δ 18 O sim variability we expect that this difference in isotopic variability also stems from the difference in the simulated climate. Further assessments using multivariate statistics are needed to firmly attribute the impact of climate on recorded isotopic variability. 10 The lower temporal resolution of speleothem records largely explained the model-data mismatch on decadal and shorter timescales. Slow growth rates and limited sampling resolution lead to averaging effects which then lead to lower variability on shorter timescales. Simple karst-filters of a realistic transit time of ∼ 2.5 yr as in SFig. 9b) (as used in Bühler et al., 2021;Midhun et al., 2021;Dee et al., 2015) showed that variations in models and speleothems on these shorter timescales are very similar, if accounted for. Expert knowledge of the local cave hydrology is, however, needed for a more detailed assessment 15 on which model reflects δ 18 O sim variability best on decadal and shorter time scales compared to speleothems and may still be restricted by karst and cave internal processes that effectively limit the sampling of climatic signals.
On decadal and longer timescales models seemed to underestimate δ 18 O speleo variability, although to different extents, with isoGSM showing the highest variability and the smallest model-data mismatch on decadal and longer timescales. The model-data mismatch that we observed between speleothems and δ 18 O sim starting at decadal timescales is in agreement with 20 previous studies as by Laepple and Huybers (2014b), however, it is worth mentioning that speleothems may also be capable of enhancing climate-driven changes of δ 18 O and δ 13 C by cave-specific processes, resulting in higher variability on decadal and longer timescales. However, this has to be verified by future studies. Under the assumption that variability on decadal and longer timescales is recorded correctly by speleothems, the iCESM model showed the strongest variability mismatch and isoGSM the smallest. A model-data match on decadal and shorter timescale depends strongly on cave-hydrology processes 25 that additionally dampen the meteorological signal.
Cave locations were not necessarily reflective of mean annual δ 18 O variability globally -at least not in the model simulations. Simulations that showed generally high variability at cave locations at speleothem resolution also tended to be more variable globally and vice versa for low variability. However, simulations that were less variable at cave locations than others can still be more variable globally. In our case, this trend could likely be attributed to the bias of geographic locations of the 30 cave sites as the models mostly show high variance in δ 18 O sim in very dry regions and around the regions of the inter-tropical convergence zone.

Can external forcings be resolved by speleothems?
We found that 86% of speleothems have a significant temporal correlation between speleothem oxygen and carbon isotopes, with 47% even showing strong significant (anti-) correlations of |c| > 0.5. The co-variability of both isotopes has been studied 35 for a very arid region stalagmite by Fohlmeister et al. (2017) who also found strong correlation between both isotopes. High correlation between the isotopes could hint at kinetic isotope fractionation effects (Hendy, 1971). Fohlmeister et al. (2017) attribute increased correlation to times of strong variations in cave-internal processes triggered by variations of external conditions. This simultaneity agrees with our findings that generally no extreme event in isotopes precedes the other, which can, however, also be attributed to low sampling resolution. Changes in temperature or precipitation due to aerosol forced cooling have been analyzed in a δ 13 C record as signs of volcanic signatures of speleothems (Ridley et al., 2015). Growth rate changes (Baker et al., 1995) or the measurement of trace elements such as sulphur (Frisia et al., 2008) are other techniques to detect volcanic signals in speleothems but they generally require up to sub-annual resolution records. In our global analysis of 58 δ 13 C speleo and 89 δ 18 O speleo records, we saw no significant increase in extreme events in the isotope records coinciding with major volcanic eruptions. The individual 5 isotopes yielded more distinct signatures of volcanic eruptions with up to 20% (15%) of speleothems recording synchronous extreme events in δ 18 O speleo (δ 13 C speleo ) and a volcanic eruption, very similar to δ 18 O sim at record resolution. Both stayed, however, well below the possible simulated detection at cave locations under annual resolution of up to 50%. The comparison to the synchronous events to the down-sampled δ 18 O sim showed that the ability to capture events such as volcanic eruptions strongly decreases with record resolution. The attribution of specific peaks in speleothem data to volcanic events needs 10 caution because of age-uncertainties and other possible explanations for the changes e.g. human settlements close-by (Baker et al., 1995). Increasing the bin size to the average age-uncertainty within the last millennium sub-set of the SISALv2 yielded the same results (SFig. 11).
Solar variation is another external forcing which is often invoked as an influence on the monsoon cycle that has been investigated using speleothem records (Neff et al., 2001;Lone et al., 2014;Cosford et al., 2008). In these studies, it is also 15 standard to use high-resolution speleothems with a lowest resolution of about 5 yr for Lone et al. (2014). We repeated the analysis for Fig. 10 to analyze synchronous extreme events of δ 18 O speleo and δ 13 C speleo to the total solar irradiance input (SFig. 10). While the general effects of decreased detectability with decreased resolution are also visible, the overall detection of extreme solar irradiance was much weaker than for volcanic eruptions. This was not only true at cave locations but globally when comparing simulated surface climate variables to solar variations (results not shown). As solar variation 20 on this timescale is mostly cyclical compared to random extreme volcanic eruptions (compare Fig. 1), these results were to be expected from the methods used and are not in contradiction to the literature.
Summarizing, the comparison with the δ 18 O sim showed that cave locations are in general suitable to detect climatic changes due to changes in volcanic or solar forcing. Even though speleothems are highly resolved archives with little ageuncertainties compared to other archives, the median resolution during the last millennium of 6.08 yr (4.07, 7.85) was 25 not enough to resolve changes in δ 13 C speleo or δ 18 O speleo due to potential solar or volcanically-induced climatic changes.
Karst-mixing effects which further dampen the signal, as discussed for the variability on shorter timescales for SFig. 9, may decrease the ability to detect these changes, if they exist, even further.

Limitations
A current weakness of this type of analysis is that we only compared simulated oxygen isotopes of PMIP2/PMIP3 generation 30 models to archived oxygen isotope signals. For carbon isotopes no simulated values are available yet. For iCESM, a carbon cycle in the ocean exists (Jahn et al., 2015) and great effort is put into the incorporation of a carbon-cycle to iGCMs by the scientific community. The multi-model comparison in this study only included how different models represent isotopes.
However, parameter and tuning choices within one model, especially in the cloud and convection scheme, have a strong imprint on δ 18 O signatures (for example Nusbaumer et al. (2017) for iCESM, Field et al. (2014) for GISS-E2-R). To further 35 systematically explore and constrain modelled δ 18 O, a multi-model ensemble under different model set-ups will be needed.
We also only looked at carbon and oxygen isotopes as possible proxies for climatic changes. In contrast to e.g. ice-cores, speleothems do not directly record precipitation δ 18 O but instead archive δ 18 O with additional fractionation processes. Both δ 18 O and δ 13 C undergo fractionation processes which can be influenced by various cave-internal processes (Lachniet, 2009;Fairchild and Baker, 2012;Hartmann and Baker, 2017). Here, the isotopes in the drip-water are influenced by many fraction-40 ation processes that are not climate-related (Dreybrodt and Scholz, 2011;Fohlmeister et al., 2020). Considering additional 23 https://doi.org /10.5194/cp-2021-152 Preprint. Discussion started: 24 November 2021 c Author(s) 2021. CC BY 4.0 License. processes such as prior calcite precipitation (PCP) or other geochemical climate-related proxies can help to decipher the climatic signal from karst-or cave-internal processes (Kaufmann, 2003;Schwarcz et al., 1976;Owen et al., 2016;Tremaine and Froelich, 2013;Noronha et al., 2014). Especially sulphur proved to be a valuable tracer in detecting volcanic eruptions (Frisia et al., 2008). This possible offset between model and data is, however, assumed to be the same for all simulations and does not affect the multi-model analysis. 5 When comparing the speleothem isotopes to volcanic data, we note that there are more recent volcanic reconstructions available, which suggest a modification to the timing or magnitude of last millennium eruptions (Sigl et al., 2015). Given the temporal resolution of the speleothems, changes in timing of volcanic events would impact the comparisons of model data to only those speleothems with very high temporal resolution. As only the timing of the extreme event is compared, a change in magnitude would be irrelevant. As for the comparison with the simulated data, only the response of the model to any given 10 eruption is important (Colose et al., 2016b), as we did not compare the timing of extreme events in speleothems to those in the simulation.
Compared to the capability of speleothems to cover complete glacial-interglacial cycles, we only looked at very short timescales when focusing on the last millennium and what drives variability on decadal to centennial timescales instead. The last millennium is considered a relatively stable time period and climatic changes might not be strong enough to be fully 15 captured by speleothems. On longer timescales, speleothems may still be a good archive to capture these larger changes (Genty et al., 2006).

Conclusion
We presented a multi-model comparison over five last millennium isotope-enabled simulations (ECHAM5-wiso, GISS-E2-R, iCESM, iHadCM3 and isoGSM) and compared their representation of isotopic signatures in mean and variability to 20 paleoclimate data from a large speleothem database (a last millennium subset of SISALv2). We found that δ 18 O sim differed substantially between models on a regional scale as well as at speleothem cave sites. This could mostly be attributed to differences in modelled temperature between models. This effect can be compensated by using the multi-model mean. The isoGSM simulation showed the lowest absolute mean offset to the speleothems at cave locations, while all other simulations show only slightly higher offsets. Variability on decadal and longer timescales in the speleothems was higher than indicated 25 by models and was also best represented by isoGSM, which, however, still underestimated variability on these timescales.
No relationship was found between the spatial resolution of the models and their variability of the isotopic composition of precipitation. In all models, temperature was driving δ 18 O sim variability in high latitudes and precipitation in low latitudes.
At cave site locations in particular, which are mostly located in low-to mid-latitudes, models agreed more on temperature being the driving factor of water isotope variability than on precipitation. 30 Dividing the global set into latitude bands, we were able to distinguish temperature and altitude relationships for both the oxygen and the carbon isotopes, as well as significant relationships for δ 18 O dweq to other simulated climate variables. While most records showed significant correlation between the two isotopes, using both isotopes to gain more information than just from one remained difficult. Especially, variations in solar and volcanic forcing are not imprinted in either the single isotope or the pair on a global scale. Many archive limitations could, however, be attributed to the low resolution of the data-set 35 compared to the processes expected to be resolved.
Our analysis encourages the use of multi-model means whenever possible as already suggested by other studies (Colose et al., 2016a). From the point of model evaluation, the incorporation of different archives with higher resolution (e.g. corals, trees, ice cores as in the iso2k database (Konecky et al., 2020)) and with the help of improved proxy system models may provide further insight into why offsets between models can be so large regionally. From a speleothem perspective, within- cave and between-cave variability comparisons using both isotopes will help to understand the recorded signal better and give higher confidence. Future multi-model comparisons of isotope-enabled models for other time periods are required to further evaluate biases in models, as well as comparisons to δ 18 O-archives of all kinds, as to help to constrain models. If carbon isotopes are included in a multitude of water-isotope-enabled models as well, this needs to be repeated for both isotopes, to gain a deeper 5 understanding of the underlying concept in what influences variability and co-variability of isotopes in speleothems.
Code and Data availability.
Code to reproduce figures and analyses will be made available in the final manuscript on GitHub. Model data as csv-files with output at cave locations (ECHAM5-wiso, GISS-E2-R, iCESM, iHadCM3, isoGSM) at annual and record resolution, 10 as well as monthly fields of surface temperature, precipitation, isotopic composition of precipitation and evaporation or latent heat respectively for all simulations will be made freely available on Pangaea in the final manuscript. The SISAL (Speleothem Isotopes Synthesis and AnaLysis Working Group) database version 2 (SISALv2) is publicly available through the University of Reading repository at http://dx.doi.org /10.17864/1947.256 (Comas-Bru et al., 2020a. We use R for the data analysis (R Core Team, 2020a). The main packages are tidyverse (Wickham et al., 2019), ncdf4 (Pierce, 2019), ggplot2 15 (Wickham, 2016), raster (Hijmans, 2020), zoo, (Zeileis and Grothendieck, 2005), plyr (Wickham, 2011), and (Wickham et al., 2021). We use the nest R package (https://github.com/krehfeld/nest Rehfeld et al., 2011;Rehfeld and Kurths, 2014) and the PaleoSpec package (https://github.com/EarthSystemDiagnostics/PaleoSpec Kunz et al., 2020). Competing interests. The authors declare that they have no conflict of interest. 25 Acknowledgements. This study includes data compiled by SISAL (Speleothem Isotopes Synthesis and Analysis), a working group of the Past Global Changes (PAGES) project. PAGES received support from the Swiss Academy of Sciences and the Chinese Academy of Sciences. The project is in part inspired by discussions at the SISAL 4th workshop: Exploiting the SISALv2 database for evaluating climate processes, Xi'an, China, 14-18 October 2019. Nils Weitzel, Elisa Ziegler,