The long-standing dilemma of European summer temperatures at the mid-Holocene and other considerations on learning from the past for the future using a regional climate model

. The past as an analogue for the future is one of the main motivations to use climate models for paleoclimate applications. Assessing possible model limitations in simulating past climate changes can lead to an improved understanding and representation of the response of the climate system to changes in the forcing, setting the basis for more reliable information for the future. In this study, the regional climate model (RCM) COSMO-CLM is used for the investigation of the mid-Holocene (MH, 6000 years ago) European climate, aiming to contribute to the solution of the long-standing debate on the reconstruction of MH summer temperatures for the region, and gaining more insights into the development of appropriate methods for the production of future climate projections. Two physically perturbed ensembles (PPEs) are


Introduction
The mid-Holocene (MH, approximately 6000 years before present, BP) is one of the main test-beds for evaluating the response of climate models to changes in climate forcing (Otto-Bliesner et al., 2017). For this period, the particular configuration of the Earth's orbit around the Sun led to remarkable changes in the seasonal cycle of insolation. Knowing whether models react properly to those changes might give us important hints on their reliability for the investigation of the future (Haywood et al., 2019).
The reconstruction of European summer temperatures at the MH has been the subject of a long-standing debate for more than 30 years (Huntley and Prentice, 1988;Cheddadi et al., 1996;Masson et al., 1999;Davis et al., 2003;Mauri et al., 2015;Russo and Cubasch, 2016). Changes in the seasonal cycle of insolation at different latitudes resulted in a higher summer solar radiation input over the Northern Hemisphere at the MH than today (Berger, 1978;Berger and Loutre, 1991;Berger, 2013). One could expect that climate surface variables, such as near-surface air temperature, would have directly responded to the changes in the forcing, with consequently warmer conditions over the entire European continent. Indeed, all climate models, without exception, show a very homogeneous warming in summer across the whole of Europe for the MH, as the result of a simple direct thermodynamic response to increased summer insolation (Mauri et al., 2015). Although proxy-based reconstructions are generally in line with the summer warming shown by models over northern Europe, a similar common agreement is not evident over the south when considering different types of records. Evidence from continental-scale pollen-based reconstructions show a large extension of colder temperatures over the Mediterranean region at 6000 BP (Huntley and Prentice, 1988;Cheddadi et al., 1996;Davis et al., 2003;Mauri et al., 2015), in contrast with the overall warming evinced from climate models (Masson et al., 1999;Mauri et al., 2014;Fischer and Jungclaus, 2011;Russo and Cubasch, 2016;Strandberg et al., 2014;Brierley et al., 2020). On the other hand, reconstructions based on other proxies such as chironomids, marine cores and glaciers (Samartin et al., 2017) show warmer-than-present conditions over several locations of southern Europe at the mid-Holocene.
The discussion on which picture is to be considered more reliable has been long-standing and is still unsolved. In this context, different studies have "evaluated" the results of climate models against pollen-based reconstructions of MH European summer temperatures. However, no thorough analysis has been conducted so far using models for testing plausible physical drivers that could explain the spatial dipole structure of summer temperatures at the MH evinced from these proxy records, with warmer values with respect to the present day over northern Europe and colder over the south. Some studies using the results of climate simulations (such as the results of the Paleoclimate Modelling Intercomparison Project phase 3, PMIP3) mainly focused on the average response of the models, rather than investigating whether and for what reason individual models can reproduce European summer temperatures more similar to the reconstructions (Brewer et al., 2007;Mauri et al., 2014;Samartin et al., 2017). In general, there is a need to better investigate climate models' behavior for this case study, exhaustively exploring their physics and testing hypotheses that could plausibly produce results consistent with the evidence derived from pollen-based reconstructions.
One of the potential hypotheses proposed in the literature for explaining the cooler summer temperatures over the Mediterranean region at 6000 BP is that more winter and early spring precipitation could have led to increased soil moisture availability at the beginning of summer. In combination with the enhanced insolation, an increase in latent heat, surface evapotranspiration and a subsequent decrease in summer surface temperatures can be consequently assumed for the region (Bonfils et al., 2004;Mauri et al., 2015;Russo and Cubasch, 2016). Climate models do not seem to accurately partition the incoming radiation between latent and sensible heat, leading to excessive summer temperatures over the area, in some cases exceeding 5 • C (Russo and Cubasch, 2016). A similar overestimation of summer temperatures over the Mediterranean region has also been noticed for present-day simulations with regional climate models (RCMs) (Kotlarski et al., 2014;Christensen et al., 2008;Boberg and Christensen, 2012), as well as with Global Circulation Models (GCMs) (Carvalho et al., 2021;Cattiaux et al., 2013). This issue has been related to deficiencies of climate models in correctly simulating soil moisture availability at the beginning of summer, resulting from an overly fast depletion of spring moisture, and leading to drier and hotter conditions (Seneviratne et al., 2006(Seneviratne et al., , 2010Davin et al., 2016). Similar issues have been also discriminated for soil moisturecontrolled evaporative regimes during the Holocene in central Eurasia . The plausibility of the suggested hypothesis can be effectively evaluated with the aid of climate models.
Besides constituting a unique opportunity for better understanding the response of the climate system to changes in the forcing, the investigation of the MH European climate can be useful also for gaining additional insights into the use of models for the study of the future. Even though climate models are deterministic, changes in their unconstrained model parameter values may lead to different results, assuming a large spectrum of outcomes. The best compromise when applying climate models to study future or past climate is to calibrate them against observations for the present and determine an optimal model configuration that can be assumed to be the best for other time periods as well (Bellprat et al., 2012a, b;Russo et al., 2019;Russo et al., 2020). However, this is just an assumption, since there is no guarantee on whether the best model configuration for the present will be the same for other periods characterized by different forcing. In addition, even though several parameter sets can produce similar present-day mean climate, being in agreement with observations, their different sensitivity to radiative forcing perturbations might lead to vastly differing results under different forcings (Loutre et al., 2011). In recent years, socalled objective calibration methods have been developed for tuning both RCMs and GCMs (Bellprat et al., 2016;Hourdin et al., 2017;Hauser et al., 2012;Mauritsen et al., 2012;Williamson et al., 2015;Bellprat et al., 2012a). In these methods, first a sub-set of changes in model parameters and their mutual combinations are tested. Furthermore, based on these results, a statistical model for extrapolating optimal values of unconstrained parameters is developed. Optimal calibration approaches are based on the performance of a large number of simulations, in a range of 500 to more than 1000 years, being particularly expensive in terms of computational resources. Therefore, their application is not suitable for each case study. In particular, given a continuously increasing complexity of climate models and the consideration of higher spatial resolutions, their use would put a major constraint on the availability and use of future resources. As an alternative to these approaches, an ensemble of model simulations, referred to as physically perturbed ensemble (PPE) (Forest et al., 2002;Knutti et al., 2002;Murphy et al., 2004;Stainforth et al., 2005;Bellprat et al., 2012a), can be performed to explore different configurations of a specific model for a given period and domain. The advantage of this method is that besides not making any assumption on the stationarity of model biases, it always allows the association of a model answer with a certain estimate of its structural uncertainty. Investigating how the structural uncertainty of a climate model varies over two periods of time characterized by different forcing could help to assess the robustness of the assumption of stationarity proper of calibration approaches, and to determine whether PPEs instead of single runs based on optimal model configurations are preferable for the performance of climate projections.
In this study, with the main goal of uncovering possible processes that might be relevant to explain the patterns of differences in European summer temperature between the MH and the pre-industrial (PI) periods as reconstructed from pollen data, a series of sensitivity experiments are conducted with the COnsortium for Small-scale MOdelling in Climate Mode (COSMO-CLM, Rockel et al., 2008) RCM. Firstly, a series of simulations are performed for each of the considered periods by perturbing the model parameters that have proven to be the most sensitive for the region (Bellprat et al., 2012a(Bellprat et al., , 2016 and selecting different physical options that are thought to be relevant for climate feedbacks related to changes in radiation.
Secondly, with the aim of investigating the plausibility of soil-atmosphere interactions as the main driver of the bipolar behavior of summer temperatures over Europe at the MH, another set of sensitivity experiments is conducted with the same model over Europe by perturbing the initial soil moisture conditions of a reference state at the beginning of spring. By evaluating relative changes in summer near-surface temperatures of these simulations, this study aims to shed more light on possible reasons for climate model biases against pollen-based reconstructions.
Finally, in addition to the aforementioned objectives of the paper, the ensembles of simulations with different configurations produced for the study of changes in MH and PI European summer temperatures are employed here to assess the reliability of the assumption of stationarity typical of calibration approaches used for RCMs.
The methods of the study are introduced in Sect. 2 and include information on the applied model, the performed experiments and the metrics considered in the conducted analyses. Then, in Sect. 3, the results are presented and discussed. Finally, conclusive remarks are summarized in Sect. 4.

Regional climate model
The COnsortium for Small-scale MOdelling in Climate Mode (COSMO-CLM, Rockel et al., 2008) is a nonhydrostatic, limited-area atmospheric model developed by the Climate Limited-area Modelling Community (CLM-Community, https://www.clm-community.eu, last access: 1 June 2021), an international network of scientists that join together efforts to develop and use community models (Sørland et al., 2021). COSMO-CLM is the climate version of the numerical weather prediction model COSMO, developed by the German Weather Service (DWD) in the 1990s (Steppeler et al., 2003;Baldauf et al., 2011;Sørland et al., 2021).
The model version used in this study is the COSMO-CLM 5.0_clm9. For its application to study past climates, the model needs to be modified to take into account changes in the Earth's configuration around the Sun on millennial timescales (i.e., changes in the eccentricity of the orbit, obliquity and precession). For this purpose a FORTRAN-based subroutine is implemented in the main radiation module of the model code, following the same approach of other paleoclimate studies (Russo and Cubasch, 2016;Prömmel et al., 2013;Fallah et al., 2016;Fallah et al., 2018). Additional changes to the model's code are required to account for different greenhouse gas concentrations in the past. An overview of the values of the orbital parameters and greenhouse gas concentrations used in the PI and the MH simulations (based on PMIP3 guidelines; see https://pmip3.lsce. ipsl.fr, last access: 1 June 2021) is presented in Table 1.
The entire model domain includes 125 grid points in the longitude and 122 grid points in latitude directions, with a Table 1. Values of orbital parameters and greenhouse gases concentrations for the PI (left) and MH (right) periods used for the conducted modeling experiments. The orbital parameters are the eccentricity of the orbit (ECC), the obliquity of the Earth's axis (OBL) and the precession of the equinoxes (PRE  spatial resolution of 0.44 • (∼ 50 km), covering the whole of Europe. At each side of the domain, 10 grid points are used as a boundary relaxation zone and are excluded from the analysis. A map of the extension and topography of the inner domain of study is presented in Fig. 1. All performed simulations are derived by applying changes to the setup of a reference run, using an "optimal" configuration slightly different than the most recent one proposed by the CLM-Community for Europe (Sørland et al., 2021). The reference run uses the Integrated Forecast System (IFS) model Tiedtke-Bechtold convection scheme (Bechtold et al., 2001) and a two-time-level Runge-Kutta scheme with time-split treatment of acoustic and gravity waves for time integration. It also uses a secondorder Bott scheme for moisture variables and aerosol advection (Bott, 1989) and a prognostic turbulent kinetic energy (TKE) scheme for the vertical turbulent heat and momentum fluxes. Furthermore, a radiative transfer scheme (Ritter and Geleyn, 1992) and a one-moment, three-category (cloud ice, snow and graupel) ice scheme with prognostic treatment of the hydrometeors (Reinhardt and Seifert, 2006) are applied. The employed soil component is the multilayer soil model TERRA_LM (Schrodin and Heise, 2002;Schulz et al., 2016). TERRA_LM is a unidimensional soilvegetation-atmosphere transfer scheme (SVAT) regulating momentum and heat fluxes between soil and the atmosphere. In TERRA_LM each grid point belongs to the same soil category throughout all soil depth. A total of eight soil types are available in the model: rock, ice, sand, sandy loam, loam, loamy clay, clay and peat. A set of parameters for each soil category is prescribed to the model, including the pore volume, heat capacity and hydraulic conductivity. A table with the different soil parameters and their values is provided in the Supplement. For the presented simulations, a map of soil categories derived from FAO (2003) is used.
The reference run considers 50 atmospheric vertical layers, up to a height of 22 000 m, and a total of 9 hydrological active layers in the soil, down to a depth of 7.6 m. It also considers a treatment of the albedo based on dry and saturated soil. The main features of the reference simulation are summarized in Table 2, with information on the thickness and depth of the center of each of the different active soil layers reported in Table 3.
Aiming to perform an extensive amount of sensitivity experiments in this study, targeting processes that could have an important impact at local and regional scales, the use of an RCM is mainly dictated by its relatively cheap computational demands. Even though the use of RCMs for paleoclimate applications has been disputed by a recent study by Armstrong et al. (2019), given that they are mainly limited by biases imposed at their lateral boundaries by the driving GCM, the fact that RCM outcomes vary significantly as a result of perturbations to their unconstrained parameter values and physical schemes (Bellprat et al., 2012a(Bellprat et al., , b, 2016Russo et al., 2019;Russo et al., 2020) supports its suitability for the purposes of this research.

Driving global circulation model
Initial and boundary data for the COSMO-CLM simulations are obtained from global simulations with the Max Planck Institute Earth system model in paleoclimate mode (MPI-ESM-P, Jungclaus et al., 2013). Output data with 6-hourly resolution are obtained from the MPI-ESM PI (Jungclaus et al., 2012a) and MH (Jungclaus et al., 2012b) simulations, respectively. The MPI-ESM includes coupled GCMs for the atmosphere and ocean as well as subsystem models for land and vegetation and for the marine biogeochemistry . The atmospheric component ECHAM6  is run at T63 horizontal resolution (1.875 • on a Gaussian grid) with 47 levels in the vertical. The ocean component used for the MPI-ESM-P simulations is the MPIOM (MPI Ocean Model, Jungclaus et al., 2006;Jungclaus et al., 2013). The ocean grid has a nominal resolution of 1.5 • and  considers 40 unevenly spaced levels in the vertical, ranging from 12 m near the surface to several hundred meters in the deep ocean . The global simulations are conducted using the same values for the orbital parameters and GHG concentrations as the COSMO-CLM experiments in Table 1.

Physically perturbed ensemble
A total of 31 experiments for each of the considered periods (PI and MH) are performed to build a PPE, including the reference run, leading to a final set of 62 COSMO-CLM simulations. Starting from the reference configuration, selected parameters and physical options are perturbed consistently over the two periods. Most of the perturbed parameters are the ones for which the model has proven to be most sensitive for Europe (Bellprat et al., 2012a(Bellprat et al., , 2016Russo et al., 2020) and include at least one member for each of the main model schemes (i.e., turbulence, land-surface, convection, soil and radiation). A set of parameters is considered following the studies of Bellprat et al. (2012aBellprat et al. ( , 2016. It affects sub-grid-scale cloud formation (uc1), shallow convection (entr_sc), interaction of radiation with clouds (radfac), turbulent transport of heat and moisture (tkhmin), exchange of heat and moisture between the atmosphere and the land surface (rlam_heat), strength of transpiration of the vege-tation related to depth of rooting zone (facroot_dp2), and hydraulic cycling of soil moisture (soilhyd). Additionally, the parameters controlling heat exchange between the lower atmosphere and ocean (rat_sea), maximal turbulent length scale (tur_len), dissipation of turbulent heat and momentum (d_heat, d_mom), and the factor controlling the effective surface area (e_surf) are also considered, based on the sensitivity of the model as evinced from more recent studies (Russo et al., 2020). The explored physical options of the model are selected considering their potential to be sensitive to changes in radiative forcings, such as the interval of the call to the radiation scheme, the type of albedo representation and the soil hydraulic lower boundary with drainage and diffusion. A list of the different configurations tested starting from the one of the reference simulation is reported in Table 4, together with more detailed information. Each of these simulations covers 25 years, with 5 years considered as spin-up and excluded from the analysis. In total, the performed experiments with changes in the model configuration cover more than 1500 years of simulations. The set of conducted experiments is large enough to cover an extensive part of the parameter uncertainty of COSMO-CLM (Bellprat et al., 2012b(Bellprat et al., , 2016, being the main target of objective calibration methods developed in recent years.

Perturbed initial soil moisture experiments
Eight additional simulations with perturbed soil moisture conditions are performed over a shorter period of 6 months. All these simulations are initialized on 1 April of the 15th year of the MH simulation period and use the same configuration as for the reference experiment of Sect. 2.3. The first of these experiments considers, for each point of the domain, each of the hydrological active soil layers as half-saturated at initialization. This means that a value of 50 % of relative soil moisture is set for each soil layer at the beginning of the simulation. The relative soil moisture is calculated considering the pore volume of each point of the domain, which is in COSMO-CLM a function of the soil type: where W so is the relative soil moisture for a given point with x and y coordinates i and j , respectively; W l represents liquid water; z the depth of the considered soil layer z; and V p is the pore volume of the given point (Baur et al., 2018). Six additional simulations are then conducted, increasing and decreasing the initial relative soil moisture of the first run with half-saturated soil moisture, by 25 %, 50 % and 75 %, respectively. Another experiment is also conducted starting from fully saturated soil moisture conditions (+100 %) at initialization. Changes in the mean summer temperature of these experiments are analyzed with respect to the simulation with half-saturated initial soil moisture conditions. For RCMs, the use of different boundaries (for example a different GCM or the same one with different forcing or configuration) could have a significant effect on the model results, often even more relevant than the employed RCM version or selected parameter values (Sørland et al., 2021). To take into account the effect of different boundaries on the sensitivity of the model to soil moisture perturbations, an additional set of 10 6-month long simulations are finally performed. More specifically, for each of five randomly selected years of the simulation period, two experiments with, respectively, an increase and a decrease of 75 % in soil moisture at the beginning of spring (with respect to half-saturated soil) are conducted.
As a matter of clarity, it is necessary to specify here that an increase (decrease) in soil moisture at initialization by, for example, +75 % (−75 %) corresponds to a total value of 87.5 % (12.5 %) of relative soil moisture.

Metrics for evaluating the assumption of stationarity of calibration approaches
The assumption of stationarity proper of calibration approaches is investigated here by means of the mean absolute error (MAE). Three variables that are normally used in RCM calibration procedures are considered (Bellprat et al., 2012a, b;Russo et al., 2020), namely near-surface temperature (T2M), precipitation (RR) and total cloud cover (CLCT). In a first step, the MAE is calculated over daily mean anomalies for each of the three variables and each land point of the domain, for the PI and MH periods separately: where e, V , i and j are the selected experiment, the considered variable, and the coordinates in the longitudinal and latitudinal directions of a given point, respectively. Additionally, d represents the considered day of the simulation period, with the total number of days D being 20 × 365. Finally, X N is the target simulation against which the bias is calculated ("nature" state). In a second step, the MAE is then calculated over regional monthly means, after subdividing the domain of study into a set of 10 sub-regions. The domain division is conducted similarly to regionalizations usually performed for Europe within the CORDEX framework or in other studies (Kotlarski et al., 2014;Bellprat et al., 2012aBellprat et al., , b, 2016, assigning almost every point of the domain to a pre-defined climatic zone. A map of the 10 selected sub-regions is presented in Fig. 2. In this case the MAE is calculated as follows: where, besides the same indices already introduced in Eq. (2), r indicates the considered sub-region (R = 10) and m the given month (M = 12 × 20). In both formulas for the calculation of the MAE (Eqs. 2 and 3), one of the model realizations is assumed as representative of the real state of the climate system in the two different periods. All the simulations are then ranked considering their distance from this "nature" realization. The reference simulation of Sect. 2.3 is considered in a first place as the "nature" state. Successively, for supporting the plausibility of the evinced results, the proposed analyses are reiterated using different realizations as the target for the calculation of the MAE.

PPE summer temperatures
In this section, PPE results are explored with the primary goal of discriminating processes that could possibly lead to a spatial pattern of summer mean near-surface temperatures at the MH closer to evidence from pollen-based reconstructions. For this purpose, the analyses focus on the anomalies between the MH and PI climatologies derived from the PPEs with different model configurations. Figure 3 shows the ensemble mean of the anomalies obtained by subtracting from the MH climatological mean of each realization the corresponding PI value. The mean anomalies are in a range of 0 to +2.5 • C. The mean model behavior does not show different results compared to other studies (Brewer et al., 2007;Mauri et al., 2014;Russo and Cubasch, 2016;Brierley et al., 2020), as the entire domain is characterized by a warming signal. This signal is heterogeneously distributed though, revealing a north-west to south-east gradient, with smaller anomalies over the British Isles, increasing towards eastern Europe and the Mediterranean region, and reaching a maximum in the area north of the Black Sea.
The maximum absolute differences in the MH-PI anomalies of summer temperatures, calculated for each point of the domain between the different ensemble members, are presented in Fig. 4. These differences are quite constrained, with values exceeding 0.5 • C only over parts of the Balkans and continental Europe. This is the result of a similar structural uncertainty of the model in summer for both periods (see the Supplement), indicating its limited ability in freely responding to changes in the forcing during the summer season, for any of the considered configurations. Basically, none of the tested model setups produces a remarkably different response, with respect to the other model states, for the two periods. This seems to be true not only for summer temperatures, but also for other seasons and variables (see the Supplement), pointing at a general stationarity of the model uncertainty under different forcing, at least when considering climatological values. In conclusion, none of the investigated changes in the model configuration, and associated processes, leads to summer temperatures over Europe that could be in better agreement with evidence from continentalscale pollen-based reconstructions.

Perturbed soil moisture experiments
In this section, the results of the eight MH experiments with perturbed initial spring soil moisture conditions are presented. Figure 5 shows the differences in mean summer temperatures of the simulations with changes in initial spring soil moisture (25 %, 50 %, 75 % increase and decrease and 100 % increase) with respect to the simulation with half-saturated soil. Changes in available spring soil moisture seem to have an important effect on the simulated summer temperatures of the region (Fig. 5). In particular, there is a strong spatial dependency of the sensitivity of the model to moisture perturbation, with the areas of the Balkans and north of the Black Sea presenting the largest changes, up to 5 • C in the case of a reduction of initial soil moisture by 75 %. A similar spatial sensitivity is also evident for the model when considering different boundary conditions (see the Supplement).
Differences in summer temperatures are more pronounced for experiments with initially drier soil, generally presenting warmer conditions (Fig. 5, right column). In contrast, experiments with increased initial soil moisture lead to an overall cooling. In this case though, all the simulations present a very similar spatial distribution of summer temperatures (Fig. 5, left column), independently from the magnitude of the changes applied to the initial conditions. Analyzing the temporal evolution of soil moisture at the different model levels over the entire 6 months of simulation (Fig. 6), it is evident that the depletion of moisture is faster when more moisture is added to the initial conditions. This leads quickly to similar moisture availability at the beginning of summer in each of the considered cases, particularly in the upper soil levels. The experiments suggest that even if more soil moisture would be available in early spring in COSMO-CLM, as a consequence of, for example, increased late-winter precipitation, this would be depleted too quickly, leading to no appreciable changes in summer temperatures.
A similar model behavior is found for present-day studies, where a warm and dry bias of RCMs against observations over the Mediterranean region is attributed mainly to an overestimation of the evapotranspiration in spring, leading to an overly rapid depletion of soil moisture and, consequently, drier soil conditions in early summer (Seneviratne et al., 2010;Kotlarski et al., 2014;Davin et al., 2016). Davin et al. (2016) solved this issue by coupling the COSMO-CLM to a more complex soil scheme than the default TERRA_LM: the Community Land Model 4.0 (Oleson et al., 2010;Lawrence et al., 2011). In this way, they were able to sensibly reduce the warm summer bias of the model over the Mediterranean region and confirm the important role of land process representation to overcome this long-standing deficiency of climate models.
The experiments presented here cannot directly explain the disagreement between climate models and pollen-based reconstructions for MH summer temperatures over the Mediterranean region. However, they are particularly relevant since they confirm that pronounced regional differences in European summer temperatures during the MH may be related to soil-atmosphere interactions. In this context, the soil scheme of climate models applied to the study of European MH climate acquires significant importance. Considering the highlighted common model deficiencies related to soil-atmosphere coupling, known to notably affect simulated temperatures, a prerogative to any attempt to reconstruct the European climate at the MH using climate models is that the performances of their soil component must be first carefully evaluated, assessing its reliability in terms of retained spring soil moisture. The use of high-complexity soil schemes, as suggested for present-day studies, should be eventually considered.
Finally, it is also worth mentioning here that a presentday distribution of soil categories is used for the presented experiments. On millennial timescales, a possible source of uncertainty that needs to be additionally considered in modeling studies is the fact that soil might have changed with respect to the present day, for natural or anthropogenic reasons. This could have an impact on the moisture storage capacity of soil and, ultimately, on the simulated temperatures. This point should also be acknowledged in future studies of MH European climate using climate models.

Testing the assumption of stationarity of calibration approaches for RCMs
In this section, the PPEs produced for the PI and MH periods are used for testing whether an optimal model configuration for one period can also be assumed to be the best under different forcings. First, the analyses are conducted on daily mean anomalies calculated for each grid point of the domain and for each of the three considered variables separately. The probability distribution functions (PDFs) of total cloud cover daily mean anomalies derived for each member of the ensemble, for a randomly selected point of sub-region 8, are depicted in Fig. 7 as an illustrative example. The PDF of the reference simulation for the PI period is highlighted in black and is considered as a theoretical "nature" state (what would typically be the target of a calibration). In the same figure (Fig. 7, left), the PDF of the PI experiment with the smallest MAE (Eq. 2) with respect to the "nature" state is represented by the red curve. All other ensemble realizations are plotted in light gray. For the MH (Fig. 7, right), the same colors are used for the same experiments. Thus, the red curve in the MH plot represents again the ensemble member closest to the reference run at the PI period. This aims to show how much the best simulation in one period (PI) diverges from the reference in the other (MH). In each panel of Fig. 7, the number of the best-performing experiment for the selected point, in terms of the MAE of Eq. (2), is reported in the top-left corner. The optimal realization changes in the two periods, with simulation 2 (with the exponent to get the effective surface area set to 0.1) being the best in one case, and experiment 26 (with the factor for turbulent heat dissipation set to 15) in the other.
Considering the MAE calculated over the daily mean anomalies (Eq. 2) for each land point of the domain, the "best" model configuration changes in the two periods for over 91 % of the points for 2 m temperature, 92 % for precipitation and 89 % for total cloud cover. The same analyses assuming different realizations as the "nature" state, such as experiments 5 and 9, lead to similar conclusions.
These are further confirmed by additional analyses of the MAE calculated over monthly spatial means, usually being the target of calibration methods employed for COSMO- Figure 6. Temporal evolution of soil moisture at the nine hydrological active layers (lev1 to lev9) for the experiments with increased initial soil moisture at the beginning of April. More detailed information on the depth and thickness of each layer can be found in Table 3 CLM. For all of the considered variables, no realization that performs best for one period maintains its "status" in the other (see the Supplement). The presented results are not dependent on the supposed "nature" state. Again, repeating the same analyses using a different realization as the target (as before, simulation 5 and simulation 9) leads in fact to similar conclusions.
These results suggest that using resources for the calibration of RCMs, in order to determine an optimal model configuration, might not be the most ideal approach for the study of future and past climate. The production of small PPEs sampling a good part of a model structural uncertainty and likely to contain the best model answer under different forcing would be a preferable option to follow instead.

Conclusions
In this study, the regional climate model COSMO-CLM is used with the main goal of gaining a better understanding of the possible drivers of the long-standing mismatch between Figure 7. Probability distribution functions of daily mean anomalies of total cloud cover calculated for the different ensemble realizations for the PI (a) and the MH (b) periods, for a randomly selected grid point in subregion 8 of Fig. 2. The PDF of the reference run, considered as the "nature" state, is highlighted in black. The realization with the minimum MAE with respect to the PI reference run is highlighted in red in both panels. Gray lines represent the remaining members of the ensembles for the two periods. outcomes of climate model simulations and pollen-based reconstructions of mid-Holocene (MH) summer temperatures over Europe. Additionally, trying to learn from the past for the future, the study also uses the MH climate as a test-bed for investigating appropriate approaches for the performance of reliable climate simulations.
Two physically perturbed ensembles (PPEs) are produced to assess how the model reacts, for different parameter values and physical options, to changes in the radiative forcing over two distinct time spans (pre-industrial, PI, and MH). The mean differences in seasonal summer temperatures calculated between realizations with the same model configuration for the two considered periods show generally warmer conditions at the MH over all of Europe, consistently with the results of previous modeling studies. In general, each member of the produced PPE does not behave remarkably different with respect to the other ensemble members for both the MH and PI periods. The maximum differences in the anomalies of summer temperatures between the two periods, calculated among the different ensemble members, are in fact very much constrained over most of the domain of study. This indicates a limited sensitivity of the model to changes in the climate forcing in terms of its structural uncertainty, suggesting that none of the investigated changes in model configuration, and the associated physical processes, leads to remarkable variations in European summer temperatures closer to the evidence of continental-scale pollen-based reconstructions.
Furthermore, additional sensitivity tests are conducted for the mid-Holocene by perturbing the model initial soil moisture conditions at the beginning of spring. These experiments show that, for COSMO-CLM, there is a strong spatial dependency of European summer near-surface temperatures on the soil moisture available in spring. Remarkable differences are particularly evident over the Balkans and the area north of the Black Sea, with an increase of up to 5 • C when decreasing the initial soil moisture values by 75 %. The differences are more pronounced for the simulations with drier initial conditions, compared to the ones with enhanced soil moisture. For the latter, similar spatial patterns of colder summer temperatures are evident for all of the considered initial perturbations. Analyses of the temporal evolution of soil moisture show that adding moisture to the initial conditions leads to a faster depletion, confirming a deficiency of the considered land scheme of COSMO-CLM in properly retaining spring soil moisture, already known from present-day studies. Such deficiency, known to be one of the main reasons for a common warm bias of climate models in European summer temperatures, has never been carefully taken into account in modeling studies of the MH climate. The presented results emphasize the role of soil-atmosphere interactions as one of the possible drivers of the differences in European pollen-based summer temperatures at the MH. At the same time, they highlight the importance of properly evaluating the skills of the soil component of a given climate model in retaining spring soil moisture when investigating MH European climate. The consideration of more sophisticated soil schemes may be necessary to bridge the gap between models and proxy reconstructions, possibly contributing to the solution of this long-standing dilemma.
Finally, the analysis of the distribution of the PPEs for different variables (T2, PREC, TCLC) shows that, in almost all of the considered cases, an optimal model configuration in one period does not seem to be the best in another characterized by different radiative forcing. The ranking (based on the mean absolute error) of the single realizations changes each time. These results raise concerns about the usefulness of automatic and objective calibration methods for RCMs. Since there is no guarantee that an optimal model configuration maintains its status over different periods of time, it might make sense to better channel computational resources. An effective use of resources is of fundamental importance for the production of climate projections and should be considered as one of the main priorities of future climate studies. The presented results suggest that a better approach to the calibration of RCMs is the production of small PPEs that target a set of model configurations, properly representing climate phenomena characteristic of the target region and that will be likely to contain the best model answer under different forcing.
All the data for the mid-Holocene period with which the presented analyses are conducted are available at the following link: https://doi.org/10.5281/zenodo.5138131 (Russo, 2021c).
All the data for the pre-industrial period with which the presented analyses are conducted are available at the following link: https: //doi.org/10.5281/zenodo.5140034 (Russo, 2021d).
Additional data used for the performance of the presented simulations, such as land/surface parameters, as well as the interpolated boundaries with soil moisture conditions at 50 % saturation, are available at the following link: https://doi.org/10.5281/zenodo. 5140079 (Russo, 2021a).
The COSMO-CLM model is completely free of charge for all research applications. The version of the COSMO-CLM model used in this study can be downloaded from the following website: https:// redc.clm-community.eu/projects/cclm-sp/wiki/Downloads (last access: 1 June 2021). Access is license-restricted (http://www. cosmo-model.org/content/consortium/licencing.htm, last access: 1 June 2021), and for the download the user needs to become a member of the CLM-Community, or the respective institute needs to hold an institutional license.
Author contributions. ER designed the study and performed the simulations. ER, BF, PL, MK and CCR equally contributed to the interpretation of the results, the drafting of the manuscript and to scientific discussion.
Competing interests. The contact author has declared that neither they nor their co-authors have any competing interests.

Disclaimer.
Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Acknowledgements. Christoph C. Raible is supported by the Swiss National Science Foundation (SNF) within the project "Pleis-toCEP". Patrick Ludwig is supported by the Helmholtz Climate Initiative REKLIM (regional climate change; https://www.reklim. de/en, last access: 1 February 2022). Bijan Fallah is supported by the German Federal Foreign Office through the Green Central Asia project (http://greencentralasia.org/en, last access: 1 February 2022). Data are locally stored on the oschgerstore provided by the Oeschger Center for Climate Change Research (OCCR).
The computational resources necessary for conducting the experiments presented in this research were made available by the German Climate Computing Center (DKRZ). The authors are also particularly grateful to the COSMO and the CLM-Community for all their efforts in developing the COSMO-CLM model and making its code available.
Finally, a special thanks goes to the editor and the two anonymous reviewers for their constructive comments, which helped to sensibly improve the paper.
Financial support. This research has been supported by the Swiss National Science Foundation (SNF) within the project "Pleis-toCEP" (grant no. 200020_172745), the Helmholtz Climate Initiative REKLIM, and the German Federal Foreign Office through the Green Central Asia project.
Review statement. This paper was edited by Hugues Goosse and reviewed by two anonymous referees.