the Creative Commons Attribution 4.0 License.

Special issue: Paleoclimate Modelling Intercomparison Project phase 4 (PMIP4)...

**Research article**
10 Sep 2020

**Research article** | 10 Sep 2020

# A Bayesian framework for emergent constraints: case studies of climate sensitivity with PMIP

Martin Renoult James Douglas Annan Julia Catherine Hargreaves Navjit Sagoo Clare Flynn Marie-Luise Kapsch Qiang Li Gerrit Lohmann Uwe Mikolajewicz Rumi Ohgaito Xiaoxu Shi Qiong Zhang and Thorsten Mauritsen

^{1},

^{2},

^{2},

^{1},

^{1},

^{3},

^{4},

^{5},

^{3},

^{6},

^{5},

^{4},

^{1}

**Martin Renoult et al.**Martin Renoult James Douglas Annan Julia Catherine Hargreaves Navjit Sagoo Clare Flynn Marie-Luise Kapsch Qiang Li Gerrit Lohmann Uwe Mikolajewicz Rumi Ohgaito Xiaoxu Shi Qiong Zhang and Thorsten Mauritsen

^{1},

^{2},

^{2},

^{1},

^{1},

^{3},

^{4},

^{5},

^{3},

^{6},

^{5},

^{4},

^{1}

^{1}Department of Meteorology, Bolin Centre for Climate Research, Stockholm University, Stockholm, Sweden^{2}Blue Skies Research Ltd, Settle, United Kingdom^{3}Max Planck Institute for Meteorology, Hamburg, Germany^{4}Department of Physical Geography, Bolin Centre for Climate Research, Stockholm University, Stockholm, Sweden^{5}Alfred Wegener Institute, Helmholtz Centre for Polar and Marine Research, Bremerhaven, Germany^{6}Japan Agency for Marine-Earth Science and Technology, Yokohama, Japan

^{1}Department of Meteorology, Bolin Centre for Climate Research, Stockholm University, Stockholm, Sweden^{2}Blue Skies Research Ltd, Settle, United Kingdom^{3}Max Planck Institute for Meteorology, Hamburg, Germany^{4}Department of Physical Geography, Bolin Centre for Climate Research, Stockholm University, Stockholm, Sweden^{5}Alfred Wegener Institute, Helmholtz Centre for Polar and Marine Research, Bremerhaven, Germany^{6}Japan Agency for Marine-Earth Science and Technology, Yokohama, Japan

**Correspondence**: Martin Renoult (martin.renoult@misu.su.se)

**Correspondence**: Martin Renoult (martin.renoult@misu.su.se)

Received: 20 Dec 2019 – Discussion started: 17 Jan 2020 – Revised: 04 Jul 2020 – Accepted: 27 Jul 2020 – Published: 10 Sep 2020

In this paper we introduce a Bayesian framework, which is explicit about prior assumptions, for using model ensembles and observations together to constrain future climate change. The emergent constraint approach has seen broad application in recent years, including studies constraining the equilibrium climate sensitivity (ECS) using the Last Glacial Maximum (LGM) and the mid-Pliocene Warm Period (mPWP). Most of these studies were based on ordinary least squares (OLS) fits between a variable of the climate state, such as tropical temperature, and climate sensitivity. Using our Bayesian method, and considering the LGM and mPWP separately, we obtain values of ECS of 2.7 K (0.6–5.2, 5th–95th percentiles) using the PMIP2, PMIP3, and PMIP4 datasets for the LGM and 2.3 K (0.5–4.4) with the PlioMIP1 and PlioMIP2 datasets for the mPWP. Restricting the ensembles to include only the most recent version of each model, we obtain 2.7 K (0.7–5.2) using the LGM and 2.3 K (0.4–4.5) using the mPWP. An advantage of the Bayesian framework is that it is possible to combine the two periods assuming they are independent, whereby we obtain a tighter constraint of 2.5 K (0.8–4.0) using the restricted ensemble. We have explored the sensitivity to our assumptions in the method, including considering structural uncertainty, and in the choice of models, and this leads to 95 % probability of climate sensitivity mostly below 5 K and only exceeding 6 K in a single and most uncertain case assuming a large structural uncertainty. The approach is compared with other approaches based on OLS, a Kalman filter method, and an alternative Bayesian method. An interesting implication of this work is that OLS-based emergent constraints on ECS generate tighter uncertainty estimates, in particular at the lower end, an artefact due to a flatter regression line in the case of lack of correlation. Although some fundamental challenges related to the use of emergent constraints remain, this paper provides a step towards a better foundation for their potential use in future probabilistic estimations of climate sensitivity.

In recent years, researchers have identified a number of relationships between observational properties and a future climate change, which was not immediately obvious a priori but which exists across the ensemble of global climate models (GCMs) (Allen and Ingram, 2002; Hall and Qu, 2006; Boé et al., 2009; Cox et al., 2018) participating in the Climate Model Intercomparison Project (CMIP). These relationships are generally referred to as “emergent constraints” as they emerge from the ensemble behaviour as a whole rather than from explicit physical analysis.

Such emergent constraints have been broadly used to constrain properties of the Earth's climate system which are not easily or directly observable. These are usually presented in probabilistic terms, mostly based on ordinary least squares (OLS) methods. For example, studies have explored the constraint on equilibrium climate sensitivity (ECS), which is the global mean equilibrium temperature after a sustained doubling of CO_{2} over pre-industrial levels, using model outputs from the Paleoclimate Model Intercomparison Project (PMIP) (Hargreaves et al., 2012; Schmidt et al., 2014; Hopcroft and Valdes, 2015; Hargreaves and Annan, 2016). Because of their relatively strong temperature signal, paleoclimate states like the Last Glacial Maximum (LGM) and the mid-Pliocene Warm Period (mPWP) are often considered to be promising constraints for the ECS (Hargreaves et al., 2012; Hargreaves and Annan, 2016), in particular at the high end.

Almost all emergent constraint studies have used OLS-based methods to establish the link between variables in the model ensembles. However, whether ECS or another climate parameter was investigated, the theoretical foundations for the calculations have not previously been clearly presented. An additional problem arising from this is the resulting difficulty in synthesising estimates of climate system properties generated by different statistical methods with different, and often not explicitly introduced, assumptions. These methods include OLS but also alternative Bayesian approaches such as estimates of the climate sensitivity using energy balance models (Annan et al., 2011; Aldrin et al., 2012; Bodman and Jones, 2016).

Two recent papers have also addressed the question of emergent constraints in different ways. Bowman et al. (2018) presented a hierarchical statistical framework which went a long way to closing the gap in theoretical understanding of emergent constraints. Conceptually, it is very similar to a single-step Kalman filter, with which the iteration process is avoided to only keep a single updating of a prior into a posterior. Specifically, it uses the model distribution approximated as a Gaussian as a prior, which is then updated using the observation to a posterior. However, such a prior and the underlying assumptions attached to it could be seen as a restrictive choice to impose on the climate sensitivity area of research. In particular, most of the posterior values would lie in the range covered by the ensemble of models if the observed value is either uncertain and/or close to the prior mean. This is a direct consequence of the joint probability distribution produced by the Kalman filter, which in the case of joint Gaussian distributions will produce a tighter posterior Gaussian distribution. Because of that, it does not appear to correspond to the choice which is usually made, albeit implicitly.

Another Bayesian statistical interpretation of emergent constraints has recently been presented by Williamson and Sansom (2019), who extended the standard approach to account for more general sources of uncertainty including model inadequacy. A key aspect of their approach is that they set a prior on the observational constraint rather than the climate system parameter(s) that we are primarily interested in for this study, i.e. the climate sensitivity. Thus, their prior predictive distribution for the climate system parameter is not immediately clear and may not be so easily specified as in the approach we explore here.

We present an alternative Bayesian linear regression approach in which the regression relationship is used as a likelihood model for the problem. This allows the prior over the predictand to be defined separately from and entirely independently of the model ensemble and emergent constraint analysis. Thus, the likelihood arising from the emergent constraint could be used to update a prior estimate of the predictand that arose from a different source.

In Sect. 2 we provide an overview of the concept of emergent constraints, the previous methods used for these analyses, the Bayesian framework, and the models and data employed in the paper. Section 3 describes the results, starting with analysis of models and data from the Paleoclimate Intercomparison Project (PMIP) phases 2 and 3 for the LGM and mPWP, which have previously been analysed for an emergent constraint on climate sensitivity (Schmidt et al., 2014; Hargreaves and Annan, 2016). We then incorporate some CMIP6 and PMIP4 model outputs that have been made available to us for these periods to illustrate how these outputs fit into the same analysis. We also use the LGM and mPWP outputs to demonstrate how the method allows independent emergent constraints to be combined. Finally, we discuss the influences of the prior and model inadequacy on climate sensitivity in Sect. 3.5 and 3.6, respectively.

The general method of emergent constraints seeks a physically plausible relationship in the climate system between two model variables in an ensemble of results from different climate models. Consequently, an observation of one measurable variable (such as past tropical temperatures) could be used to better constrain the other investigated variable, usually unobserved and difficult to measure (such as climate sensitivity). This idea has been used in climate science to estimate quantities of interest such as snow albedo feedback (Hall and Qu, 2006), future sea ice extent (Boé et al., 2009; Notz, 2015), low-level cloud feedback (Brient et al., 2016), and the equilibrium climate sensitivity (Hargreaves et al., 2012; Schmidt et al., 2014; Cox et al., 2018). Although the unobserved variable is usually taken as a future variable, the emergent constraints theory can be used with two variables within the same timeframe, as long as the relationship is plausible. A summary of several different emergent constraints on climate sensitivity was made by Caldwell et al. (2018). This approach using emergent constraints is meaningful only if we believe that reality satisfies the same relationship and it was not observed purely by chance in the model ensemble. There is a risk in searching for such relationships in a small ensemble that we may find examples which are coincidental, with no real predictive value (Caldwell et al., 2014). Spurious relationships could also be found because of model limitations (Fasullo and Trenberth, 2012; Grise et al., 2015; Notz, 2015).

In this study, we focus on the relationship between equilibrium climate sensitivity, defined here as *S*, and the temperature change in the tropics which is observed at the Last Glacial Maximum (LGM) and the mid-Pliocene Warm Period (mPWP), defined as *T*_{tropical}. We posit that a relationship between climate sensitivity and temperature change is physically plausible, as we expect the long-term quasi-equilibrium temperature to be mainly influenced by radiative forcing, and in many model ensembles, variations in climate sensitivity have been dominated by tropical feedbacks, mostly arising from low-level clouds (Bony et al., 2006; Vial et al., 2013).

## 2.1 Ordinary least squares

The most widely used approach to emergent constraint analysis is to find an observable phenomenon that exhibits some relationship to the parameter of interest and use this as a predictor in a linear regression framework. The ordinary least squares (OLS) method has been widely used because of its simplicity, so we also use it here as a starting point for comparison with alternative statistical methods. In the context of constraining climate sensitivity, the parameter of interest (i.e. the ECS) is considered to be a predicted variable (Hargreaves et al., 2012; Schmidt et al., 2014; Hargreaves and Annan, 2016). This may be written as

where *S* is the climate sensitivity, *γ* and *δ* two unknown parameters, *T*_{tropical} the temperature anomaly averaged over the tropical region for the given paleo-time interval, and *ζ* the residual term which is drawn from a Gaussian distribution *N*(0,*σ*^{2}) and which accounts for deviations from the linear fit. When we use this approach, the unknown constants of the linear fit are estimated via ordinary least squares (OLS) using the $({T}_{\mathrm{tropical}}^{i},{S}^{i})$ pairs representing the model ensemble (here *i* indexes the models), and then the equation is used to predict the true value of *S* for the climate system based on the observed value ${T}_{\mathrm{tropical}}^{\mathrm{o}}$. A confidence interval for the predictor variable can be generated by accounting for uncertainties in the fit and in the observed value through a simulation of an ensemble of prediction as demonstrated by Hargreaves et al. (2012). This procedure makes the assumption that reality satisfies the same regression relationship as the models, i.e. is likely to be at a similar distance from the line as the model points are.

Integrating the intrinsically frequentist confidence intervals obtained from regression methods used for OLS estimates into a Bayesian framework is challenging. One issue is the misinterpretation of frequentist confidence intervals as Bayesian posterior credible intervals. The former is the representation of the number of random intervals containing the true interval bounds (at 90 % confidence, this would lead to 90 out of 100 random intervals containing the true bounds), while the Bayesian credible interval is an interval which we believe (with the given probability) to contain the truth. For instance, if there is an observed *T*_{tropical}=1 K, with an assumed Gaussian observational uncertainty of *σ*=0.25 at 1 standard deviation, then stating that there is a close-to-95 % probability of having the true value of the parameter within the interval 0.5–1.5 K is a Bayesian credible interval interpretation. However, the latter is a common interpretation of frequentist-based studies. This confusion has inherent drawbacks for the analysis of posterior outputs, as shown in various fields of science (Hoekstra et al., 2014) and more recently for climate sensitivity computations (Annan and Hargreaves, 2020).
Williamson and Sansom (2019) have presented a Bayesian interpretation of this approach using reference priors on *ψ*, as defined by Cox et al. (2018), as a metric of global mean temperature variability and the regression coefficients. However, this approach does not appear to readily allow for the use of any arbitrary prior distribution for *S*, which may either be desired for comparison with other research or have arisen through a previous unrelated analysis. The Bayesian linear regression approach that we introduce in the next section avoids these problems.

## 2.2 Bayesian framework

The (subjective) Bayesian paradigm is based on the premise that we use probability distributions to describe our uncertain beliefs concerning unknown parameters. We use Bayes' theorem to update a prior probability distribution function (PDF) for the equilibrium climate sensitivity via

where $P\left(S|{T}_{\mathrm{tropical}}^{\mathrm{o}}\right)$ is the posterior estimate of *S* after conditioning on the geological proxy data ${T}_{\mathrm{tropical}}^{\mathrm{o}}$, *P*(*S*) is the prior, and $P\left({T}_{\mathrm{tropical}}^{\mathrm{o}}\right)$ is a normalisation constant. The likelihood $P\left({T}_{\mathrm{tropical}}^{\mathrm{o}}|S\right)$ is a function that takes any value of *S* and generates a probabilistic prediction of what we would expect to observe as ${T}_{\mathrm{tropical}}^{\mathrm{o}}$ if that value was correct. The use of the Bayesian paradigm requires us to create such a function. Using the principles of emergent constraint analyses in which a linear relationship between these two parameters, which was seen in the GCM ensemble, is believed to also apply to reality, it is natural to use the regression relationship

where *α*, *β*, and *σ*, the standard deviation of *ϵ* as $\mathit{\u03f5}\sim N(\mathrm{0},{\mathit{\sigma}}^{\mathrm{2}})$, are three a priori unknown parameters.
Note that this reverses the roles of predictor and predictand compared to the OLS-based approach (Eq. 1). It implies that *S* is able to give a prediction of *T*_{tropical} with a given uncertainty. This is physically plausible, as *S* is considered one of the best metrics to represent temperature change. In particular, *S* is often diagnosed in climate models from an abrupt and sustained quadrupling of CO_{2} from pre-industrial conditions (4xCO_{2}), which usually leads to weak non-linearity similar to what is observed from LGM or mPWP climate dynamics. Therefore, it is possible to use the 4xCO_{2}-computed *S* of climate models to predict *T*_{tropical}, assuming *ϵ* as a representation of all processes not related to *S*.

Choosing *S* as the predictor (Eq. 3) will cause some differences to the inference of the posterior *S* compared to the OLS-based approach introduced
in Eq. (1). The plausibility of the existence of an emergent constraint between *S* and *T*_{tropical} is independent of the method chosen. Whether *T*_{tropical} is a predicted or predictor variable, or whether the applied
method uses OLS or Bayesian statistics, the methods estimate different unknown parameters to investigate a similar assumed relationship within the model ensemble, so it is expected that these different methods will yield similar but not identical results. This was previously argued in the context of a hierarchical statistical model for emergent constraints by Tingley et al. (2012). The Bayesian approach with *S* as the predictor is appropriate for emergent constraint analyses thanks to its transparency and handling of uncertainties. This has been explored by Sherwood et al. (2020) and is also investigated in this study. Thus, here we explore the implementation of the Bayesian method for emergent
constraint analyses for models and data that have already been investigated with alternative methods (Hargreaves et al., 2012; Hargreaves and Annan, 2016).

The three parameters *α*, *β*, and *σ* in Eq. (3) are conditioned on the model ensemble defined by its pairs of (${T}_{\mathrm{tropical}}^{i},{S}^{i}$) (with *i* indexing the models). We estimate them via a Bayesian linear regression (BLR) procedure, which requires priors to be defined over these parameters. Consequently, the likelihood *P*(*T*_{tropical}|*S*) for a given *S* (as required by Eq. 2) is an integration over the posterior distribution of *T*_{tropical} predicted by the regression relation (convolved with observational uncertainty where appropriate) and conditioned on the model ensemble through *α*, *β*, and *σ*.

In this way, we create a statistical model that can generate a predictive PDF for the tropical temperature change at the LGM or at the mPWP *P*(*T*_{tropical}|*S*) for any given sensitivity. There is a structural difference between this approach and that of Eq. (1) in that here the residual uncertainties $\mathit{\u03f5}\sim N(\mathrm{0},{\mathit{\sigma}}^{\mathrm{2}})$ represent our inability to perfectly predict the tropical temperature anomaly arising from a given sensitivity and are probabilistically independent of the latter rather than the former variable. The issue here is not a matter of which regression line is “correct”, but rather how, given the model ensemble, we can create a plausible likelihood model for *P*(*T*_{tropical}|*S*).

It is important to note that Eq. (3) and the conditioning of parameters on the model ensemble only relate to the generation of the likelihood. The emergent constraint calculation itself is then a second step that uses this likelihood to calculate the posterior of interest $P\left(S|{T}_{\mathrm{tropical}}^{\mathrm{o}}\right)$ (Eq. 2). To apply the emergent constraints theory, it is required to insert a geological observation ${T}_{\mathrm{tropical}}^{\mathrm{o}}$ estimated through proxy data and obtain the likelihood $P\left({T}_{\mathrm{tropical}}^{\mathrm{o}}|S\right)$, which leads to the posterior $P\left(S|{T}_{\mathrm{tropical}}^{\mathrm{o}}\right)$ by Bayesian updating. We perform this step through a simple importance sampling algorithm by approximating $P\left({T}_{\mathrm{tropical}}={T}_{\mathrm{tropical}}^{\mathrm{o}}|S\right)$. That is, for any given sensitivity *S*, we can calculate the probability of the observation of tropical temperature that we have as the composition of the predictive PDF for actual tropical temperature, together with the uncertainty associated with the observation itself. The emergent constraint theory is thus applied with a two-stage Bayesian process, including in first stage the BLR and in the second stage a Bayesian updating.

A prior belief on climate sensitivity (*P*(*S*)) in the Bayesian updating process, and on the parameters of the regression model in the BLR process, has to be assumed. There is no clearly uncontested choice for prior distribution for climate sensitivity. However, Annan and Hargreaves (2011) argued that a Cauchy distribution has a reasonable behaviour with a long tail to high values but, unlike the uniform prior, does not assign high probability to these values. Thus, we adopt this prior for our main analyses. In Sect. 3.5 we test the sensitivity of the results to this choice and compare the results obtained using gamma and uniform prior distributions. Priors for the parameters of the regression model are chosen with reference to the specific experiment and are intended to represent our reasonable expectation that models do indeed generate a regression relationship as described.

An additional issue that was briefly mentioned above is that we may like to consider the probability that reality is qualitatively and quantitatively distinguishable from all models. This issue, which was explicitly argued in the context of emergent constraint analysis by Williamson and Sansom (2019), seems reasonable since all models do share a theoretical heritage and certain limitations. However, this issue remains challenging to quantify. It has not been considered in most previous studies, which also makes it difficult to compare. We investigate this issue in Sect. 3.6. Whilst the proposed resolution remains preliminary and although the concept is promising for understanding emergent constraints, we decide to omit it for the bulk of our analysis to enable more direct comparisons with previous studies.

The Bayesian method is more explicit than the standard OLS approach, as the prior assumptions have to be given by the user. This transparency leads to more freedom and control of the statistical model. Moreover, it has a reduced sensitivity to outliers as the prior on the regression coefficients provides a form of regularisation. This should lead to lower variance in the results compared to results with wider priors on the parameters, particularly with small model ensembles.

Additionally, the Bayesian method allows the user to add multiple lines of evidence by updating the chosen prior for *S*. The method for combining independent constraints is reasonably simple, as it only requires us to calculate and store the posterior of the first emergent constraint analysed and use this distribution as the prior for the second emergent constraint.
Thus, it is a direct form of sequential Bayesian updating. This process results in a posterior distribution which will generally be narrower than either of the two posteriors that would have been generated from either of the emergent constraints separately.
Although it may be tempting to simply combine all emergent constraints in this way, it is necessary to also consider possible dependencies between the uncertainties in the different emergent constraints before this can be done with confidence (Annan and Hargreaves, 2017).

It is not clear if observational errors have always been adequately accounted for in previous emergent constraints research. Our approach provides a natural framework for this, as the likelihood can include the uncertainty of the observational process as we have done. Recent studies have investigated Bayesian ways of integrating uncertainties on proxy reconstructions into the global temperature field (e.g. Tierney et al., 2019). For the sake of comparison with Hargreaves et al. (2012), Schmidt et al. (2014), and Hargreaves and Annan (2016), we use the reconstructions and observational errors adopted in these studies, which are based on multiple linear regressions and model–proxy cross-validation.
However, we have ignored uncertainties in the calculation of the model values of *S* and *T*_{tropical} as, while they are poorly quantified, we believe them to be too small to materially affect our result. In fact, it has been argued for the case of the mPWP that observational errors on *S* and *T*_{tropical} are small compared to the structural differences responsible of the dispersion of the points around the regression line and can thus be neglected (Hargreaves and Annan, 2016).

## 2.3 Kalman filter

Bowman et al. (2018) recently presented a new interpretation of emergent constraint analysis. Their framework is essentially a two-dimensional ensemble Kalman filtering approach in which the prior, represented by the model ensemble, is updated according to the observation using the Kalman equations, which approximate all distributions by a multivariate Gaussian (Kalman, 1960). The Kalman equations are given by

where ** x** is the mean,

**P**its covariance,

**the observations with an associated observational uncertainty covariance matrix**

*z***R**, and

**H**the operator that maps the model state onto observations. The superscripts f and a by convention refer to the forecast (i.e. the prior in this work) and analysis (the posterior), respectively. While in many applications, such as numerical weather prediction, this method is applied in an iterative fashion with the analysis being used as the starting point of the next forecast, here it is only applied once as a way of implementing Bayesian updating to our prior in order to generate the posterior.

Here we only have two dimensions for the Gaussian, these being the scalar predictor (e.g. sensitivity) and predictand (e.g. tropical temperature change). While this approach is a natural and attractive option in many respects, it has the specific drawback (in the context of this work) of using the distribution of model samples as a prior (for both the mean and covariance). Existing literature on emergent constraints does not make this assumption, and this could be seen as a limiting aspect of the method, as it implies that the model ensemble is already a credible predictor even before consideration of the observational constraint. An implication of this approach is that the posterior estimate will be equal to the model distribution in the case that no observational constraint exists, either because there is in fact no relationship between the observation and predictand or when the observational uncertainty is excessively large. The use of a Gaussian prior based on the ensemble range also means that it is difficult for the method to generate posterior estimates that include values significantly outside the model range, even in the case in which the observed value is outside the model spread. We present results generated with a Kalman filter in Sect. 3.1 for comparison with our main analysis.

## 2.4 Climate models and data

The Bayesian method may be applied to any emergent constraint. In this study, we use the model outputs and data syntheses that have arisen from phases 2 and 3 of PMIP (Braconnot et al., 2007; Haywood et al., 2011; Harrison et al., 2014), as well as the few available models of phase 4 (Haywood et al., 2016; Kageyama et al., 2017), summarised in Table 1. The Last Glacial Maximum (19–23 ka) corresponds to the period of the last ice age when ice sheets and sea ice had their maximum extent. Due to its temporal proximity, relative abundance of proxy data, and substantial radiative forcing anomaly, the LGM is widely considered one of the best paleoclimate intervals for testing global climate models and has been featured in all of the PMIP consortium experiments. A representation of several model LGM simulations compared to the surface air temperature (SAT) reconstruction of Annan and Hargreaves (2013) is shown in Fig. 1a.

K-1 Model Developers (2004)Randall et al. (2007)Randall et al. (2007)Randall et al. (2007)Randall et al. (2007)Randall et al. (2007)Goosse et al. (2005)Andrews et al. (2012)Andrews et al. (2012)Sueyoshi et al. (2013)Andrews et al. (2012)Andrews et al. (2012)Andrews et al. (2012)Mauritsen et al. (2019)Hajima et al. (2020)Ohgaito et al. (2020)Haywood et al. (2013)Haywood et al. (2013)Haywood et al. (2013)Haywood et al. (2013)Haywood et al. (2013)Haywood et al. (2013)Randall et al. (2007)Haywood et al. (2013)Guo et al. (2019)Gettelman et al. (2019)Wyser et al. (2020)^{a} For the LGM simulations (generations PMIP2, PMIP3, and PMIP4), the tropical average was defined between 20^{∘} S and 30^{∘} N (Hargreaves et al., 2012). For the mPWP simulations (generations PlioMIP1 and PlioMIP2), the tropical average was defined between 30^{∘} S and 30^{∘} N (Hargreaves and Annan, 2016). All temperature values are defined as changes compared to pre-industrial values.
^{b} Latest version of a model that was kept for the approach described in Sect. 3.3.
^{c} Calculated using the Gregory method on 150 years of output, making it consistent with the values of Andrews et al. (2012).

Previous results from PMIP2 showed a significant correlation between LGM tropical temperatures and climate sensitivity in the models (Hargreaves et al., 2012), although the equivalent calculation for the PMIP3 models found no significant correlation (Schmidt et al., 2014; Hopcroft and Valdes, 2015). These two similar-sized ensembles with contrasting characteristics are a good test bed for exploring the properties of the different methods. For the tropical temperature anomaly relative to the pre-industrial value we use a value from Annan and Hargreaves (2013): for 20^{∘} S to 30^{∘} N a ${T}_{\mathrm{tropical}}^{\mathrm{o}}$ of −2.2 K with a Gaussian observational uncertainty of ±0.7 K (5 %–95 % confidence interval). Several data compilations are presently in development as part of PMIP4, but these have yet to be integrated into a global temperature field, so revising the temperature estimate from Annan and Hargreaves (2013) is a topic for future work.

Interest in the mPWP (2.97–3.29 million years ago) as a more direct analogy for future climate change has grown during the past years. This is the most recent period with a sustained high level of greenhouse gases and concomitant warmth relative to the pre-industrial period; however, the data are more sparse and uncertain. In Fig. 1b, the sea-surface temperature (SST) anomaly of different climate models which performed a mPWP simulation is displayed, as is the PRISM3 SST reconstruction (Dowsett et al., 2009). Previous results for this period from the Pliocene Model Intercomparison Project (PlioMIP) experiment, which was part of PMIP3, indicated a fairly strong correlation between tropical temperature and climate sensitivity in the models, but the confidence with which this can be used to constrain climate sensitivity was low due to high uncertainty in various observationally derived components and various compromises in the way the protocol was formulated (Hargreaves and Annan, 2016). For the mPWP, a tropical temperature anomaly of 0.8±1.6 K (5 %–95 % interval) is taken from Hargreaves and Annan (2016) for 30^{∘} S to 30^{∘} N, assuming the largest 5 %–95 % uncertainty shown in that work. The reconstruction used here is the PRISM3 (Pliocene Research, Interpretation and Synoptic Mapping) SST anomaly field as described in Dowsett et al. (2009).

The Last Interglacial (127 ka, referred to as lig127k in CMIP6) and the mid-Holocene (6 ka) are part of the PMIP simulations and also relatively warm climates. The forcings are, however, seasonal and regional in nature, mostly influencing the patterns of climate change. The global change in temperature and the global climate forcing are both very small, and this coupled with the large uncertainty in paleoclimate data makes these intervals poor candidates for constraining climate sensitivity. We do not explore these intervals further here.

Climate sensitivity has various definitions and there are also a number of different ways of approximating the value in climate models that have not been run to equilibrium. For PMIP3 LGM the model values are mostly based on the regression method of Gregory et al. (2004), but for the models which contributed to PMIP2 LGM and PlioMIP the exact definition and derivation used in each case are not always clear in the literature. In order to make comparisons with previous work, here we use the same values as those used in Hargreaves et al. (2012), Schmidt et al. (2014), and Hargreaves and Annan (2016) with two exceptions to ensure that only one value of sensitivity is used for identical versions of the same model across different experiments. Specifically, for FGOALS-g2 we use the value of 3.37 K (Masa Yoshimori, personal communication, 2013) for both PMIP3 LGM and PMIP3 PlioMIP, and for HadCM3 we use 3.3 K (Randall et al., 2007) for both PMIP2 LGM and PMIP3 PlioMIP. Previous values used by Hargreaves and Annan (2016) for PMIP3 PlioMIP were 3.7 K for FGOALS-g2 (Zheng et al., 2013) and 3.1 K for HadCM3 (Haywood et al., 2013). These changes are minor compared to the ensemble range of climate sensitivity, and thus they have no significant effect on the posterior outputs.

In addition to the already published results from PMIP2 and PMIP3 we add to our ensembles the results that are currently available from PMIP4 in Sect. 3.3. While the LGM protocol (Kageyama et al., 2017) remains very similar to that in previous iterations of PMIP, the mPWP protocol (Haywood et al., 2016) has more significant differences which address several of the limitations of the previous version. Most importantly, PlioMIP2 seeks to represent a specific quasi-equilibrium climate state in the past rather than representing an amalgamation of different warm peak climates as had been the case for PlioMIP1. A priori we are therefore less confident about combining the results from PlioMIP1 and PlioMIP2 and do so mostly to indicate where the new models lie in the ensemble and to highlight the potential for future research in this area once more model results based on the PlioMIP2 protocol become available.

In order to apply the Bayesian linear regression and compute the likelihood *P*(*T*_{tropical}|*S*), several priors have to be established as initial conditions. Specifically, for both the LGM and the mPWP we use Eq. (3) as the basis for our likelihood function. The prior expectations of the three unknown parameters *α*, *β*, and the standard deviation of the residual *ϵ*, referred to as *σ*, need to be defined. The relative complexity of the likelihood function with three a priori unknown parameters requires the use of a sampling method for computational efficiency. In this study, we use the Markov chain Monte Carlo (MCMC) method NUTS as described by Hoffman and Gelman (2014). The NUTS method is also included in the MCMC Python package PyMC3 (Salvatier et al., 2016), which is applied here. The approach is alternatively described as a conjugate prior problem using the R package spBayes (Finley et al., 2013, 2014), described in Appendix A, and leads to similar results.

Depending on the strength of the correlation among the dataset, one could expect a sensitivity of the regression to the choice of prior parameters. In the following sections, we first describe the physical arguments behind the choice of priors over *α*, *β*, and *σ* and then present the outputs of the BLR for both the PMIP2 and PMIP3 dataset of the LGM and the PlioMIP1 dataset of the mPWP. Then, we include the CMIP6 data in the Bayesian framework for both paleo-intervals and present an approach of combining the two emergent constraints. Finally, we explore the sensitivity of the Bayesian approach to the choice of priors over the climate parameter of choice (i.e. the climate sensitivity) and to the hypothetical inadequacy of climate models.

## 3.1 The Last Glacial Maximum

From consideration of energy balance arguments and fundamental physical properties, such as the response of the Earth to an increase in CO_{2}, we have a prior expectation of a relationship between sensitivity and the global LGM temperature anomaly (e.g. Lorius et al., 1990), and the model experiments of Hargreaves et al. (2007) and simple physical arguments about the spatial distribution of forcing suggest that this relationship may be most clearly visible when we focus on the tropical region. While the total negative forcing at the LGM is roughly twice as large as the positive forcing that would be caused by a doubling of CO_{2}, the temperature response at low latitudes is generally expected to be lower than the global mean due to polar amplification and the related presence of high-latitude ice sheets. Thus, we might reasonably expect the tropical temperature change at the LGM to be roughly equal to the global temperature rise under a doubling of CO_{2}. It would also be unexpected if the correlation had the opposite sign to that based on simple energy balance arguments such that a more sensitive model had a lower temperature change at the LGM. However, we cannot justify imposing a precise constraint on the slope and therefore our choice of prior for *α* is $N(-\mathrm{1},{\mathrm{1}}^{\mathrm{2}})$. As for *β*, we expect the regression line to pass close to the origin, as a model with no sensitivity to CO_{2} would probably have little response to any other forcing changes, especially in the tropical region where the influence of ice sheets is remote. However, we do not expect a precise fit to the origin, and therefore the prior chosen for *β* is *N*(0,1^{2}). Finally, we chose a wide prior for *σ*, a half-Cauchy with a scale parameter of 5. The Cauchy is fairly close to uniform for values smaller than the scale parameter, decaying gradually for higher values.

^{∗} BF: Bayesian framework. OLS: predictive range via ordinary least squares. Truncated-at-zero Cauchy prior: peak: 2.5, scale: 3. Gamma prior: peak: 2, scale: 2. Uniform prior: bounded 0–10. The “latest” model ensembles are those created from the most recent versions of each model (see Sect. 3.3).

Deviations from the regression line may be due to different efficacies of other forcing components, especially ice sheets or dust. To take into account the uncertainty on the strength of the response, we performed two additional analyses wherein the prior response was smaller (*α* defined as $N(-\mathrm{0.5},{\mathrm{1}}^{\mathrm{2}})$) and larger (*α* defined as $N(-\mathrm{2},{\mathrm{1}}^{\mathrm{2}})$). We do not see much difference in the results using the three priors over *α*: the difference is approximately 0.2 K of climate sensitivity for both the upper and lower percentiles quoted, giving us confidence in our choice of $N(-\mathrm{1},{\mathrm{1}}^{\mathrm{2}})$. The computed 5 %–95 % posterior climate sensitivity ranges for different values of *α* are summarised in Table 2.

The MCMC algorithm samples the posterior distribution of regression parameters, which is represented by the ensemble of predictive regression lines in Fig. 2. This ensemble is used to infer the climate sensitivity following the Bayesian inference approach using the geological reconstruction of the LGM tropical temperature. The posterior distributions of *S* are computed using a truncated-at-zero Cauchy prior with a peak of 2.5 and a scale of 3, which corresponds to a wide 5 %–95 % prior interval of 0.5–28.7 K. Such a prior was used previously by Annan et al. (2011) because it has a long tail, allowing for a substantial probability of having high climate sensitivity while still maintaining some preference for more moderate values. However, the sensitivity of Bayesian statistics to the choice of prior has often been noted. Thus, two alternative priors, including the widely used uniform prior, and their corresponding posterior distributions are investigated in Sect. 3.5.

To test the robustness of the method and also to compare it with the statistical methods used in previous studies, three cases are investigated in which we use different combinations of the available model ensembles. The results are shown in Fig. 2 and Table 2.

For the PMIP2 ensemble, the correlation between tropical temperature and climate sensitivity was found to be reasonably strong, and in this study the resulting 5 %–95 % range for inferred climate sensitivity is 1.0–4.5 K (Fig. 2b). The range is slightly better constrained at the lower end than the 0.5–4 K from Hargreaves et al. (2012); however, we have used the revised value for the LGM tropical anomaly of $-\mathrm{2.2}\pm \mathrm{0.7}$ K rather than the value of $-\mathrm{1.8}\pm \mathrm{0.7}$ K that was used by Hargreaves et al. (2012). The Bayesian-inferred value is similar to the OLS-inferred method with the revised version (Table 2), giving confidence in the proximity of both methods in the case of high correlation.

When all the models of PMIP2 and PMIP3 (see Table 1) were considered jointly the correlation became weaker and the corresponding 5 %–95 % range generated by the Bayesian method is 0.7–4.8 K (Fig. 2d). Schmidt et al. (2014) obtained 1.6–4.5 K using a similar ensemble although in that case multiple results obtained from the same modelling centre were combined by averaging. Using the OLS method on our ensemble and generating predicted values, we obtain a 5 %–95 % range of 1.4–4.6 K. The Bayesian method generates a wider range here, particularly at the lower end, as the correlation is weaker and the prior starts to influence the posterior.

Finally, we consider the PMIP3 models in isolation. For this ensemble no correlation is found, so for the Bayesian method the result is heavily dependent on our prior assumptions. We obtain a 5 %–95 % range here of 0.7–5.5 K (Fig. 2f). Applying the OLS-based prediction method to the PMIP3 dataset gives a 5 %–95 % range of 1.3–5.6 K. As previously argued for the combination of PMIP2 and PMIP3, the latter method produces a tighter posterior range at the lower end. In the absence of a correlation, the Bayesian method relaxes to the prior, whereas the predictions obtained via the OLS method are heavily influenced by the range of the ensemble. Additionally, as previously argued in Sect. 2.2, the differences in the posterior 5 %–95 % range between the Bayesian and OLS-based approaches are partly connected to choosing *S* as a predictor or predicted, respectively. The impact of such choice will be even bigger as the correlation gets weaker, since the difference between the respective error parameters *ϵ* and *ζ* will increase. However, we emphasise that this does not suggest that either range is closer to reality. Although the comparison between methods with a predictor or predicted *S* should get more complex from a philosophical point of view as *ϵ* differs from *ζ*, we stipulate that both ranges can be considered valuable information regarding *S* within a climate and emergent constraint framework.

The Kalman filtering approach presented by Bowman et al. (2018) has not previously been used for emergent constraint analyses in paleoclimate research. Thus, we also use this method to explore both PMIP2 and the combination of PMIP2 and PMIP3 (Fig. 3). With the same geological reconstruction value and a prior 5 %–95 % range (based on the PMIP2 GCM ensemble) of 1.7–4.5 K, a posterior range of *S* of 1.8–4.1 K is inferred. By combining the PMIP2 and PMIP3 models, the prior 5 %–95 % range becomes 2.0–4.5 K and the posterior range is 2.2–4.2 K. The increase in the lower bound in these calculations is the largest change compared to our Bayesian linear regression method. However, this is strongly forced by the underlying assumptions of a Kalman filter (Sect. 2.3), which uses the model ensemble as a prior, making it difficult to compute a posterior range outside the model range, in particular when the observed value is considered excessively uncertain. Thus, although the Kalman filtering method could be interesting, we do not consider it further, as we stipulate that its assumptions are too restrictive for the question of emergent constraints, and it therefore cannot be a relevant method in its current form to efficiently assess *S* and, in particular, its uncertainty.

## 3.2 The mid-Pliocene Warm Period

As for the LGM, prior parameters have to be defined to perform the BLR with the mPWP data. In principle these may be different to those used for the LGM experiment, since the total positive forcing of the mPWP is not as large as the negative forcing of the LGM, but in practice we have adopted the same priors for our base case, apart from the obvious sign change for *α*. We performed the same sensitivity experiments as for the LGM, with three different priors over *α*: *N*(1,1^{2}), *N*(0.5,1^{2}), and *N*(2,1^{2}). There was only a small difference between the results using the three priors: the differences at the 5th percentile were less than 0.1 K, and the differences at the 95th percentile were approximately 0.3 K (see Table 2). Regarding *β* and *σ*, there is no physical reason for their priors to be substantially different than the ones chosen for the LGM. Thus, a *N*(0,1^{2}) prior for *β* is selected, and the same prior for *σ* as for the LGM analysis is chosen.

The Bayesian inference method applied above for the LGM model outputs is now applied to the mPWP model outputs (Fig. 4). With less abundant models and less well-constrained temperature data, we prefer to assume large uncertainties in the mPWP SST reconstruction (0.8±1.6 K, 5 %–95 % confidence). We adopt the Cauchy prior on climate sensitivity as for the LGM analysis (5 %–95 % interval of 0.5–28.7 K) and compute a 5 %–95 % interval for the ECS of 0.5–5.0 K for the PlioMIP1 dataset. Similar to the results for the LGM, the predictions via the OLS method (Hargreaves and Annan, 2016) resulted in a slightly narrower 5 %–95 % range than the Bayesian method (1.3–4.2 K, assuming 1.6 K of uncertainty on the data).

## 3.3 Inclusion of CMIP6 and PMIP4 data

The ongoing PMIP4 experiments have produced LGM and mPWP (PlioMIP2) simulations. Here we add those results to our ensembles. There are four model runs available for the LGM and five for the mPWP (see Table 2) on 1 May 2020.

For the LGM we have previously combined the PMIP2 and PMIP3 results, and the protocol for PMIP4 is not very different. If we combine all three ensembles we obtain a 5 %–95 % range for the ECS of 0.6–5.2 K using the Bayesian method (Fig. 5b). The ensemble size is now 18, but we note that this includes several models coming from the same modelling centres. Past studies have investigated the proximity of models with hierarchical trees (Masson and Knutti, 2011; Knutti et al., 2013) and the influence of their dependency on statistical methods (Annan and Hargreaves, 2017). Thus, although we believe such dependencies exist in the ensemble, it is in reality difficult to quantify and correct for this. How to deal with this possible duplication of information is therefore a subjective decision. In Schmidt et al. (2014) it was taken into account by averaging the results from models from the same modelling centre. Here we take an alternative approach of including only the latest version of each model. This gives an ensemble size of 11 models (Table 2) and a 5 %–95 % climate sensitivity range of 0.7–5.2 K with the Bayesian method. The range here is relatively wide and close to the range computed with the ensemble of PMIP2, PMIP3, and PMIP4. This is due to the removal of almost all PMIP2 models in this restricted ensemble, which leaves mainly the poorly correlated PMIP3 ensemble and the ensemble of PMIP4 together.

For PlioMIP1 and PlioMIP2 the situation is a little more complex as the protocol has been redesigned to represent a specific interglacial state rather than a generic warm climate, referred to as a “time slab” in the PlioMIP protocol. Thus, there could be a different regression relationship for these two ensembles. However, when we plot the PlioMIP1 ensemble members (Fig. 5d) we see that they do not look different to the PlioMIP2 ensemble members. The straight combination of PlioMIP1 and PlioMIP2 gives an ensemble range of 14 models and we computed a 5 %–95 % range of 0.5–4.4 K. Including only the most recent versions of models results in an ensemble size of 11 models (Table 2) and generates a nearly identical 5 %–95 % climate sensitivity range of 0.4–4.5 K with the mPWP simulation. Thus, for this period the inclusion of the PlioMIP2 models allows for a tighter constraint at the upper bound, much aided by the larger spread of *S* in these new models.

## 3.4 Combining multiple constraints

As described in Sect. 2.4, the mPWP and the LGM are very different climates. If the observational data are generated by unrelated analyses, we may be able to consider the two lines of evidence to be independent and combine them using Bayes' theorem to create a new posterior which is likely to be narrower than that arising from either analysis alone. Assuming that the uncertainties arising from the mPWP and the LGM analyses are independent of each other may be plausible as the proxy reconstructions use different observations and analyses to estimate both the tropical temperatures and the other variables that act as boundary conditions for the model experiments. Moreover, modelling uncertainties that influence the regression analysis are expected to arise from rather different sources, such as the response to ice sheets and a cold climate in one case versus the influence of a warmer climate in the other. Having said that, model biases influencing the simulation of one climate change may also influence the other, which means that if similar models occur in both ensembles, this could lead to dependencies. Using Bayes' theorem to combine the constraints means that it is not necessary for the same set of models to be used for each ensemble, but, as we can see from Table 1, a few models do occur in both ensembles.

It is straightforward to first compute the posterior estimate of *S* from the LGM analysis as previously described and then use this as a prior for the mPWP analysis. Priors over the regression coefficients are considered independent between the two analyses. Because of the issues discussed above, we perform an analysis using both ensembles of latest model versions in the LGM and the mPWP as described in Sect. 3.3. The posterior of the LGM is used as the prior for the mPWP analysis, and the resulting posterior from this process has a narrower 5 %–95 % interval for *S* of 0.8–4.0 K (Fig. 6).

A logical extension of the approach would be to apply it to the ensemble of models in CMIP, wherein multiple emergent constraints exist for the same models. In theory, this should be possible as long as the investigated relationships are physically plausible. This goes beyond the scope of our study, which uses the paleoclimates as an example for the method, and is left for future research.

## 3.5 Alternative priors on sensitivity

A major strength of the Bayesian analysis developed here is the way that the prior on the parameter of interest, here climate sensitivity, can easily be specified independently of all other aspects of the analysis.
A uniform prior for *S* has been widely used (e.g. Tomassini et al., 2007; Aldrin et al., 2012). However, it has also been argued that such a prior could give an unrealistically high weight to high climate sensitivity (Annan and Hargreaves, 2011). Here we test our method with the commonly used uniform prior *U*[0;10], which has a 5 %–95 % range of 0.5–9.5 K. The resulting posterior 5 %–95 % range for climate sensitivity is 0.8–5.0 K when analysing the LGM PMIP2 models only and 0.6–5.4 K with the LGM PMIP2 and PMIP3 models together. These posteriors are wider than the ranges previously computed with a Cauchy prior, particularly for the case of combining PMIP2 and PMIP3 wherein the correlation is rather weak, in which case the prior has a higher influence.
These results are shown in Fig. 7.
Due to the questions which have arisen over the use of a uniform prior and the fact that it has an infinite integral, unless bounded arbitrarily as done here, we also perform a comparison with an alternative prior which features a decaying tail and a finite integral. For this purpose, a gamma prior is chosen with a shape parameter of 2 and a scale of 2, which corresponds to a similar 5 %–95 % prior range of 0.7–9.5 K. The posterior computed 5 %–95 % range is 1.0–4.5 K for LGM PMIP2 models and 0.9–4.8 K for the combination of PMIP2 and PMIP3, which is very close to the one computed with the Cauchy prior. Although the Bayesian paradigm will inevitably involve such subjective choices, the sensitivity of the results to a sensible choice of prior appears to be low as long as a reasonable correlation exists in the ensemble.

## 3.6 Model inadequacy

As previously explored and described by Williamson and Sansom (2019), we investigate the probability that all models deviate in a systematic way from reality to a certain extent, mainly because of computational limitations and their shared technical heritage. Statistically, this issue is best described by the terminology that while the models are considered “exchangeable” with each other, they are not exchangeable with reality. Williamson and Sansom (2019) provide a further discussion on this point. In our methodology, this can simply be accounted for by considering that the regression prediction of *S* for reality has a larger residual than that arising for the models themselves:

where the superscript t indicates here that we are referring to the truth (i.e. the real climate system) and *ϵ*^{∗} has the distribution $N(\mathrm{0},{\mathit{\sigma}}^{\ast \mathrm{2}})$ for some ${\mathit{\sigma}}^{\ast \mathrm{2}}>{\mathit{\sigma}}^{\mathrm{2}}$.
There can be various reasons why such an inadequacy, represented as *ϵ*^{∗} in Eq. (7), may be thought to exist. Models all share a common heritage and theoretical basis, which is certainly incomplete even if not substantially wrong, and computational constraints limit their performance. Particularly in the paleoclimate context, there may be biases in the experimental protocol and differences in the number of feedbacks included in the different model systems, e.g. interactive vegetation and prognostic dust. Such errors would lead to reality being some distance from the model regression line, even if the models were otherwise perfect. Such issues are relevant to both the LGM, wherein there are significant uncertainties relating to dust and vegetation effects, and the mPWP, wherein even the greenhouse gas (GHG) forcing is somewhat uncertain, as well as to older simulations that are designed as a general representation of interglacial warm periods rather than a specific quasi-equilibrium climate state.

However, while we may anticipate reality deviating further from the regression line, it is difficult to quantify such deviation. Here, we perform two sensitivity tests for which we define ${\mathit{\sigma}}^{\ast \mathrm{2}}=(\mathrm{2}\mathit{\sigma}{)}^{\mathrm{2}}$; that is to say the distribution for the residual term *ϵ*^{∗} is defined as *N*(0,(2*σ*)^{2}) for our predictions. We consider this to correspond to a rather large inadequacy term. To compare with our previous analysis, we investigate the effect of the model inadequacy using the dataset of PMIP2 and PMIP3 combined for the case of the LGM and the dataset of PlioMIP1 for the case of the mPWP. For the LGM, the 5 %–95 % posterior range computed after doubling *σ* is 0.5–5.8 K, while the 5 %–95 % posterior range for the mPWP is 0.5–5.4 K.
When we consider the “latest model version” approach outlined in Sect. 3.3 and take the same approach of doubling the estimated residual, the 5 %–95 % posterior ranges increase to 0.5–6.3 K for the LGM and a 5 %–95 % posterior range of 0.4–5.0 K for the mPWP. Thus these sensitivity tests typically involve a change of around half a degree to the upper bound obtained, while having much less influence on the lower bounds in these examples.

Past climates are relevant sources of information on the properties of the climate system, specifically the equilibrium climate sensitivity, due to the quasi-equilibrium changes in response to external forcing, which are of similar magnitude as the projected future climate changes. In this study, we have described a new statistical method based on Bayesian inference to approach the question of emergent constraints. We believe this method provides a reasonable representation within the Bayesian paradigm of the underlying structure of emergent constraint principles. This Bayesian method is designed to be as explicit and flexible as possible. Previous work using ordinary least squares has usually applied implicit assumptions. Because of these assumptions, predictions obtained via OLS tend to generate tight posterior ranges, particularly on the lower end and when the correlation is rather weak; this is something that may well be regarded as an artefact of using such a method.

By applying the method to the LGM tropical temperature model ensemble used in Schmidt et al. (2014), which included 14 models from the PMIP2 and PMIP3 generations, we estimate the climate sensitivity to be 2.6 K (0.7–4.8, 5th–95th percentiles). Similarly, applying the method to the mPWP tropical temperature dataset of Hargreaves and Annan (2016) gives a climate sensitivity of 2.4 K (0.5–5.0) but with the more uncertain ensemble of models which contributed to PlioMIP1.

With the new generation of climate models, LGM and mPWP analyses have been widened by the addition of several CMIP6 model outputs. By adding the PMIP4 LGM simulations, we computed a 5 %–95 % interval for climate sensitivity of 0.6–5.2 K. We performed the same analysis by combining PlioMIP1 and PlioMIP2 models and obtained a 5 %–95 % interval of 0.5–4.4 K. However, these results come with some caveats attached. In particular, combining the two model generations of the mPWP could lead to biased results, since the experimental protocol substantially changed in PlioMIP2. An alternative approach is to consider solely the latest version of each model. By doing this we reduce expected redundancy in the ensemble, and so improve our confidence in the result despite the smaller ensemble sizes. This leads to similarly constrained climate sensitivity of 2.7 (0.6–5.2, 5 %–95 %) for the LGM simulations and 2.3 (0.4–4.5, 5 %–95 %) for the mPWP simulations. Although most of the computed ranges are wider than the ranges obtained with OLS or Kalman filtering, the Bayesian framework avoids the underlying assumptions of both methods and, in particular, makes us regard the Kalman filtering approach in its current form as too restrictive for the question of emergent constraints.

Nevertheless, our results obtained by analysing the LGM or the mPWP in isolation are broadly consistent with results obtained by other statistical methods used in previous studies. The differences between the way the information is obtained from the paleo-record for the mPWP and the LGM and the different dominant climate features of the intervals suggest it may be reasonable to consider these estimates to be statistically independent, given climate sensitivity. It is then possible to combine them within the same Bayesian framework to compute a narrower range of climate sensitivity. By doing so, we evaluated the climate sensitivity to be 2.5 K (0.8–4.0, 5 %–95 %). However, this approach requires independence between the different combined emergent constraints.

It is, in principle, straightforward to include other independent emergent constraints into our Bayesian framework. As well as evidence from historical or present-day analyses, other past climates are starting to be explored by modellers and may be potential candidates for future analyses, such as the Eocene, the Miocene, and the last deglaciation. Over the next couple of years we expect new outputs for models from CMIP6 and new data analyses to become available, which will enable these preliminary analyses to be compared with results from expanded LGM and mPWP ensembles and improved data estimates.

In Sect. 3, we introduce the NUTS Markov chain Monte Carlo method, used with the Python package PyMC3 (Salvatier et al., 2016), to compute the posterior distributions of *α*, *β*, and *σ* and obtain the likelihood *P*(*T*_{tropical}|*S*). However, the likelihood model of the Bayesian linear regression is defined as ${T}_{i}\sim N(\mathit{\alpha}\times {S}_{i}+\mathit{\beta},{\mathit{\sigma}}^{\mathrm{2}})$, where (*T*_{i},*S*_{i}) is the *T*_{tropical} and *S* of the *i* models. Thus, it is possible to choose conjugate priors in this specific case of emergent constraints to avoid using the complex Hamiltonian-based NUTS method. We show here that both approaches lead to similar results.

For the case of the mPWP, we defined the priors $\mathit{\alpha}\sim N(\mathrm{1},{\mathrm{1}}^{\mathrm{2}})$, $\mathit{\beta}\sim N(\mathrm{0},{\mathrm{1}}^{\mathrm{2}})$, and *σ*∼ half-Cauchy (scale=5). Changing the prior *σ* to another family of distribution, such as $\mathit{\sigma}\sim \mathrm{inverse}-\mathrm{gamma}$, leads to a conjugate problem of the form normal – inverse gamma and allows us to generate a well-defined form for the posterior distributions of these parameters.

To illustrate this approach, we use the R implementation bayesLMConjugate (where LM stands for linear model) of the package spBayes (Finley et al., 2013, 2014). For clarity, a code to perform the MCMC approach is also provided in R based on the package RSTAN (Stan Development Team, 2019). Conjugate approaches are based on defining priors on the regression parameters conditioned on the (uncertain) error *σ*, scaled with a precision matrix **Λ**_{0}. As an example, we define the prior *σ*∼InvGamma (*a*=0.5, *b*=0.5), where *a* represents the shape parameter and *b* the rate parameter. The prior matrix of the regression parameters (both intercept and slope) is then $N({\mathit{\mu}}_{\mathrm{0}},{\mathit{\sigma}}^{\mathrm{2}}\times {\mathbf{\Lambda}}_{\mathrm{0}})$, where

and

The priors provided on the regression parameters in the MCMC method then need to be modified for comparison with the conjugate approach, i.e. conditioned on *σ* itself, which is straightforward to do thanks to the flexibility of PyMC3. Running the code is significantly faster than the use of an MCMC method, and both posterior outputs are compared in Fig. A1. The difference between the two methods is minimal and is within the range of precision of MCMC. If we take the full ensemble of PlioMIP1 and PlioMIP2 models simulating the mPWP, and using the posterior distributions of *α*, *β*, and *σ* from the conjugate prior method, we estimate a 95 % *S* of 2.3 K (0.5–4.4) compared to a similar value obtained via NUTS. An interesting aspect shown here is that the computed posterior range for *S* is similar to the one computed with the Cauchy and gamma prior, giving us confidence in the reduced influence of prior distribution in a well-correlated and large enough ensemble of data.

Although the choice of conjugate priors would simplify the computation, NUTS (or MCMC methods in general) have the advantage of allowing an explicit and flexible choice of priors for the users. Having such flexibility is a vital element for the analysis presented in this paper.
The example taken here to illustrate the Bayesian framework, i.e. the relationship of *T*_{tropical} and *S*, is a simple linear regression problem. However, we stipulate that such a framework could be used in more complex cases, such as higher-complexity emergent constraint relationships, in which the use of MCMC methods would become essential.

The Python codes used for the different statistical methods are available from the Bolin Centre Code Repository at https://git.bolin.su.se/bolin/renoult-2020 (last access: 2 July 2020; https://doi.org/10.5281/zenodo.3928315, Renoult and Annan, 2020). The data of the PMIP2 models can be obtained by asking the corresponding modelling groups. The data of the PMIP3 and CMIP6 models can be downloaded from the ESGF Portal at CEDA, located at https://esgf-index1.ceda.ac.uk/ (last access: 14 June 2020). The data of the PlioMIP1 models can be downloaded from Redmine at the School of Earth and Environment of the University of Leeds, located at https://www.see.leeds.ac.uk/redmine/public/ (last access: 6 June 2020). For a username and password, email Alan Haywood (a.m.haywood@leeds.ac.uk). The PRISM3 SST reconstruction can be downloaded from the PRISM/PlioMIP web page at https://geology.er.usgs.gov/egpsc/prism/1_background.html (last access: 6 August 2020; files PRISM3_SST_v1.1.nc and PRISM3_modern_SST.nc). The LGM SAT geological reconstruction can be downloaded from the Supplement of Annan and Hargreaves (2013), currently located at http://www.clim-past.net/9/367/2013/cp-9-367-2013-supplement.zip (last access: 14 June 2020). At the time of publication, the data of AWI-ESM-1-1-LR, INM-CM4-8, MIROC-ES2L, and MPI-ESM1.2-LR for the LGM and CESM2, EC-EARTH3.3, GISS-E2-1-G, IPSL-CM6A-LR, and NorESM1-F for the mPWP are available on ESGF at https://esgf-index1.ceda.ac.uk/search/cmip6-ceda/ (esgf-index1.ceda.ac.uk node, last access: 14 June 2020).

The BLR method was conceived by JDA and JCH. TM put the project together. The code for the Bayesian framework and for the OLS was written by MR. The code for the Kalman filter was written by JDA and translated to Python by MR. The code for the conjugate prior approach was written by MR. The statistical analyses were performed by MR. The climate sensitivities of the CMIP6 models were computed by CF. The paper was written by MR, JDA, JCH, NS, and TM. RO provided the LGM outputs of MIROC-ES2L. UM and MLK provided the LGM outputs of MPI-ESM1.2-LR. GL and XS provided the LGM outputs of AWI-ESM-1-1-LR. QZ and QL provided the mPWP outputs of EC-EARTH3.3.

The authors declare that they have no conflict of interest.

This article is part of the special issue “Paleoclimate Modelling Intercomparison Project phase 4 (PMIP4) (CP/GMD inter-journal SI)”. It is not associated with a conference.

We thank the two anonymous reviewers for their insightful comments. We are very grateful for the extensive work of the scientists involved in the different PMIPs and their efforts to publish and share their data with us. We acknowledge the Bolin Centre for Climate Research at Stockholm University for giving us access to the code repository, which allows us to freely share our code. We acknowledge the STFC Centre for Environmental Data Analysis (CEDA) in collaboration with the InfraStructure for the European Network for Earth System Modelling (IS-ENES) and the Natural Environment Research Council via the National Centre for Atmospheric Science (NCAS) for making the CMIP5 and CMIP6 dataset available on the ESGF portal. We acknowledge Alan Haywood and Richard Rigby for giving us access to the PlioMIP1 dataset. We acknowledge the Statistical Research Group at the Department of Mathematics at Stockholm University and its director Jan-Olov Persson for his useful comments. Rumi Ohgaito acknowledges support from the Integrated Research Program for Advancing Climate Models (TOUGOU programme) from the Ministry of Education, Culture, Sports, Science and Technology (MEXT), Japan. The simulations using MIROC models were conducted on the Earth Simulator of JAMSTEC. Gerrit Lohmann and Xiaoxu Shi acknowledge support from the PACMEDY project of the Belmont Forum and the PalMod project through BMBF. The simulations using AWI-ESM-1-1-LR were conducted on the German Climate Computing Centre (DKRZ). Qiong Zhang acknowledges support from Swedish Research Council VR grants 2013-06476 and 2017-04232. The MPI-M contribution was supported by the German Federal Ministry of Education and Research (BMBF) as a Research for Sustainability initiative (FONA) through the project PalMod (FKZ: 01LP1504C). The analysis and storage of data were performed on resources provided by the Swedish National Infrastructure for Computing (SNIC) at the National Centre at Linköping University (NSC).

This research has been supported by the European Research Council (highECS (grant no. 770765) and CONSTRAIN (grant no. 820829)).

The article processing charges for this open-access

publication were covered by Stockholm University.

This paper was edited by Julien Emile-Geay and reviewed by two anonymous referees.

Aldrin, M., Holden, M., Guttorp, P., Skeie, R. B., Myhre, G., and Berntsen, T. K.: Bayesian estimation of climate sensitivity based on a simple climate model fitted to observations of hemispheric temperatures and global ocean heat content, Environmetrics, 23, 253–271, 2012. a, b

Allen, M. and Ingram, W.: Constraints on future changes in climate and the hydrologic cycle, Nature, 419, 224–232, 2002. a

Andrews, T., Gregory, J. M., Webb, M. J., and Taylor, K. E.: Forcing, feedbacks and climate sensitivity in CMIP5 coupled atmosphere-ocean climate models, Geophys. Res. Lett., 39, L09712, https://doi.org/10.1029/2012GL051607, 2012. a, b, c, d, e, f

Annan, J. D. and Hargreaves, J. C.: On the generation and interpretation of probabilistic estimates of climate sensitivity, Climatic Change, 104, 423–436, 2011. a, b

Annan, J. D. and Hargreaves, J. C.: A new global reconstruction of temperature changes at the Last Glacial Maximum, Clim. Past, 9, 367–376, https://doi.org/10.5194/cp-9-367-2013, 2013. a, b, c, d, e

Annan, J. D. and Hargreaves, J. C.: On the meaning of independence in climate science, Earth Syst. Dynam., 8, 211–224, https://doi.org/10.5194/esd-8-211-2017, 2017. a, b

Annan, J. D. and Hargreaves, J. C.: Bayesian deconstruction of climate sensitivity estimates using simple models: implicit priors and the confusion of the inverse, Earth Syst. Dynam., 11, 347–356, https://doi.org/10.5194/esd-11-347-2020, 2020. a

Annan, J. D., Hargreaves, J. C., and Tachiiri, K.: On the observational assessment of climate model performance, Geophys. Res. Lett., 38, L24702, https://doi.org/10.1029/2011GL049812, 2011. a, b

Bodman, R. W. and Jones, R. N.: Bayesian estimation of climate sensitivity using observationally constrained simple climate models, Wiley Interdisciplinary Reviews: Climate Change, 7, 461–473, 2016. a

Boé, J., Hall, A., and Qu, X.: September sea-ice cover in the Arctic Ocean projected to vanish by 2100, Nat. Geosci., 2, 341–343, 2009. a, b

Bony, S., Colman, R., Kattsov, V. M., Allan, R. P., Bretherton, C. S., Dufresne, J.-L., Hall, A., Hallegatte, S., Holland, M. M., Ingram, W., Randall, D. A., Soden, B. J., Tselioudis, G., and Webb, M. J.: How well do we understand and evaluate climate change feedback processes?, J. Climate, 19, 3445–3482, 2006. a

Bowman, K. W., Cressie, N., Qu, X., and Hall, A.: A Hierarchical Statistical Framework for Emergent Constraints: Application to Snow-Albedo Feedback, Geophys. Res. Lett., 45, 13050–13059, https://doi.org/10.1029/2018GL080082, 2018. a, b, c

Braconnot, P., Otto-Bliesner, B., Harrison, S., Joussaume, S., Peterchmitt, J.-Y., Abe-Ouchi, A., Crucifix, M., Driesschaert, E., Fichefet, Th., Hewitt, C. D., Kageyama, M., Kitoh, A., Laîné, A., Loutre, M.-F., Marti, O., Merkel, U., Ramstein, G., Valdes, P., Weber, S. L., Yu, Y., and Zhao, Y.: Results of PMIP2 coupled simulations of the Mid-Holocene and Last Glacial Maximum – Part 1: experiments and large-scale features, Clim. Past, 3, 261–277, https://doi.org/10.5194/cp-3-261-2007, 2007. a

Brient, F., Schneider, T., Tan, Z., Bony, S., Qu, X., and Hall, A.: Shallowness of tropical low clouds as a predictor of climate models' response to warming, Clim. Dynam., 47, 433–449, https://doi.org/10.1007/s00382-015-2846-0, 2016. a

Caldwell, P. M., Bretherton, C. S., Zelinka, M. D., Klein, S. A., Santer, B. D., and Sanderson, B. M.: Statistical significance of climate sensitivity predictors obtained by data mining, Geophys. Res. Lett., 41, 1803–1808, 2014. a

Caldwell, P. M., Zelinka, M. D., and Klein, S. A.: Evaluating Emergent Constraints on Equilibrium Climate Sensitivity, J. Climate, 31, 3921–3942, https://doi.org/10.1175/JCLI-D-17-0631.1, 2018. a

Cox, P. M., Huntingford, C., and Williamson, M. S.: Emergent constraint on equilibrium climate sensitivity from global temperature variability, Nature Publishing Group, 553, 319–322, 2018. a, b, c

Dowsett, H. J., Robinson, M. M., and Foley, K. M.: Pliocene three-dimensional global ocean temperature reconstruction, Clim. Past, 5, 769–783, https://doi.org/10.5194/cp-5-769-2009, 2009. a, b, c, d

Fasullo, J. T. and Trenberth, K. E.: A less cloudy future: The role of subtropical subsidence in climate sensitivity, Science, 338, 792–794, 2012. a

Finley, A., Banerjee, S., and Gelfand, A.: spBayes for Large Univariate and Multivariate Point-Referenced Spatio-Temporal Data Models, J. Stat. Softw., 63, 1–28, https://doi.org/10.18637/jss.v063.i13, 2013. a, b

Finley, A. O., Banerjee, S., and Cook, B. D.: Bayesian hierarchical models for spatially misaligned data in R, Methods Ecol. Evol., 5, 514–523, https://doi.org/10.1111/2041-210X.12189, 2014. a, b

Gettelman, A., Hannay, C., Bacmeister, J. T., Neale, R. B., Pendergrass, A. G., Danabasoglu, G., Lamarque, J.-F., Fasullo, J. T., Bailey, D. A., Lawrence, D. M., and Mills, M. J.: High Climate Sensitivity in the Community Earth System Model Version 2 (CESM2), Geophys. Res. Lett., 46, 8329–8337, https://doi.org/10.1029/2019GL083978, 2019. a

Goosse, H., Crowley, T., Zorita, E., Ammann, C., Renssen, H., and Driesschaert, E.: Modelling the climate of the last millennium: What causes the differences between simulations?, Geophys. Res. Lett., 32, L06710, https://doi.org/10.1029/2005GL022368, 2005. a

Gregory, J. M., Ingram, W., Palmer, M., Jones, G., Stott, P., Thorpe, R., Lowe, J., Johns, T., and Williams, K.: A new method for diagnosing radiative forcing and climate sensitivity, Geophys. Res. Lett., 31, L03205, https://doi.org/10.1029/2003GL018747, 2004. a

Grise, K. M., Polvani, L. M., and Fasullo, J. T.: Re-examining the relationship between climate sensitivity and the Southern Hemisphere radiation budget in CMIP models, J. Climate, 28, 9298–9312, 2015. a

Guo, C., Bentsen, M., Bethke, I., Ilicak, M., Tjiputra, J., Toniazzo, T., Schwinger, J., and Otterå, O. H.: Description and evaluation of NorESM1-F: a fast version of the Norwegian Earth System Model (NorESM), Geosci. Model Dev., 12, 343–362, https://doi.org/10.5194/gmd-12-343-2019, 2019. a

Hajima, T., Watanabe, M., Yamamoto, A., Tatebe, H., Noguchi, M. A., Abe, M., Ohgaito, R., Ito, A., Yamazaki, D., Okajima, H., Ito, A., Takata, K., Ogochi, K., Watanabe, S., and Kawamiya, M.: Development of the MIROC-ES2L Earth system model and the evaluation of biogeochemical processes and feedbacks, Geosci. Model Dev., 13, 2197–2244, https://doi.org/10.5194/gmd-13-2197-2020, 2020. a

Hall, A. and Qu, X.: Using the current seasonal cycle to constrain snow albedo feedback in future climate change, Geophys. Res. Lett., 33, L03502, https://doi.org/10.1029/2005GL025127, 2006. a, b

Hargreaves, J. C. and Annan, J. D.: Could the Pliocene constrain the equilibrium climate sensitivity?, Clim. Past, 12, 1591–1599, https://doi.org/10.5194/cp-12-1591-2016, 2016. a, b, c, d, e, f, g, h, i, j, k, l, m, n

Hargreaves, J. C., Abe-Ouchi, A., and Annan, J. D.: Linking glacial and future climates through an ensemble of GCM simulations, Clim. Past, 3, 77–87, https://doi.org/10.5194/cp-3-77-2007, 2007. a

Hargreaves, J. C., Annan, J. D., Yoshimori, M., and Abe-Ouchi, A.: Can the Last Glacial Maximum constrain climate sensitivity?, Geophys. Res. Lett., 39, L24702, https://doi.org/10.1029/2012GL053872, 2012. a, b, c, d, e, f, g, h, i, j, k, l, m

Harrison, S. P., Bartlein, P. J., Brewer, S., Prentice, I. C., Boyd, M., Hessler, I., Holmgren, K., Izumi, K., and Willis, K.: Climate model benchmarking with glacial and mid-Holocene climates, Clim. Dynam., 43, 671–688, https://doi.org/10.1007/s00382-013-1922-6, 2014. a

Haywood, A. M., Dowsett, H. J., Robinson, M. M., Stoll, D. K., Dolan, A. M., Lunt, D. J., Otto-Bliesner, B., and Chandler, M. A.: Pliocene Model Intercomparison Project (PlioMIP): experimental design and boundary conditions (Experiment 2), Geosci. Model Dev., 4, 571–577, https://doi.org/10.5194/gmd-4-571-2011, 2011. a

Haywood, A. M., Hill, D. J., Dolan, A. M., Otto-Bliesner, B. L., Bragg, F., Chan, W.-L., Chandler, M. A., Contoux, C., Dowsett, H. J., Jost, A., Kamae, Y., Lohmann, G., Lunt, D. J., Abe-Ouchi, A., Pickering, S. J., Ramstein, G., Rosenbloom, N. A., Salzmann, U., Sohl, L., Stepanek, C., Ueda, H., Yan, Q., and Zhang, Z.: Large-scale features of Pliocene climate: results from the Pliocene Model Intercomparison Project, Clim. Past, 9, 191–209, https://doi.org/10.5194/cp-9-191-2013, 2013. a, b, c, d, e, f, g, h

Haywood, A. M., Dowsett, H. J., Dolan, A. M., Rowley, D., Abe-Ouchi, A., Otto-Bliesner, B., Chandler, M. A., Hunter, S. J., Lunt, D. J., Pound, M., and Salzmann, U.: The Pliocene Model Intercomparison Project (PlioMIP) Phase 2: scientific objectives and experimental design, Clim. Past, 12, 663–675, https://doi.org/10.5194/cp-12-663-2016, 2016. a, b

Hoekstra, R., Morey, R. D., Rouder, J. N., and Wagenmakers, E.-J.: Robust misinterpretation of confidence intervals, Psychon. B. Rev., 21, 1157–1164, 2014. a

Hoffman, M. D. and Gelman, A.: The No-U-Turn sampler: adaptively setting path lengths in Hamiltonian Monte Carlo, J. Mach. Learn. Res., 15, 1593–1623, 2014. a

Hopcroft, P. O. and Valdes, P. J.: How well do simulated last glacial maximum tropical temperatures constrain equilibrium climate sensitivity?, Geophys. Res. Lett., 42, 5533–5539, 2015. a, b

K-1 Model Developers: K-1 Coupled Model (MIROC) Description (K-1 Technical Report 1), Center for Climate System Research, University of Tokyo, Tokyo, Japan, 2004. a

Kageyama, M., Albani, S., Braconnot, P., Harrison, S. P., Hopcroft, P. O., Ivanovic, R. F., Lambert, F., Marti, O., Peltier, W. R., Peterschmitt, J.-Y., Roche, D. M., Tarasov, L., Zhang, X., Brady, E. C., Haywood, A. M., LeGrande, A. N., Lunt, D. J., Mahowald, N. M., Mikolajewicz, U., Nisancioglu, K. H., Otto-Bliesner, B. L., Renssen, H., Tomas, R. A., Zhang, Q., Abe-Ouchi, A., Bartlein, P. J., Cao, J., Li, Q., Lohmann, G., Ohgaito, R., Shi, X., Volodin, E., Yoshida, K., Zhang, X., and Zheng, W.: The PMIP4 contribution to CMIP6 – Part 4: Scientific objectives and experimental design of the PMIP4-CMIP6 Last Glacial Maximum experiments and PMIP4 sensitivity experiments, Geosci. Model Dev., 10, 4035–4055, https://doi.org/10.5194/gmd-10-4035-2017, 2017. a, b

Kalman, R. E.: A new approach to linear filtering and prediction problems, J. Basic Eng.-T. ASME, 82D, 33–45, 1960. a

Knutti, R., Masson, D., and Gettelman, A.: Climate model genealogy: Generation CMIP5 and how we got there, Geophys. Res. Lett., 40, 1194–1199, https://doi.org/10.1002/grl.50256, 2013. a

Lorius, C., Jouzel, J., Raynaud, D., Hansen, J., and Le Treut, H.: The ice-core record: climate sensitivity and future greenhouse warming, Nature, 347, 139–145, 1990. a

Masson, D. and Knutti, R.: Climate model genealogy, Geophys. Res. Lett., 38, L08703, https://doi.org/10.1029/2011GL046864, 2011. a

Mauritsen, T., Bader, J., Becker, T., Behrens, J., Bittner, M., Brokopf, R.,
Brovkin, V., Claussen, M., Crueger, T., Esch, M., Fast, I. Fiedler, S. Fläschner, D., Gayler, V., Giorgetta, M., Goll, D. S., Haak, H., Hagemann, S., Hedemann, C., Hohenegger, C., Ilyina, T., Jahns, T., Jimenéz-de-la-Cuesta, D., Jungclaus, J., Kleinen, T., Kloster, S., Kracher, D., Kinne, S., Kleberg, D., Lasslop, G., Kornblueh, L., Marotzke, J., Matei, D., Meraner, K., Mikolajewicz, U., Modali, K., Möbis, B., Müller, W. A., Nabel, J. E. M. S., Nam, C. C. W., Notz, D., Nyawira, S.-S., Paulsen, H., Peters, K., Pincus, R., Pohlmann, H., Pongratz, J., Popp, M., Raddatz, T. J., Rast, Sand Redler, R., Reick, C. H., Rohrschneider, T., Schemann, V., Schmidt, H., Schnur, R., Schulzweida, U., Six, K. D., Stein, L., Stemmler, I., Stevens, B., von Storch, J.-S., Tian, F., Voigt, A., Vrese, P., Wieners, K.-H., Wilkenskjeld, S., Winkler, A., and Roeckner, E.: Developments in the
MPI-M Earth System Model version 1.2 (MPI-ESM1. 2) and Its Response to
Increasing CO_{2}, J. Adv. Model. Earth Syst., 11, 998–1038,
2019. a

Notz, D.: How well must climate models agree with observations?, Philos. T. Roy. Soc. A, 373, 20140164, https://doi.org/10.1098/rsbl.2014.0164, 2015. a, b

Ohgaito, R., Yamamoto, A., Hajima, T., O'ishi, R., Abe, M., Tatebe, H., Abe-Ouchi, A., and Kawamiya, M.: PMIP4 experiments using MIROC-ES2L Earth System Model, Geosci. Model Dev. Discuss., https://doi.org/10.5194/gmd-2020-64, in review, 2020. a

Randall, D. A., Wood, R. A., Bony, S., Colman, R., Fichefet, T., Fyfe, J., Kattsov, V., Pitman, A., Shukla, J., Srinivasan, J., Stouffer, R. J., Sumi, A., and Taylor, K. E.: Climate models and their evaluation, in: Climate change 2007: The physical science basis. Contribution of Working Group I to the Fourth Assessment Report of the IPCC (FAR), edited by: Solomon, S., Qin, D., Manning, M., Chen, Z., Marquis, M., Averyt, K. B., Tignor, M., and Miller, H. L., Cambridge University Press, Cambridge, UK and New York, NY, USA, 589–662, 2007. a, b, c, d, e, f, g

Renoult, M. and Annan, J. D.: A Bayesian framework for emergent constraints: case studies of climate sensitivity with PMIP: Python and R statistical scripts of the study (Version 2.2.0), Zenodo, https://doi.org/10.5281/zenodo.3928315, 2020. a

Salvatier, J., Wiecki, T. V., and Fonnesbeck, C.: Probabilistic programming in Python using PyMC3, PeerJ Computer Science, 2, e55, https://doi.org/10.7717/peerj-cs.55, 2016. a, b

Schmidt, G. A., Annan, J. D., Bartlein, P. J., Cook, B. I., Guilyardi, E., Hargreaves, J. C., Harrison, S. P., Kageyama, M., LeGrande, A. N., Konecky, B., Lovejoy, S., Mann, M. E., Masson-Delmotte, V., Risi, C., Thompson, D., Timmermann, A., Tremblay, L.-B., and Yiou, P.: Using palaeo-climate comparisons to constrain future projections in CMIP5, Clim. Past, 10, 221–250, https://doi.org/10.5194/cp-10-221-2014, 2014. a, b, c, d, e, f, g, h, i, j

Sherwood, S., Webb, M., Annan, J., Armour, K., Forster, P., Hargreaves, J., Hegerl, G., Klein, S., Marvel, K., Rohling, E., Watanabe, M., Andrews, T., Braconnot, P., Bretherton, C. S., Foster, G. L., Hausfather, Z., von der Heydt, A. S., Knutti, R., Mauritsen, T., Norris, J. R., Proistosescu, C., Rugenstein, M., Schmidt, G. A., Tokarska, K. B., and Zelinka, M. D.: An assessment of Earth's climate sensitivity using multiple lines of evidence, Rev. Geophys., accepted, https://doi.org/10.1029/2019RG000678, 2020. a

Stan Development Team: RStan: the R interface to Stan, r package version 2.19.1, available at: http://mc-stan.org/ (last access: 2 July 2020), 2019. a

Sueyoshi, T., Ohgaito, R., Yamamoto, A., Chikamoto, M. O., Hajima, T., Okajima, H., Yoshimori, M., Abe, M., O'ishi, R., Saito, F., Watanabe, S., Kawamiya, M., and Abe-Ouchi, A.: Set-up of the PMIP3 paleoclimate experiments conducted using an Earth system model, MIROC-ESM, Geosci. Model Dev., 6, 819–836, https://doi.org/10.5194/gmd-6-819-2013, 2013. a

Tierney, J. E., Haywood, A. M., Feng, R., Bhattacharya, T., and Otto-Bliesner, B. L.: Pliocene Warmth Consistent With Greenhouse Gas Forcing, Geophys. Res. Lett., 46, 9136–9144, https://doi.org/10.1029/2019GL083802, 2019. a

Tingley, M. P., Craigmile, P. F., Haran, M., Li, B., Mannshardt, E., and Rajaratnam, B.: Piecing together the past: statistical insights into paleoclimatic reconstructions, Quaternary Sci. Rev., 35, 1–22, 2012. a

Tomassini, L., Reichert, P., Knutti, R., Stocker, T. F., and Borsuk, M. E.: Robust Bayesian Uncertainty Analysis of Climate System Properties Using Markov Chain Monte Carlo Methods, J. Climate, 20, 1239–1254, 2007. a

Vial, J., Dufresne, J.-L., and Bony, S.: On the interpretation of inter-model spread in CMIP5 climate sensitivity estimates, Clim. Dynam., 41, 3339–3362, 2013. a

Williamson, D. B. and Sansom, P. G.: How are emergent constraints quantifying uncertainty and what do they leave behind?, B. Am. Meteorol. Soc., 100, 2571–2588, https://doi.org/10.1175/BAMS-D-19-0131.1, 2019. a, b, c, d, e

Wyser, K., van Noije, T., Yang, S., von Hardenberg, J., O'Donnell, D., and Döscher, R.: On the increased climate sensitivity in the EC-Earth model from CMIP5 to CMIP6, Geosci. Model Dev., 13, 3465–3474, https://doi.org/10.5194/gmd-13-3465-2020, 2020. a

Zheng, W., Zhang, Z., Chen, L., and Yu, Y.: The mid-Pliocene climate simulated by FGOALS-g2, Geosci. Model Dev., 6, 1127–1135, https://doi.org/10.5194/gmd-6-1127-2013, 2013. a

_{2}. In this study, we develop a new and promising statistical method and obtain similar results as previously observed, wherein the sensitivity does not seem to exceed extreme values.