Interactive comment on “Refining error estimates for a millennial temperature reconstruction”

I am sure that the IPCC authors would not have used numbers if they did not intend them to be interpreted numerically. It is clear that there is a strong element of expert assessment in the IPCC conclusions, but they have expressed their assessment in numerical values, and it is the accuracy of those values which is considered in this paper. This study also depends on a certain element of expert assessment: the comparison against independent proxies is included to support the conclusions, though it does not contribute directly to the numerical values given.


Introduction
There have been many recent reconstructions of the temperature of the last millennium (see Jones et al., 2009; Mann et al., 2008; Juckes et al., 2007; Jansen et al., 2007, for reviews). All show a broadly similar shape, with an initial warm period, a slow general decline after that, and finally a steep increase in temperatures. There is, however, less clarity about the degree of uncertainty in the reconstructions, and a need for "more realistic assessments of reconstruction uncertainty" (Jones et al., 2009). The clearest statements of uncertainty come from IPCC reports. In the 2001 assessment Folland et al. (2001, hereafter IPCC2001) conclude that "The 1990s are likely to have been the warmest decade of the millennium in the Northern Hemisphere, and 1998 is likely to have been the warmest year", while Jansen et al. (2007, hereafter IPCC2007) in the 2007 assessment conclude that it is "likely that" 1950 to 2000 "was the warmest period in the last 1.3 kyr". In both the IPCC reports cited above something is considered "likely" if it is estimated to have a 2 in 3 chance of occurring or having occurred.

The temperature of past centuries cannot be measured directly and must instead be estimated with indirect evidence. This evidence, the proxy data, reflects environmental variations; but the link between the environment and the proxy data is generally not precisely quantified. For this reason it is usual to use statistical relations between climate and proxy data.
A number of authors (Bürger et al., 2006; Lee et al., 2008; von Storch et al., 2009) have used "virtual reality" tests, in which reconstruction methods are assessed by calculating pseudo-reconstructions based on pseudo-proxy records generated from past climates simulated by models. Christiansen et al. (2009) show that the performance of any given reconstruction method can vary substantially for different realisations of the climate. They call into question the reliability of methodological studies which have used only a limited number of model climate realisations, and suggest that this explains the sometimes contradictory results of other studies.
In the present paper error estimates are obtained by analysing in detail the effect of omitting data with the Jackknife technique (e.g. Shao and Tu, 1995), and by comparing two reconstructions made with fully independent proxy records over the period 1750 to 1850. If all local/regional records used in a given reconstruction showed synchronous variations with the same amplitude, the uncertainty estimate given by the Jackknife method would be zero. The uncertainty thus reflects inhomogeneities in the proxy data and the extent to which the true NH-mean temperature variations are captured by the retained data. This method is applied to the reconstruction from Juckes et al. (2007, hereafter JAB2007), which used 13 proxies, and to a reconstruction using 3 new proxy data series (described in the next section) together with 12 of those from the collection used by JAB2007. The geographical distribution of the proxies is shown in Fig. 1.
The outline of this paper is as follows: Section 2 discusses the data used, Sect. 3 investigates the temporal homogeneity of the data, Sect. 4 presents the reconstruction of the Northern Hemisphere temperature, Sect. 5 evaluates its uncertainty, and Sect. 6 looks specifically at uncertainties of general statements, such as those quoted above from IPCC Assessment Reports. Here, both the 13-proxy collection of JAB2007 and an extended series using the following 3 additional proxies will be considered: a Mongolian tree ring composite record (D'Arrigo et al., 2001), a Chinese temperature reconstruction (Ge et al., 2003), and a Donard lake sediment record (Moore et al., 2001).
In the expanded proxy collection, the Arabian Sea planktonic foraminifera series previously used is omitted: this series was used in Moberg et al. (2005) and JAB2007 in order to enhance spatial coverage, but is at best an indirect indicator of temperature. The net result is an increase from 13 to 15 series, a modest 15%.
In a recent study, Mann et al. (2008) assembled a collection of 95 proxies extending back to 1000 AD, of which 79 are in the Northern Hemisphere: these numbers are reduced to 58 and 45 respectively when tree ring chronologies constructed from fewer than 8 samples are omitted. Of the 45 Northern Hemisphere unscreened series, 19 are tree ring chronologies, 16 in North America and 3 in Eurasia. There are 4 ice core series (all from Greenland); 5 from cave stalactites (2 in China, 2 in Costa Rica and one in Scotland); 2 regional temperature reconstructions (both from China); and 15 sediment series, of which one is marine. Mann et al. (2008) then screen for correlations with observed temperatures, which reduces the number of Northern Hemisphere proxies to 19, comparable with the 15 used here.
From this brief survey it can be seen that the amount of data extending back to 1000 AD is limited. Rather than following Mann et al. (2008) in using correlations with temperature to select proxies, the selection here is based on a priori reasoning. As noted in JAB2007, there are two advantages to this approach: the possibility of including poor proxies because there is insufficient data to screen accurately is avoided, and not using temperature measurements in the data selection simplifies the uncertainty estimation which is a key part of this work. The first of these gains is of course offset by the possibility of including poor proxies because of insufficient a priori knowledge.
From the 45 Northern Hemisphere proxy records dating back to 1000 AD which are included in the Mann et al. (2008) collection prior to temperature screening, low latitude sediment series which are expected to reflect precipitation variation are omitted. The 2 Costa Rican speleothem series were supplied to Mann et al. (personal communication, 2008): in the absence of a peer-reviewed description of the data origins they will not be used here. Speleothems from the Dongge cave (China) and Scotland used by Mann et al. (2008) are believed to be more precipitation sensitive. No additional Greenland ice core or N. American tree ring data has been used here, as these two regions were already well represented in JAB2007. Ljungqvist (2009, hereafter LJ2009) reviews 71 proxy climate records. His collection includes a long tree ring record from Solongotyn Davaa in Mongolia, but here we follow Mann et al. (2008) in using the composite series of D'Arrigo et al. (2001). This composite has a correlation of 0.79, over the period AD 1000-1980, with the Solongotyn Davaa series. LJ2009 includes a Taimyr record from Naurzbaev et al. (2002), but not that from Naurzbaev and Vaganov (1999), which was used by Esper et al. (2002) and Juckes et al. (2007) and is used here. The correlation between these two Taimyr series is 0.96. The following series used here are not included in the LJ2009 review: Donard lake, the Quelcaya (Peru) δ18O record, and the China composite of Yang et al. (2002).

Instrumental data
The HadCRUT3 (Brohan et al., 2006) Northern Hemisphere mean data is used to calibrate the new reconstructions, replacing the HadCRUT2v data from Jones and Moberg (2003) used by JAB2007. The new data extends further back (to 1850 instead of 1856) and has improved (that is, more reliable) uncertainty estimates.

Temporal homogeneity of the proxy data
Proxy based reconstructions rely on the assumption that the proxy response to temperature variations in past centuries was close to that observed in the last 150 years.
Recent detailed modelling of the response of trees to climate variations (Anchukaitis et al., 2006) reinforces the expectation that the response of tree-rings is related to known physical and biological properties of the trees.The modelling approach implies that, in the absence of other factors, the response to temperature variations in the past would match that in the calibration period, but it cannot test for the absence of other factors.
A complementary approach is to look at the level of consistency between different types of proxies. The mean anomaly correlation between tree and non-tree proxies is defined as:

c_n = \frac{1}{N_p} \sum_{i,j} \mathrm{corr}_n(p_i, p_j),

where the sum runs over pairs i, j such that p_i is a tree-ring proxy and p_j is a non-tree-ring proxy, N_p is the number of such pairs, and \mathrm{corr}_n denotes the correlation of anomalies relative to the mean of the 50-year interval n. c_n has been evaluated over 50-year periods at 10-year intervals, 1000-1049, 1010-1059, etc., and is plotted in Fig. 2. There is substantial variability, and there are periods for which the correlation is negative. However, the spread appears consistent with random variation about the mean and there is no clear trend: the largest value of c_n occurs around 1450, and the values in the 11th century are comparable with those in the calibration period.
Figure 2 also shows the correlations among the tree ring chronologies (green) and among the non-tree ring chronologies (purple). The correlations between the non-tree ring proxies and the tree ring chronologies are higher than the correlations among the non-tree ring proxies. This approach will not detect systematic changes that affect all the proxies equally, but it does provide an indication of the level of consistency.
This brief analysis of correlations does not produce any evidence of systematic changes in proxy behaviour.
Northern Hemisphere temperature

The influence of new data
As in JAB2007, the reconstruction is generated by centring the proxies on the calibration period and normalising them to unit variance in that period, forming the composite with all proxies equally weighted, and then scaling by a factor γ_vm such that the variance of the scaled composite matches the variance of the instrumental temperature record in the calibration period:

\gamma_{vm} = \sqrt{ \overline{T_i'^2} \, / \, \overline{C'^2} },

where the over-line is a mean over the calibration period, the prime indicates a departure from the mean over that period, T_i is the instrumental temperature and C is the proxy composite.
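The composite-plus-scale recipe can be sketched in a few lines (a simplified Python illustration assuming complete, annually resolved series; the function name is our own, not from the paper):

```python
from statistics import mean, pstdev

def reconstruct(proxies, temps_cal, cal):
    """Composite-plus-variance-matching: centre and normalise each proxy
    over the calibration slice `cal`, average with equal weights, then
    scale so the composite variance matches the instrumental variance."""
    normed = []
    for p in proxies:
        m, s = mean(p[cal]), pstdev(p[cal])
        normed.append([(v - m) / s for v in p])
    composite = [mean(vals) for vals in zip(*normed)]
    gamma_vm = pstdev(temps_cal) / pstdev(composite[cal])
    return [mean(temps_cal) + gamma_vm * c for c in composite]
```

As a sanity check, a single proxy that is linear in temperature over the calibration period is reproduced exactly there, since centring, normalising and variance matching undo the linear map.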
Figure 3 shows the JAB2007 reconstruction and the new reconstruction using the collection of 15 proxy data series (hereafter referred to as "R15") and the updated and extended instrumental record described in Sect. 2. Also shown is a reconstruction using the JAB2007 proxies and the new instrumental temperature record: it can be seen that this change has little impact. The differences between the R15 reconstruction and that of JAB2007 are well within any reasonable uncertainty estimate, but the difference is highly coherent over the first 7 centuries of the reconstruction. This underlines the fact that uncertainties in proxy based reconstructions are likely to be correlated in time.

Comparison with independent data
Some studies have followed Mann et al. (2008) in using part of the instrumental record for calibration and withholding part of the record in order to validate the reconstruction or to select from a family of reconstructions. However, shortening the calibration period reduces the accuracy of the calibration. Furthermore, this study uses a number of proxies with low temporal resolution, such that it would be impossible to treat the validation period as independent from the calibration period. For these reasons, the full instrumental record has been used for calibration and independent proxy records are used for validation. It was noted in the introduction that the number of proxy records extending back to 1000 AD is limited, but there are many more records of shorter extent. Here, two recent studies extending back to 1751 are used. Because of the greater data volume available, these reconstructions can be expected to have greater accuracy. In both of these studies (Wilson et al., 2006, 2007), the proxy data is entirely independent of the data used here. In order to obtain an NH extra-tropical temperature estimate, the Wilson et al. (2007) index is scaled to match the variance of the NH mean temperature north of 20° N, 1856 to 2000. This is then combined with the Wilson et al. (2006) series, weighting each series by the area north and south of 20° N respectively, to produce an estimate of Northern Hemisphere mean temperature (hereafter referred to as "Wilson*", Fig. 4a).
Figure 4b shows the period 1751 to 1900, and it can be seen that there is a very close correspondence between the R15 reconstruction and the combined Wilson* data in the period 1751 to 1850. There is agreement in the timing and magnitude of a cooling trend from 1790 to 1810 AD, though the Wilson* series has a somewhat cooler period from 1810 to 1820 AD.
Both the above time series are independent of the R15 reconstruction in terms of the proxy data input, though in both cases the scaling is determined by instrumental temperature.
The root mean square difference between the combined Wilson* series and the R15 reconstruction is 0.136 K (0.139 K for the JAB2007 reconstruction). This cannot, however, be used as an error estimate, because the Wilson et al. (2006, 2007) reconstructions have used observed temperature in the data selection and calibration.
The correlation between the series is, on the other hand, independent of the calibration of the series. The composite Wilson* series is found to have a correlation R = 0.695 with the R15 series. Taking a conservative estimate of 15 degrees of freedom, this correlation is statistically significant at the 99.8% level, so that we can reject with near certainty the null hypothesis that fluctuations in the time series are purely random.


Uncertainty estimates
The comparison with a wholly independent set of data in the previous section gives a strong validation of the robustness of the climate signal in the proxy records, but does not provide an uncertainty estimate. This section will investigate and quantify the different factors contributing to uncertainty.

The delete-d Jackknife estimator
The Jackknife method (e.g. Shao and Tu, 1995) will be used to generate ensembles of NH temperature reconstructions. The form of the Jackknife most widely used in the atmospheric sciences is based on an ensemble obtained by deleting in turn each element of the input data. Here the "delete-d Jackknife" variant of the method, which is based on ensembles obtained by deleting all possible combinations of d elements of the input data, will be used. The "delete-d" version provides a means of estimating confidence limits as well as variances with reduced reliance on assumptions of normality. The Jackknife method usually assumes independence of errors between input elements, an assumption which cannot be sustained in the case of proxy data. The impact of error correlations in the input data will be considered below, and a simple correction will be applied to the Jackknife uncertainty estimate.
The spread of reconstructions within ensembles generated by omitting up to 6 elements of the 13- and 15-member proxy collections is shown in Table 1. For each delete-d ensemble, the standard deviation about the ensemble mean, σ_d, is evaluated for each time step. The Jackknife error estimate for each time step (neglecting, for now, the impact of error correlation, which will be discussed below) is then given by:

\sigma_{d:jack} = \sqrt{ \frac{N_c - d}{d} } \, \sigma_d,   (2)

where N_c = 13 or 15 is the number of proxies in the collection (Shao and Tu, 1995). Figure 5 shows the 5% and 95% levels of the Jackknife distribution for the d = 4, 5, 6 ensembles, before and after scaling by the correction factor in Eq. (2). The scaling effectively collapses the results from different values of d onto a single line. There is a slightly greater range in the 16th century and a significantly lower range in the 18th century. The narrow range of reconstructions in the 18th century is likely to be partly due to the presence of proxies with long correlation time scales. Apart from this, the range does not vary much with time. The following sub-sections will use these ranges as the basis of uncertainty estimates.
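For a simple statistic such as a mean, the delete-d estimate of Eq. (2) can be sketched by brute-force enumeration of subsets, which is feasible for collections of this size (an illustrative Python sketch, not the paper's code):

```python
from itertools import combinations
from math import sqrt
from statistics import mean, pstdev

def delete_d_jackknife(values, d):
    """Delete-d jackknife standard error of the mean of `values`:
    sigma_{d:jack} = sqrt((N_c - d) / d) * sigma_d  (Eq. 2)."""
    n = len(values)
    ensemble = [mean(values[i] for i in range(n) if i not in omit)
                for omit in map(set, combinations(range(n), d))]
    sigma_d = pstdev(ensemble)          # spread about the ensemble mean
    return sqrt((n - d) / d) * sigma_d  # scaling of Eq. (2)
```

For the mean, the scaled result is the same for every d (it collapses onto the classical standard error s/sqrt(n)), mirroring the collapse onto a single line seen in Fig. 5.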
The 5% and 95% percentiles of the Jackknife distribution will be used to estimate 5% and 95% confidence limits. This reduces the reliance on the assumption of Gaussian statistics. The Jackknife distribution does, however, have to be scaled in order to provide an estimate of the error distribution, and that scaling is based on an analysis of the standard deviation. That is, instead of assuming Gaussian statistics, it is assumed that the shape of the Jackknife distribution scales linearly to that of the uncertainty distribution.

Estimating proxy error correlation
As noted above, the Jackknife method assumes that the errors in the proxies are uncorrelated. The impact of error correlations on the confidence limits can be estimated from the simpler problem of evaluating the mean of N_c errors, each of standard deviation σ. The variance of the mean, σ_m^2, is given by the mean of the N_c^2 elements of the covariance matrix. If the correlations are zero, this gives the familiar result σ_m^2 = N_c^{-1} σ^2. If, however, the N_c(N_c - 1) off-diagonal elements of the covariance matrix have a mean value cσ^2, we obtain:

\sigma_m^2 = \frac{\sigma^2}{N_c} \left[ 1 + (N_c - 1) c \right].
That is, as is well known, the uncertainty in the mean increases with increasing error correlation. However, it is shown in Appendix A that the presence of a positive correlation actually decreases the Jackknife error estimate given by Eq. (2):

E\left( \sigma_{d:jack}^2 \right) = \frac{(1 - c)\,\sigma^2}{N_c}.

This is understandable when one considers that a value of c = 1 would result in zero spread in the Jackknife ensemble and hence σ_{d:jack} = 0.
To adjust for this a corrected Jackknife estimate is introduced:

\sigma_{corr}^2 = \sigma_{d:jack}^2 \, \frac{1 + (N_c - 1) c}{1 - c}.   (3)

If the correlation were known, Eq. (3) would provide an improved uncertainty estimate.
In practice, the correlation c is not known and we must rely on estimates, three of which are given in Table 2:
(1) c: the anomaly correlation, with the anomaly calculated relative to the time mean;
(2) c*: the anomaly correlation, with the anomaly calculated relative to a reduced composite, omitting the proxy pair being correlated;
(3) c_r: the correlation of residuals in the calibration period, after removing a fit to temperature in the calibration period.
In order to obtain a conservative estimate of the accuracy of the reconstructions, the largest of the three estimates is used in Eq. (3) to evaluate the correction factor in the last column. Although the correlations are not large, the presence of the factor N_c - 1 in the numerator leads to a significant correction.
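The size of this correction is easy to reproduce (a sketch; the correlation value 0.105 below is an illustrative number chosen to be consistent with the factor of 1.66 quoted in the text for N_c = 15, not a value taken from Table 2):

```python
from math import sqrt

def jackknife_correction(n_c, c):
    """Factor multiplying sigma_{d:jack}: the square root of the ratio
    between the true variance of the mean, (1 + (n_c - 1) c) sigma^2 / n_c,
    and the expected jackknife variance, (1 - c) sigma^2 / n_c."""
    return sqrt((1 + (n_c - 1) * c) / (1 - c))
```

Even a mean correlation near 0.1 among 15 proxies inflates the uncertainty estimate by roughly two thirds, which is why the correction matters despite the correlations themselves being small.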
Figure 6 shows the impact of scaling the spread of the Jackknife distribution using the factor 1.66 from Table 2.

Structural uncertainty
The scaling of the reconstruction depends on the calibration method used. The disagreement between different methods is an indication of uncertainty which is not captured in the Jackknife estimation.
Figure 7 shows the spread of the composite against the Northern Hemisphere temperature record, together with 3 regression lines representing different calibration methods, namely: least squares regression (red), inverse regression (blue) and variance matching (green). The range of regression slopes suggests that reasonable upper and lower bounds for the slope are 0.285 and 0.135 respectively. For the 13-proxy collection of JAB2007 the range is 0.262 to 0.122.
Suppose that the regression coefficient used to scale the composite series has a probability density function, f_s(γ): that is, the probability of γ falling in a narrow interval around γ_1 is given by

P\left( \gamma_1 - \Delta/2 < \gamma < \gamma_1 + \Delta/2 \right) \approx f_s(\gamma_1) \, \Delta

for small Δ.
The Jackknife ensemble provides the corresponding distribution of reconstructed temperatures, f_j(T), for a given γ. The two can then be combined to give a probability distribution of the reconstructed temperatures taking uncertainty in γ into account, making use of the fact that the distribution of temperatures for any value of γ can be obtained by scaling the temperature axis by γ/γ_vm. Convolving the two probability distributions then gives:

f(T) = \int f_s(\gamma) \, \frac{\gamma_{vm}}{\gamma} \, f_j\!\left( \frac{\gamma_{vm}}{\gamma} T \right) d\gamma.

As with the error correlations discussed in the previous subsection, we do not have precise information about the probability distribution of γ. Two model distributions will be considered: a piecewise uniform (almost uniform, but taking into account the asymmetry between values above and below γ_vm), and a piecewise Gaussian. These distributions are described in more detail in Appendix C.

Figure 6 shows the impact of the structural uncertainty adjustments, assuming a piecewise uniform distribution, on the 50-year mean. Results obtained with the piecewise Gaussian distribution are very similar and will be discussed further in the next section. The uncertainty in the calibration coefficient has a larger effect when the anomaly is large, but a smaller effect in the 11th century, when the composite anomaly is small. Note, however, that the structural uncertainty multiplies the sampling uncertainty. Thus, even though the expected anomaly is near zero in the 11th century, the range of possible anomalies implied by the Jackknife ensemble is significant, and this range is magnified by the structural uncertainty.
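The combination of sampling and structural uncertainty can also be approximated by Monte Carlo sampling instead of explicit convolution (our own sketch: draw a jackknife anomaly and an independent γ from a piecewise uniform density with median γ_vm, then rescale; the numerical values in the usage below are illustrative):

```python
import random

def combined_samples(jack_anoms, g_vm, g_min, g_max, n=10000, seed=1):
    """Samples of T * gamma / gamma_vm, with gamma drawn from a piecewise
    uniform density putting half its mass on each side of gamma_vm."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        t = rng.choice(jack_anoms)          # sampling (jackknife) uncertainty
        if rng.random() < 0.5:              # half the mass below gamma_vm
            g = rng.uniform(g_min, g_vm)
        else:                               # half the mass above gamma_vm
            g = rng.uniform(g_vm, g_max)
        out.append(t * g / g_vm)            # structural scaling of the anomaly
    return out
```

The multiplicative structure is visible directly: a zero anomaly stays zero whatever γ is drawn, while large anomalies are spread out further, as noted in the text.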

Summary
Figures 8, 9 and 10 show the estimated uncertainty ranges for annual, decadal and 50-year averaged data. The 95th percentile of the uncertainty estimate of the annual reconstruction does not exceed the 1998 temperature at any point, but it would be incorrect to conclude that there is 95% certainty that the 1998 temperature was not exceeded at any point. To illustrate this, consider an idealised case in which every year of a century has a 4% chance of having exceeded some threshold T_max and all years are independent: the probability that at least one year of the century exceeded T_max is easily calculated as 1 - (0.96)^100 = 0.98; in this idealised case there is only a 2% chance that T_max was not exceeded. The calculation for the temperature reconstruction is, however, more complicated, because the probability of exceeding T_max varies from year to year, and the years are not entirely independent of each other. Section 6 below addresses this problem.
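The idealised calculation in the text is a one-liner (sketch):

```python
def prob_any_exceed(p_yearly, n_years):
    """P(at least one of n independent years exceeds the threshold)."""
    return 1.0 - (1.0 - p_yearly) ** n_years
```

With `p_yearly = 0.04` and `n_years = 100` this reproduces the 1 - (0.96)^100 ≈ 0.98 figure quoted above.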

For the decadal and 50-year averaged reconstructions (Figs. 9 and 10, respectively), the uncertainties are not smoothed versions of the annual uncertainties: they are evaluated from Jackknife ensembles of smoothed reconstructions, with the corrections for error correlations and structural uncertainty (described in the previous section) applied. Now there is a significant increase in uncertainty associated with the larger temperature anomalies of the 15th to 17th centuries.

Uncertainties in specific statements
The figures discussed in the previous section show, for instance, how the maximum of the estimated 95th percentile of the uncertainty distribution compares with recent temperatures. These figures cannot, however, directly address statements such as those from the IPCC quoted in the introduction, which refer to the likelihood of a threshold being exceeded in a specified time period. The Jackknife technique nevertheless provides a means to evaluate likelihoods for such statements.
Here four statements (S1 to S4, listed below) will be tested, all applied to the Northern Hemisphere annual mean temperature. The likelihood of each statement being true is estimated by taking the Jackknife ensembles, scaled to account for structural uncertainty and proxy error correlations, and evaluating the proportion of ensemble members which satisfy the statement.
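The evaluation step reduces to counting ensemble members (a sketch with a toy ensemble; the member structure and predicate below are hypothetical, chosen only to mimic a statement of the S1 type):

```python
def statement_likelihood(ensemble, statement):
    """Fraction of ensemble reconstructions for which `statement` holds."""
    return sum(1 for member in ensemble if statement(member)) / len(ensemble)

# Toy members: (recent value, list of 11th-century values).
toy = [(0.5, [0.1, 0.3]), (0.2, [0.4, 0.1]), (0.6, [0.2, 0.5])]
s1_like = lambda m: m[0] > max(m[1])  # "recent value exceeds 11th-c. maximum"
```

In the real calculation each ensemble member is a full scaled reconstruction, and the predicates correspond to S1 to S4.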
Results are presented in Table 3, together with the standard deviations of the scaled ensembles.
First, it can be seen that the difference between the d = 5 and d = 6 results is small for all variations in the table.
Secondly, the R15 reconstruction has standard deviations which are around 10% smaller than those of the JAB2007 reconstruction, as might be expected from the slight increase in data. The estimated standard deviations decrease with the averaging period, but the decrease is modest: the standard deviation for a 50-year average is no more than 20% smaller than that for the annual data.
Thirdly, although the standard deviations of the R15 reconstruction are reduced relative to those of the JAB2007 reconstruction, the estimated certainties for statements 1 to 3 are also reduced: this is a consequence of the upward shift in the central estimate of past temperatures.
A fourth salient feature of the results is the higher uncertainty with respect to S3, and the significantly lower uncertainty with respect to S2. This is a reflection of the fact that the recent anomaly in the 50-year mean is smaller than the recent decadal anomaly, while the uncertainties are only marginally smaller.

Conclusions
A new reconstruction of the millennial temperature has been generated from 15 proxies. The reconstruction uses the same simple composite-plus-scaling method as JAB2007, and differs through the addition of three proxy data series and the omission of one. The temperature evolution is, not surprisingly, similar to that of the JAB2007 reconstruction, but shifted upwards consistently by about 0.2 K.

The reconstruction uncertainty for the two series is, for 50 year means, estimated to be marginally under 0.2 K, so that this shift is within the expected uncertainty range.
The uncertainty analysis has been extended to make specific uncertainty estimates for several key questions.However, the value put on the uncertainty depends on a number of subjective assumptions.For this reason, this study presents a range of uncertainty estimates characterised by more or less optimistic assumptions about the nature of noise in the proxy records.
This study supports the conclusions of IPCC2001 about the exceptional nature of the 1998 temperature maximum but falls short of confirming the IPCC2007 conclusions about the exceptional nature of the temperature of the last 50 years of the 20th century.This latter failure is a consequence of the finding that the reduction in uncertainty which results from the 50 year averaging is very slight, whereas the reduction in signal is significant.
The strongest result relates to the temperature of the last decade, which exceeds any decade prior to 1850 with 95% certainty.The increased certainty compared to the 66% certainty expressed by IPCC2001 is primarily a consequence of the continuing high temperatures which have made the last decade 0.24 K warmer than the last decade of the 20th century, a warming greater than one standard deviation of the reconstruction uncertainty.
The expected variance of e_α is given by

E\left( e_\alpha^2 \right) = \sum_i w_i^2 \sigma_i^2 + \sum_{i \neq j} w_i w_j C_{ij},

where w_i = d/(N_c(N_c - d)) for retained proxies, w_i = -1/N_c for omitted proxies, σ_i^2 is the error variance of proxy i, and C_{ij} is the error covariance of proxies i and j. If the mean is now taken over all α, the sums of σ_i^2 and C_{ij} span all possible combinations of i, j without favouring any values: it follows that σ_i^2 can be replaced with its mean value σ^2 and C_{ij} with its mean value cσ^2. After a little algebra it follows that the spread of the Jackknife ensemble is given by:

E\left( \sigma_d^2 \right) = \frac{d}{N_c (N_c - d)} (1 - c) \sigma^2, \quad \text{so that} \quad E\left( \sigma_{d:jack}^2 \right) = \frac{(1 - c) \sigma^2}{N_c}.
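This appendix result can be checked numerically (our own sketch: each synthetic proxy error shares a common component giving pairwise correlation c, so the mean ensemble spread should scale as (1 - c)):

```python
import random
from itertools import combinations
from statistics import mean, pstdev

def mean_sigma_d_sq(n_c, d, c, trials=400, seed=2):
    """Average sigma_d^2 over synthetic proxy sets with unit error
    variance and pairwise error correlation c."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        shared = rng.gauss(0, 1)  # common component shared by all proxies
        x = [c ** 0.5 * shared + (1 - c) ** 0.5 * rng.gauss(0, 1)
             for _ in range(n_c)]
        ens = [mean(v for i, v in enumerate(x) if i not in set(omit))
               for omit in combinations(range(n_c), d)]
        total += pstdev(ens) ** 2  # spread about the ensemble mean
    return total / trials
```

For n_c = 8, d = 2 the theoretical expectation d(1 - c)/(n_c(n_c - d)) is about 0.042 for c = 0 and half that for c = 0.5; the Monte Carlo averages land close to both, confirming that positive correlation shrinks the raw jackknife spread.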

The two distributions below are used to model the uncertainty in the scaling of the composite reconstructions, based on a central estimate γ_vm and upper (γ_max) and lower (γ_min) extremes which are asymmetrical about the central estimate.
Firstly, a piecewise uniform distribution, with median γ_vm:

f_s(\gamma) = \frac{1}{2(\gamma_{vm} - \gamma_{min})} \; \text{for} \; \gamma_{min} \le \gamma < \gamma_{vm}, \qquad f_s(\gamma) = \frac{1}{2(\gamma_{max} - \gamma_{vm})} \; \text{for} \; \gamma_{vm} \le \gamma \le \gamma_{max},

and zero otherwise. Secondly, a piecewise Gaussian distribution, again with median γ_vm:

f_s(\gamma) = \frac{1}{\sqrt{2\pi}\,\sigma_-} \exp\!\left( -\frac{(\gamma - \gamma_{vm})^2}{2\sigma_-^2} \right) \; \text{for} \; \gamma < \gamma_{vm}, \qquad f_s(\gamma) = \frac{1}{\sqrt{2\pi}\,\sigma_+} \exp\!\left( -\frac{(\gamma - \gamma_{vm})^2}{2\sigma_+^2} \right) \; \text{for} \; \gamma \ge \gamma_{vm},

with the widths σ_- and σ_+ set by γ_min and γ_max respectively. In both cases, the discontinuity at γ_vm in the model distribution is used as a convenient way of obtaining the desired asymmetry between positive and negative anomalies.
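The piecewise uniform case is simple enough to write down directly (a sketch; the normalisation puts probability 1/2 on each side of γ_vm, which is exactly what makes γ_vm the median):

```python
def piecewise_uniform_pdf(g, g_min, g_vm, g_max):
    """Density with median g_vm: mass 1/2 spread uniformly on each side."""
    if g_min <= g < g_vm:
        return 0.5 / (g_vm - g_min)
    if g_vm <= g <= g_max:
        return 0.5 / (g_max - g_vm)
    return 0.0
```

Each side integrates to exactly 1/2 regardless of how asymmetric the interval [γ_min, γ_max] is about γ_vm.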
Wilson et al. (2006) produce a 1751-1981 reconstruction of tropical SST, using coral data. This tropical reconstruction can be complemented by the extra-tropical reconstruction of Wilson et al. (2007), which extends back to 1750. The Wilson et al. (2007) work uses tree-rings selected to be free of the so-called "divergence problem".

- S1: The 1998 temperature exceeded the maximum 11th century annual temperature.
- S2: The mean from 1990 to 1999 exceeded the maximum 11th century decadal mean temperature.
- S3: The mean from 1950 to 1999 exceeded the maximum 11th century 50-year mean temperature.
- S4: The mean from 1999 to 2008 exceeded the maximum 11th century decadal mean temperature.

Fig. 1. Map showing the location of climate proxies: green and red show, respectively, tree-ring and non-tree-ring proxies used in both JAB2007 and this study; dark blue is the Globigerina bulloides series used in JAB2007; light blue is the Mongolian composite of D'Arrigo et al. (2001); and the two further Chinese series are in orange.
JAB2007 used 13 proxy climate records covering the period 1000 to 1980 AD, 12 in the Northern Hemisphere and one (the Quelcaya glacier record) at 14° S, taken to be indicative of tropical temperatures in both hemispheres.

Table 1 gives the time-mean values of σ_d and σ_{d:jack}. It is clear that σ_{d:jack} is only weakly dependent on the parameter d. However, for the reasons mentioned above, the Jackknife estimate cannot be taken as a reliable guide to the actual uncertainty without further work.

Let S_{d:α}, α = 1, ..., N_α, be all the subsets of d indices, where N_α = \binom{N_c}{d} is the number of ways to select d elements from N_c, and let \bar S_{d:α} be the complement of S_{d:α} (i.e., the indices retained when the d elements are omitted). Now consider the anomalies, e_α, of the Jackknife ensemble elements relative to the mean of the ensemble (which, by symmetry, equals the mean over all N_c proxies):

e_\alpha = \frac{1}{N_c - d} \sum_{i \in \bar S_{d:\alpha}} p_i - \frac{1}{N_c} \sum_{i=1}^{N_c} p_i.

Table 2. Different estimates of the proxy error correlation: see text for details.

Table 3. Estimated likelihood that statements 1 to 4 are untrue (columns 5 to 8). Column 2 indicates the proxy collection used, column 3 gives the number of elements omitted in the Jackknife ensemble, and column 4 gives the form used to model the structural uncertainty. The last three columns give the estimated standard error of the annual, decadal and 50-year averaged reconstructions.