Technical Note: Correcting for Signal Attenuation from Noisy Proxy Data in Climate Reconstructions

Regression-based climate reconstructions scale one or more noisy proxy records against a (generally) short instrumental data series. Based on that relationship, the indirect information is then used to estimate that particular measure of climate back in time. A well-calibrated proxy record, if stationary in its relationship to the target, should faithfully preserve the mean amplitude of the climatic variable. However, it is well established in the statistical literature that traditional regression parameter estimation can lead to substantial amplitude attenuation if the predictors carry significant amounts of noise. This issue is known as "Measurement Error" (Fuller, 1987; Carroll et al., 2006). Climate proxies derived from tree-rings, ice cores, lake sediments, etc., are inherently noisy, and thus all regression-based reconstructions could suffer from this problem. Some recent applications attempt to ward off amplitude attenuation, but implementations are often complex (Lee et al., 2008) or require additional information, e.g. from climate models (Hegerl et al., 2006, 2007). Here we explain the cause of the problem and propose an easy, generally applicable, data-driven strategy to effectively correct for attenuation (Fuller, 1987; Carroll et al., 2006), even at annual resolution. The impact is illustrated in the context of a Northern Hemisphere mean temperature reconstruction. An inescapable trade-off for achieving an unbiased reconstruction is an increase in variance, but for many climate applications the change in mean is a core interest.


The problem of noisy predictors
Random noise in any linear system will affect the estimation of the regression coefficients that tie the explanatory variable(s) X to the response Y. Uncertainty in the estimation of Y can be quantified through the variance of the error from an ordinary least squares (OLS) fit, which in this case provides unbiased parameter estimates (hence the name "BLUE": best linear unbiased estimator). Errors in the predictor(s) X, however, cause the regression slope to be attenuated towards zero, and the resulting signal in the prediction or reconstruction period will invariably be biased (Fuller, 1987). Figure 1a illustrates this effect for a simple 1:1 linear process where the response Y is only observed (available for calibration) over the interval 0.9 to 1, while X is available over the full range of 0 to 1. Increasing the noise contained in X attenuates the OLS-derived slope parameter away from the true linear relationship.
Why does noise in the predictors cause attenuation of the true signal? Consider a simple linear regression model $Y = \beta_0 + \beta_1 X + \varepsilon$ for which we have instrumental observations $Y$ and the noisy proxy record $W = X + U$, where $X$ is the desired climate signal and $U$ is the contaminating noise. An OLS regression of the instrumental data $Y$ is therefore not directly on $X$ but actually on $W$, and thus the result is not a consistent estimate of the desired regression coefficient $\beta_1$ (Fuller, 1987; Carroll et al., 2006). Rather, the regression slope is, in fact, $\frac{\sigma^2_X}{\sigma^2_X + \sigma^2_U}\,\beta_1$, where $\sigma^2_X$ and $\sigma^2_U$ denote the variance of $X$ and $U$, respectively. Therefore, the larger the noise $U$, the stronger the attenuation of the regression slope will be.
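The attenuation factor can be checked numerically. The following is a minimal sketch on synthetic Gaussian data; the sample size, parameter values and the helper name `ols_slope` are illustrative assumptions, not taken from the study. It regresses Y on W = X + U for increasing noise levels and compares the fitted slope with the theoretical value.

```python
import numpy as np

def ols_slope(x, y):
    """Slope of the OLS regression of y on x."""
    xc = x - x.mean()
    return float(xc @ (y - y.mean()) / (xc @ xc))

rng = np.random.default_rng(0)
n = 200_000                       # large sample so sampling error is small
beta0, beta1 = 0.5, 1.0           # assumed true regression coefficients
sigma_x, sigma_eps = 1.0, 0.3     # assumed signal and equation-noise std

x = rng.normal(0.0, sigma_x, n)                        # true signal X
y = beta0 + beta1 * x + rng.normal(0.0, sigma_eps, n)  # response Y

slopes = {}
for sigma_u in (0.0, 0.5, 1.0, 2.0):
    w = x + rng.normal(0.0, sigma_u, n)                # noisy proxy W = X + U
    slopes[sigma_u] = ols_slope(w, y)
    theory = sigma_x**2 / (sigma_x**2 + sigma_u**2) * beta1
    print(f"sigma_U={sigma_u:.1f}  OLS slope={slopes[sigma_u]:.3f}  theory={theory:.3f}")
```

With equal signal and noise variance (S:N = 1:1), the fitted slope should come out near half the true slope, which is the case shown in Fig. 1b.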
Fig. 1. Influence of increasing noise in the predictors of simple linear models where the calibration is restricted to the interval 0.9-1.0 in the response variable Y and the prediction extends to 0. (a) Traditional OLS regression exhibits a rapid increase in attenuation of the true (orange) linear relationship as the signal-to-noise ratio (S:N) becomes dominated by the noise; (b) the OLS (green) regression result for S:N = 1:1 from (a) is corrected using two possible TLS answers (blue), based on the assumption that either all noise is in X ($\eta = 0$) or the noise in X and Y is equal ($\eta = 1$). Depending on the estimate of the parameter $\eta$, solutions for TLS will most likely be somewhere in between. ACOLS (red) solutions are close to the true regression coefficients (orange), yet the variance is somewhat increased. (c) Box-plots representing the range of solutions over 1000 replicates for the applications shown in (b).

Method
Ideally, $\sigma^2_U$ would be obtained through independent replicates of the noisy predictors. Where this is not possible (such as in most paleoclimate applications), it has to be estimated from the data. In a simple linear regression model, if the variance $\sigma^2_U$ of the noise in the predictor is known, then an Attenuation Corrected Ordinary Least Squares (ACOLS) estimator of the slope $\beta_1$ is
$$\hat\beta_{1,\mathrm{ACOLS}} = \frac{\hat\sigma^2_W}{\hat\sigma^2_W - \sigma^2_U}\,\hat\beta_{1,\mathrm{OLS}}, \qquad (1)$$
where $\hat\sigma^2_W$ is the sample variance of $W$ (Fuller, 1987). In the absence of replicated $W$ to estimate $\sigma^2_U$, we first obtain the residual variance $\tilde\sigma^2_U$ from the OLS regression of $W$ on $Y$, i.e. $W = \beta_{0*} + \beta_{1*} Y + \varepsilon_*$. If the noise in $W$ is much larger than the noise in $Y$, then $\tilde\sigma^2_U$ would be a good estimator of $\sigma^2_U$. However, if this is not the case, then a correction must be applied. Here we propose to correct
$$\hat\sigma^2_U = \tilde\sigma^2_U - k\,\hat\beta^2_{1*,\mathrm{OLS}}$$
(for justification, see Supplementary Material: http://www.clim-past.net/6/273/2010/cp-6-273-2010-supplement.pdf) and search for $k > 0$ by 5-fold cross-validation (Stone, 1974). Specifically, we divide the whole calibration period into 5 sections and then, for a given $k$, assess the ACOLS regression estimated from any combination of four sections on the fifth. The $k$ that minimizes the prediction bias is retained as the estimate of $k$. To ensure finite moments and superior small-sample properties of $\hat\beta_{1,\mathrm{ACOLS}}$, we follow Sect. 2.5 in Fuller (1987) and replace $\sigma^2_U$ in Eq. (1) by $\left(1 - \frac{\alpha}{n-1}\right)\sigma^2_U$, where $\alpha > 0$. Although Fuller (1987) provides an optimal choice of $\alpha$ that minimizes the mean squared error of $\hat\beta_{1,\mathrm{ACOLS}}$, we are rather interested in minimizing the bias in the reconstruction, and to this end $\alpha$ needs to be close to zero. Hence we simply set $\alpha = 0.01$, but note that our results are insensitive to values between 0.001 and 0.1.
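The procedure described above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the correction is assumed to take the form of the reverse-regression residual variance minus k times the squared reverse-regression slope, the five sections are taken as contiguous blocks, the fold biases are aggregated as a mean absolute bias, and the clipping guard, the grid of k values, the synthetic data and all function names are hypothetical.

```python
import numpy as np

def ols(x, y):
    """Return (intercept, slope, residual variance) of the OLS fit of y on x."""
    xc, yc = x - x.mean(), y - y.mean()
    slope = float(xc @ yc / (xc @ xc))
    intercept = float(y.mean() - slope * x.mean())
    resid = y - intercept - slope * x
    return intercept, slope, float(resid @ resid / (len(x) - 2))

def acols_fit(w, y, k, alpha=0.01):
    """Attenuation-corrected (intercept, slope) of y on the noisy predictor w."""
    n = len(w)
    _, b_ols, _ = ols(w, y)                  # attenuated slope of Y on W
    _, b_star, s2_tilde = ols(y, w)          # reverse regression of W on Y
    s2_w = float(np.var(w, ddof=1))          # sample variance of W
    s2_u = s2_tilde - k * b_star**2          # assumed form of the correction
    s2_u = min(max(s2_u, 0.0), 0.95 * s2_w)  # hypothetical guard: denominator stays positive
    s2_u *= 1.0 - alpha / (n - 1)            # small-sample modification with alpha
    slope = s2_w / (s2_w - s2_u) * b_ols
    return float(y.mean() - slope * w.mean()), float(slope)

def choose_k(w, y, k_grid, n_folds=5):
    """Pick k by 5-fold cross-validation on the held-out prediction bias."""
    idx = np.arange(len(w))
    folds = np.array_split(idx, n_folds)     # contiguous sections (assumption)
    scores = []
    for k in k_grid:
        biases = []
        for held_out in folds:
            train = np.setdiff1d(idx, held_out)
            b0, b1 = acols_fit(w[train], y[train], k)
            biases.append(np.mean(b0 + b1 * w[held_out] - y[held_out]))
        scores.append(np.mean(np.abs(biases)))  # aggregation rule is an assumption
    return float(k_grid[int(np.argmin(scores))])

# Synthetic calibration period with a trend (illustrative values only).
rng = np.random.default_rng(1)
n = 100
x = np.linspace(-0.3, 0.5, n) + rng.normal(0.0, 0.15, n)  # true signal
y = x + rng.normal(0.0, 0.1, n)                           # instrumental target
w = x + rng.normal(0.0, 0.3, n)                           # noisy proxy

k_grid = np.geomspace(1e-3, 1.0, 40)                      # candidate k > 0
k_best = choose_k(w, y, k_grid)
b0, b1 = acols_fit(w, y, k_best)
_, b_ols, _ = ols(w, y)
print(f"k={k_best:.4f}  OLS slope={b_ols:.3f}  ACOLS slope={b1:.3f}")
```

Because the correction factor is never smaller than one, the ACOLS slope in this sketch is always at least as steep as the OLS slope; how close it comes to the true slope depends on how well the noise variance is estimated.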

Getting a precise $\sigma^2_U$ is the critical step in this procedure. Here we proposed one way to estimate $\sigma^2_U$ when there are no replicates, but other approaches could possibly be developed. The choice might depend on the problem, the data and the noise structure at hand. To illustrate the robustness of our approach under the given example conditions, we show in the Supplementary Material (http://www.clim-past.net/6/273/2010/cp-6-273-2010-supplement.pdf) that results are qualitatively very similar even after we artificially added "white" noise to the individual proxy series. Comparable, though somewhat degraded, results were also observed under "red" noise conditions, where the reconstruction uncertainty becomes larger, particularly if multiple predictors are used. Here we naively applied the same correction method to the "red" noise, but more rigorous methodological developments on correlated measurement errors are called for. Further, a more systematic assessment is needed to compare the effectiveness of different estimation methods and their robustness under various realistic conditions (e.g., noise magnitude or characteristics).
Based on our ACOLS regression and the estimated $\sigma^2_U$, we can correct the attenuation and obtain an unbiased estimate of the true slope $\beta_1$ for $X$ (Fuller, 1987; Carroll et al., 2006). This straightforward approach can also be implemented in a multiple linear regression framework where a vector of slopes is attenuated, and hence needs to be corrected.
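Consider now a multiple linear regression model $Y = \beta_0 + \beta^T X + \varepsilon$ with an observed $p$-dimensional vector $W = X + U$ representing the signal $X$ contaminated by noise $U$. Let $\Sigma_{XX}$ and $\Sigma_{UU}$ denote the variance-covariance matrices for $X$ and $U$, respectively. Note that $\Sigma_{UU}$ is not restricted to be a diagonal matrix. In fact, it will have non-zero off-diagonal entries when the $p$ variables in $U$ are correlated. If $\Sigma_{UU}$ is known, then the ACOLS estimator of $\beta$ is
$$\hat\beta_{\mathrm{ACOLS}} = \left(\hat\Sigma_{WW} - \Sigma_{UU}\right)^{-1}\hat\Sigma_{WW}\,\hat\beta_{\mathrm{OLS}},$$
where $\hat\Sigma_{WW}$ is the sample variance-covariance matrix of $W$. To estimate $\Sigma_{UU}$, we first obtain the residual variance-covariance matrix $\tilde\Sigma_{UU}$ from separate OLS regressions of $W_i$ on $Y$, i.e. $W_i = \beta_{0i*} + \beta_{1i*} Y + \varepsilon_{i*}$, for each $i = 1, \ldots, p$. Then we make the correction
$$\hat\Sigma_{UU} = \tilde\Sigma_{UU} - k\,\hat\beta_{*,\mathrm{OLS}}\,\hat\beta^T_{*,\mathrm{OLS}},$$
where $\hat\beta_{*,\mathrm{OLS}} = (\hat\beta_{11*}, \ldots, \hat\beta_{1p*})^T$. The rest of the procedure is analogous to the above.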
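For the multiple regression case, a minimal sketch with a known noise covariance matrix is given below. It assumes the standard matrix form of the corrected estimator, i.e. the inverse of (sample covariance of W minus noise covariance) applied to the sample covariance of W times the OLS coefficients. The covariance matrices, coefficients and the function name `acols_multi` are illustrative assumptions; the estimation of the noise covariance from the data is not covered here.

```python
import numpy as np

def acols_multi(W, y, sigma_uu):
    """Return (beta_OLS, beta_ACOLS) for an (n, p) matrix W of noisy predictors."""
    Wc = W - W.mean(axis=0)
    yc = y - y.mean()
    s_ww = Wc.T @ Wc / (len(y) - 1)            # sample covariance of W
    s_wy = Wc.T @ yc / (len(y) - 1)            # sample covariance of W with Y
    beta_ols = np.linalg.solve(s_ww, s_wy)     # attenuated coefficients
    beta_acols = np.linalg.solve(s_ww - sigma_uu, s_ww @ beta_ols)
    return beta_ols, beta_acols

rng = np.random.default_rng(4)
n = 200_000
beta_true = np.array([0.5, 0.3, 0.2])          # assumed true slopes
sigma_xx = np.array([[1.0, 0.5, 0.3],
                     [0.5, 1.0, 0.4],
                     [0.3, 0.4, 1.0]])         # signal covariance (assumed)
sigma_uu = 0.5 * np.array([[1.0, 0.3, 0.0],
                           [0.3, 1.0, 0.3],
                           [0.0, 0.3, 1.0]])   # correlated noise, not diagonal

X = rng.multivariate_normal(np.zeros(3), sigma_xx, size=n)
U = rng.multivariate_normal(np.zeros(3), sigma_uu, size=n)
W = X + U
y = 1.0 + X @ beta_true + rng.normal(0.0, 0.2, n)

beta_ols, beta_acols = acols_multi(W, y, sigma_uu)
print("OLS:  ", np.round(beta_ols, 3))
print("ACOLS:", np.round(beta_acols, 3))
print("true: ", beta_true)
```

Note that in the multivariate case individual OLS coefficients are not necessarily each shrunk towards zero; the correction acts on the coefficient vector as a whole.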
In the statistical literature it is often believed that noise in predictors is of no concern if the sole goal is prediction (Fuller, 1987; Carroll et al., 2006). However, this only applies in situations where the range of both W and Y is well represented in the calibration period. If this is not the case, then the noise in W does introduce bias in the prediction (Fig. 1a). Intuitively, as W becomes dominated by noise, the OLS-based regression line will be attenuated away from the true relationship between X and Y and approach a horizontal line, where it simply estimates the mean of Y in the calibration period.
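The effect of calibrating on a narrow range and predicting far outside it can be illustrated with a small synthetic experiment. This is a simplified sketch loosely modelled on the setup of Fig. 1a: here the calibration window is selected by a hypothetical time index rather than by the value of Y, and all noise levels are assumptions chosen for illustration.

```python
import numpy as np

def ols(x, y):
    """Return (intercept, slope) of the OLS fit of y on x."""
    xc = x - x.mean()
    slope = float(xc @ (y - y.mean()) / (xc @ xc))
    return float(y.mean() - slope * x.mean()), slope

rng = np.random.default_rng(2)
t = np.linspace(0.0, 1.0, 1000)          # hypothetical time axis
x = t.copy()                             # true signal rising from 0 to 1
y = x + rng.normal(0.0, 0.01, t.size)    # target with little noise
w = x + rng.normal(0.0, 0.05, t.size)    # noisy proxy W = X + U

calib = t >= 0.9                         # short calibration window
early = t <= 0.1                         # period far outside the calibration range

b0, b1 = ols(w[calib], y[calib])
recon_mean = float(np.mean(b0 + b1 * w[early]))
truth_mean = float(np.mean(y[early]))
print(f"OLS slope in calibration: {b1:.2f} (true slope 1.0)")
print(f"early-period mean: reconstructed {recon_mean:.2f}, true {truth_mean:.2f}")
```

Within the calibration window the predictions are fine on average, but the flattened slope pulls the early-period reconstruction towards the calibration mean, which is the bias described in the paragraph above.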
Applying attenuation correction in the ordinary least squares (ACOLS) solution effectively eliminates the bias seen in OLS-based reconstructions (Fig. 1b, c). Orthogonal regression methods such as total least squares (TLS) can also recover the correct regression coefficients (Hegerl et al., 2006). However, in contrast to ACOLS, the implementation of TLS additionally requires a careful estimation of errors in the Y variable. Carroll and Ruppert (1996) have illustrated that such TLS implementations can be dangerous because: (a) the ratio $\eta$ of the variance of $\varepsilon$ to the variance of $U$ can be sensitive to small changes in its two estimated components; (b) an additional variance component in the numerator of $\eta$ is often omitted that represents the "equation error" (Fuller, 1987), arising from the fact that even in the absence of measurement error data typically do not fall onto a straight line, and consequently the corresponding TLS solution will potentially overcorrect the attenuation. In our simulation example, the range of TLS answers is indicated in Fig. 1b, c by its two practical end-members, $\eta = 0$ (all proxy noise) and $\eta = 1$ (equal proxy and instrumental noise), although $\eta = \infty$ (all instrumental noise, i.e. OLS) is also possible. These plots show that an imprecise estimate of $\eta$ will lead to qualitatively different results, whereas ACOLS, which requires only an estimate of $\sigma^2_U$, yields stable results. However, we by no means suggest that ACOLS is the only valid method for the correction. As long as the ratio $\eta$ can be precisely specified, TLS will also correctly remove the attenuation effect.
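The sensitivity to the assumed variance ratio can be explored with a small sketch. As a stand-in for the TLS implementations discussed above, it uses the classical errors-in-variables (Deming-type) slope for a known ratio of error variances; this substitution, the synthetic data and the function name `deming_slope` are assumptions made for illustration only.

```python
import numpy as np

def deming_slope(w, y, eta):
    """Slope of y on w when eta = var(eps) / var(U) is assumed known."""
    wc, yc = w - w.mean(), y - y.mean()
    s_ww, s_yy, s_wy = wc @ wc, yc @ yc, wc @ yc
    a = s_yy - eta * s_ww
    return float((a + np.sqrt(a**2 + 4.0 * eta * s_wy**2)) / (2.0 * s_wy))

rng = np.random.default_rng(5)
n = 100_000
beta1, sigma_x, sigma_u, sigma_eps = 1.0, 1.0, 1.0, 0.5   # assumed values
eta_true = sigma_eps**2 / sigma_u**2                       # equals 0.25 here

x = rng.normal(0.0, sigma_x, n)
y = beta1 * x + rng.normal(0.0, sigma_eps, n)   # no separate equation error
w = x + rng.normal(0.0, sigma_u, n)

wc = w - w.mean()
slope_ols = float(wc @ (y - y.mean()) / (wc @ wc))
slope_eta0 = deming_slope(w, y, 0.0)            # all noise attributed to the proxy
slope_true = deming_slope(w, y, eta_true)       # correctly specified ratio
slope_eta1 = deming_slope(w, y, 1.0)            # equal noise in W and Y

print(f"OLS={slope_ols:.3f}  eta=0: {slope_eta0:.3f}  "
      f"eta=true: {slope_true:.3f}  eta=1: {slope_eta1:.3f}")
```

In this sketch the slope decreases monotonically as the assumed ratio grows, from the inverse-regression slope at a ratio of zero towards the OLS slope for very large ratios, so a misspecified ratio can either overcorrect or undercorrect the attenuation.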
Before presenting an application, it is important to point out that in this Technical Note we can only deal with a small subset of regression methods. A rich literature exists about the various approaches to linear models (e.g., Fritts et al., 1990; Isobe et al., 1990; Osborne, 1991), where methods are sometimes used under different (even conflicting) names. In the following, we use a paleoclimate example to illustrate how to implement our method. The goal is simply to introduce this method of correction for signal attenuation to the existing catalogue of regression options for paleoclimate problems. We will not provide a full intercomparison here (cf., Rutherford et al., 2005; Bürger et al., 2006; Hegerl et al., 2006; Juckes et al., 2007; Lee et al., 2008; Mann et al., 2008; Christiansen et al., 2009).

Applications in a paleoclimate context
"Measurement error" correction has already been employed in various disciplines (Carroll et al., 2006), particularly in astronomy (Isobe et al., 1990; Akritas and Bershady, 1996; Kelly, 2007). Although analyses based on noisy predictors are common in climate research, the need for correction against attenuation has only recently been explicitly recognized (Allen and Stott, 2003; Hegerl et al., 2007; Mann et al., 2007a, 2008; Riedwyl et al., 2009). In fact, the potential magnitude of the problem in paleoclimate reconstructions, where reconstructions are based on indirect, and thus inherently noisy, proxy records, has only been fully recognized as climate model output has been used in synthetic exercises to test reconstruction methods (Zorita et al., 2003; von Storch et al., 2004; Hegerl et al., 2006; Wahl et al., 2006; Ammann and Wahl, 2007; Küttel et al., 2007; Mann et al., 2007a, b; Rutherford et al., 2008; Smerdon and Kaplan, 2007; Lee et al., 2008; Moberg et al., 2008; Smerdon et al., 2008; Christiansen et al., 2009; Riedwyl et al., 2009). An often-discussed example concerns the true amplitude of Northern Hemisphere (NH) mean temperature over past centuries and millennia (Mann et al., 1998; Briffa et al., 2001; Jones et al., 2001; Esper et al., 2002, 2005; von Storch et al., 2004; Moberg et al., 2005; Hegerl et al., 2007; Osborn and Briffa, 2006; Juckes et al., 2007). Currently neither the proxies (because of concerns about potentially unreliable low-frequency information) nor the models (because of uncertainty in the magnitude of the forcings as well as the overall climate sensitivity) can resolve this issue. Lately, different strategies that reduce such amplitude loss have been explored (Juckes et al., 2007; Lee et al., 2008; Christiansen et al., 2009). They include one or a combination of approaches: the selection of a longer, more representative calibration period (Ammann and Wahl, 2007), partial (Mann et al., 2007a) […]
For illustration of ACOLS in a climate reconstruction application, we show a simple example in Fig. 2. Using output from a coupled Atmosphere-Ocean General Circulation Model simulation (Ammann et al., 2007), we subsampled the annual temperature field at the grid locations of the real-world proxies used in Hegerl et al. (2007). For the purpose of demonstrating the effect of the choice of regression method, this example should suffice, particularly given that the correlations between model grid-point information and model hemispheric temperature turn out to be very similar to those in the real-world data (Hegerl et al., 2007), and thus the important signal-to-noise level represented in the model-based example is broadly comparable. [Note: Adding noise to the samples would certainly make the geophysical reconstruction problem more realistic, yet the noise does not appreciably change our conclusion on the difference between OLS- and ACOLS-based reconstruction results. To illustrate the effect of adding "white" and "red" noise, we provide corresponding Figs. S2 and S3 in the Supplementary Material: http://www.clim-past.net/6/273/2010/cp-6-273-2010-supplement.pdf. In principle, other regression methods should be tested for such examples as well. For this Technical Note, however, we restrict the discussion to the simple case of twelve isolated locations that sample from a highly varying field of interest (the NH average temperature). Further, more in-depth and comprehensive investigations need to be carried out.]
The annual data of twelve distinct grid-point samples were calibrated over the period 1900-1999 against the true model NH temperature in both simple (composite plus scale, CPS) and multiple regression approaches. OLS-based reconstructions (Fig. 2a) indicate significant attenuation of the true amplitude of climate over the prediction period. In contrast, ACOLS-derived reconstructions (Fig. 2b) are essentially unbiased in the evolving temperature amplitude, and the true NH temperatures remain inside the 95% confidence interval of the reconstruction. This interval, in fact, is achieved for the full annual resolution of the data throughout the reconstruction, and results were simply smoothed for visualization (see Supplementary Material: http://www.clim-past.net/6/273/2010/cp-6-273-2010-supplement.pdf).
The results of TLS and other methods shown in Lee et al. (2008) potentially benefited from decadal smoothing prior to reconstruction (or from including a low-frequency step; Mann et al., 2007a), a process that significantly reduces the noise relative to the signal. Similarly, combining multiple proxies into a composite was found to perform better, particularly if red noise was present (see Supplementary Material: http://www.clim-past.net/6/273/2010/cp-6-273-2010-supplement.pdf), likely because compositing averages out the unrelated, persistent noise across records. The same dampening of noise effects could be expected from the Canonical Regression approach used in Luterbacher et al. (2004). Apart from TLS, however, only the KF approach in Lee et al. (2008) explicitly takes noise in the predictors into consideration, and it is thus expected to avoid attenuation from noise, even at annual resolution. Its implementation, however, is much more involved and, in the multiple regression framework, also computationally much more expensive.

Discussion and conclusions
One trade-off that has to be accepted in regression-based reconstructions is that the correction for bias in the signal amplitude comes at the cost of increased variance arising from the additional scale (correction) factor in $\hat\beta_{\mathrm{ACOLS}}$ (Carroll et al., 2006, p. 60; see Supplementary Material: http://www.clim-past.net/6/273/2010/cp-6-273-2010-supplement.pdf). In our example in Figs. 2 and S1, this variance increase is mostly concentrated at the interannual scale, and thus decadal smoothing of the reconstruction results essentially compensates for it. [Note: In the case of additional noise with memory (red noise), the variance increase will also appear over longer time scales.] ACOLS could provide a simpler and more stable way of warding off attenuation in regression-based reconstructions than previously proposed methods. Such improvements are not only possible for the large-scale climate application demonstrated here, but are equally expected in any other regression-based inference where the predictors carry substantial noise. In paleoclimatology, for example, this includes local or regional reconstructions based on records such as tree-rings, pollen, corals, or isotopic composition. Because an a priori assumption of "no change" in mean between the calibration and prediction/reconstruction period is not commonly possible (particularly not under the current climate, where a trend dominates the instrumental record), attenuation correction is not only helpful; it is, in fact, necessary if a faithful representation of the true amplitude of the climate signal is to be recovered. Even if the noise in the predictors approaches zero and no correction would be necessary, ACOLS simply tends towards the OLS solution and still remains unbiased. Further examples should be tested to verify our result under different sampling conditions, and particularly under real-world noise conditions.
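The bias-variance trade-off can be made concrete with a small Monte Carlo experiment. The sketch below assumes a single predictor, a known noise variance and Gaussian synthetic data; the sample size, number of replicates and parameter values are illustrative choices and do not reproduce the experiments behind Figs. 1 or 2.

```python
import numpy as np

rng = np.random.default_rng(3)
n, n_rep = 100, 1000                     # calibration length, replicates
beta1, sigma_x, sigma_u, sigma_eps = 1.0, 1.0, 0.7, 0.3   # assumed values

ols_slopes = np.empty(n_rep)
acols_slopes = np.empty(n_rep)
for r in range(n_rep):
    x = rng.normal(0.0, sigma_x, n)                  # true signal
    y = beta1 * x + rng.normal(0.0, sigma_eps, n)    # target
    w = x + rng.normal(0.0, sigma_u, n)              # noisy proxy
    wc = w - w.mean()
    s2_w = wc @ wc / (n - 1)                         # sample variance of W
    b_ols = wc @ (y - y.mean()) / (wc @ wc)          # attenuated OLS slope
    ols_slopes[r] = b_ols
    acols_slopes[r] = s2_w / (s2_w - sigma_u**2) * b_ols   # corrected slope

print(f"OLS   slope: mean={ols_slopes.mean():.3f}  std={ols_slopes.std():.3f}")
print(f"ACOLS slope: mean={acols_slopes.mean():.3f}  std={acols_slopes.std():.3f}")
```

Across replicates the corrected slopes should centre much closer to the true value than the OLS slopes, while their spread is larger, which is the cost referred to above.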
In the climate arena, re-evaluation of existing reconstructions using ACOLS will likely confirm the recent supposition of enhanced amplitudes (Huang et al., 2000; Esper et al., 2002; Moberg et al., 2005; Hegerl et al., 2007; Mann et al., 2008) over the recent past compared to earlier estimates. The overall structure of climate and its interpretation, however, should not be affected, because in most cases we are simply dealing with a change in the slope, and thus a scale factor, of a linear relationship. Further research is now necessary to evaluate how the full, annual resolution of ACOLS can be used for spatial field reconstructions where enhanced variance, after having achieved a good and unbiased estimate of the mean, has to be controlled at the regional scale to preserve the dynamical structure of interannual climate variability (Luterbacher et al., 2004; Rutherford et al., 2005; Mann et al., 2007a). Climate model output will again play a key role. An ideal platform for such tests is provided through various evaluation exercises of the PAGES/CLIVAR Paleoclimate Reconstruction (PR) Challenge (see: http://www.pages-igbp.org/science/prchallenge/).

Fig. 2. CPS (blue) and multiple (red) regression reconstructions of NH mean temperature (10-year Gaussian smoothed results for visualization; high-resolution reconstructions are available in the Supplementary Information, see http://www.clim-past.net/6/273/2010/cp-6-273-2010-supplement.pdf) based on a network of twelve grid points, comparable to Hegerl et al. (2007), here subsampled from output of a coupled GCM (Ammann et al., 2007) where the true climate is known: (a) attenuated results from uncorrected OLS regression, and (b) the reconstructions from ACOLS. The vertical line at "1900" separates the calibration period from the reconstruction. The gray shaded area represents the 95% confidence interval (following Li et al., 2007).