A likelihood perspective on tree-ring standardization: eliminating modern sample bias

It has recently been suggested that non-random sampling and differences in mortality between trees of different growth rates are responsible for a widespread, systematic bias in dendrochronological reconstructions of tree growth known as modern sample bias. This poses a serious challenge for climate reconstruction and the detection of long-term changes in growth. Explicit use of growth models based on regional curve standardization allows us to investigate the effects on growth due to age (the regional curve), year (the standardized chronology or forcing) and a new effect, the productivity of each tree. Including a term for the productivity of each tree accounts for the underlying cause of modern sample bias, allowing for more reliable reconstruction of low-frequency variability in tree growth. This class of models describes a new standardization technique, fixed effects standardization, that contains both classical regional curve standardization and flat detrending. Signal-free standardization accounts for unbalanced experimental design and fits the same growth model as classical least-squares or maximum likelihood regression techniques. As a result, we can use powerful and transparent tools such as R² and Akaike's information criteria to assess the quality of tree ring standardization, allowing for objective decisions between competing techniques. Analyzing 1200 randomly selected published chronologies, we find that regional curve standardization is improved by adding an effect for individual tree productivity in 99 % of cases, reflecting widespread differing-contemporaneous-growth-rate bias. Furthermore, modern sample bias produced a significant negative bias in estimated tree growth by time in 70.5 % of chronologies and a significant positive bias in 29.5 % of chronologies.
This effect is largely concentrated in the last 300 yr of growth data, posing serious questions about the homogeneity of modern and ancient chronologies produced using traditional standardization techniques.


Introduction
Much of the work in dendrochronology, and dendroclimatology in particular, relies on accurate, unbiased reconstructions of tree growth long into the past. As a result, a great deal of effort has been put into trying to isolate important trends and identify potential biases. However, one major bias called "modern sample bias", first identified by Melvin (2004), is still largely neglected in applied studies, despite its potential impact on all regional curve standardization chronologies (Brienen et al., 2012a).
Dendrochronologists observed that the older a tree was, the slower it tended to grow, even after controlling for age- and time-driven effects. The result is an artificial downward signal in the regional curve (as the older ages are only represented by the slower-growing trees) and a similar artificial positive signal in the final chronology (as earlier years are only represented by the slow-growing trees), an effect termed modern sample bias. When this biased chronology is used in climate reconstruction, it then implies a relatively unsuitable historic climate. Obviously, the detection of long-term trends in tree growth, as might be caused by a changing climate or carbon fertilization, is also seriously compromised (Brienen et al., 2012b). More generally, modern sample bias can be viewed as a form of "differing-contemporaneous-growth-rate bias", where changes in the magnitude of growth of the tree ring series included in the chronology over time (or age, in the case of the regional curve) skew the final curve, especially near the ends of the chronology where series are rapidly added and removed (Briffa and Melvin, 2011).
Several attempts have been made to address this issue but none have proven fully satisfactory. Melvin (2004) (see also Briffa and Melvin, 2011; Cooper et al., 2012, and Melvin et al., 2012a) attempted to solve the problem by splitting the regional curve into several smaller curves by growth rate, as first introduced by Esper et al. (2002), but this approach offers only limited correction as the number of sub-RCS curves is necessarily smaller than the number of levels of growth rate observed among the trees, and the reduction in sample size reduces the reliability of each sub-curve. Voelker (2011) took a different approach, first standardizing the chronologies with respect to age and annual effects, then estimating the linear relationship between tree growth rate and tree age for each species studied from binned growth and age data, which was then used to scale the species (or in some cases genus) level chronologies. Interestingly, while the majority of the species/genera analyzed showed a negative relationship between tree age and growth rate, a positive relationship was observed in a few cases, contrary to the predictions of Briffa and Melvin (2011). Whether this effect has an ecological basis or simply represents a random quirk of the chronologies examined was not discussed. While revealing, this technique relies on large pre-existing chronologies for the species of interest and assumes a simple common linear relationship between age and tree-specific productivity.
To develop an alternative approach, we make explicit the growth models already used in regional curve standardization. From there, we examine the ecological effects driving the persistent differences in growth rates between sampled trees of various ages and then use regression to obtain an unbiased estimate of the inherent productivity of each tree, the typical growth of the trees at a given age (the regional curve) and the forcing at each year (the standardized chronology). The relationship between this new technique, dubbed "fixed effects standardization", regional curve standardization (Briffa et al., 1992), flat detrending (Cook and Kairiukstis, 1990) and the more recent signal-free standardization (Melvin and Briffa, 2008; Briffa and Melvin, 2011) is explored along the way.
We conclude with a brief sample of existing dendrochronological records, demonstrating fixed effects standardization, selecting the most appropriate standardization model for each data set and exploring the effects of accounting for tree-level productivity across the globe.

Growth models
Regional curve standardization makes two central assumptions about the typical growth of trees used in dendrochronological analysis. First, that trees of the same species within the same region follow a certain inherent pattern of growth as they age, given by the regional curve. Second, that the growth of each tree in a given year is the product of the expected growth at that age and the common forcing of that year (Melvin et al., 2012a). This common forcing affects all trees equally in proportion to their expected growth, an assumption clearly visible in the division of the raw tree ring series by the regional curve to obtain the standardized chronology, which is by design free of age-driven effects. By doing so, regional curve standardization presumes a model, and hence can be treated as a model-fitting tool (a goal expressed, but not achieved, in Bontemps and Esper, 2011).
Tree ring data can naturally be classified along two dimensions: the year in which the ring was formed (t) and the age of the tree when the ring was formed (a). Each chronology can be stored naturally in a growth matrix G, a form referred to as a "tree-ring array", in contrast to the traditional form where each column holds the ring widths observed for a single series and the row name denotes the year (a "tree-ring table"). Figure 1 shows a conceptual diagram and further explanation of these two alternate organizations of tree-ring data.
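As a minimal sketch of this rearrangement (with hypothetical ring widths and NumPy in place of dedicated dendrochronology software), a tree-ring table can be converted into a tree-ring array by walking each series down its year-age diagonal:

```python
import numpy as np

# A tiny hypothetical "tree-ring table": rows are calendar years, columns are
# series; np.nan marks years outside a series' lifespan (widths in mm).
table = np.array([
    [1.8, np.nan],
    [1.5, np.nan],
    [1.2, 2.1],
    [1.0, 1.7],
    [np.nan, 1.4],
    [np.nan, 1.2],
])

# Rearrange into a "tree-ring array" G indexed by (tree, year, age): each
# complete series fills a single year-age diagonal starting at its pith year.
n_years, n_trees = table.shape
max_age = int(max(np.sum(~np.isnan(table[:, i])) for i in range(n_trees)))
G = np.full((n_trees, n_years, max_age), np.nan)
for i in range(n_trees):
    lifespan = np.flatnonzero(~np.isnan(table[:, i]))
    for age, year in enumerate(lifespan):
        G[i, year, age] = table[year, i]
```

Each ring keeps its value; only the indexing changes, which is what makes the age and time effects directly addressable.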
If we consider the time effect (standardized chronology) and age trend (regional curve) as the effect vectors T and A respectively, we can write the implicit growth model of regional curve standardization as follows:

G_{t,a} = T_t · A_a (1)

Each element of the growth matrix G is a scalar, the product of the corresponding elements of T and A. Looking at it from the perspective of the entire vectors, we can construct the tree-ring array as the outer product of the effect vectors:

G = T ⊗ A (2)

Obviously, real trees do not follow this growth model exactly, and should be thought of as being drawn from a population described by a probability density function. The product of the time and age effects is the expected value of growth for each ring; the observed data include an error term to account for stochastic noise and problems with model fit. The simplest way to do so is to assume an additive, normal error term such that:

G_{t,a} = T_t · A_a + ε_{t,a},  ε_{t,a} ∼ N(0, σ²)

However, it is commonly observed that real tree ring width data is strongly heteroscedastic, with variability increasing as the observed growth values grow larger (Biondi, 1993; Meko et al., 2001). Proportional log-normal variability has long been observed in measurements of plant growth (Evans, 1972; Pokharel and Dech, 2012) and reported in dendrochronology (Van Den Brakel and Visser, 1996; Drobyshev and Niklasson, 2010). Furthermore, tree-ring width data is naturally bounded by 0, a fact for which a normal probability density function fails to account. Rather than a normal probability density function, we suggest that most tree ring data may be drawn from a log-normal probability density function instead. A multiplicative log-normal error term accounts for the observed heteroscedasticity while remaining tractable and is consistent with the dendrochronological practice of log-transforming series before analysis.
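A minimal simulation of the multiplicative model makes these choices concrete (the effect vectors below are hypothetical, and NumPy's log-normal sampler stands in for the error term):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical effect vectors: forcing T by year and regional curve A by age.
T = np.array([1.0, 1.2, 0.8, 1.1, 0.9])   # standardized chronology, index ~1
A = np.array([2.0, 1.5, 1.0, 0.7])        # typical ring width by age (mm)

# Expected growth is the outer product: G[t, a] = T[t] * A[a]  (Eq. 2).
G_expected = np.outer(T, A)

# Multiplicative log-normal noise keeps widths positive and makes the
# scatter proportional to the expected growth (heteroscedastic).
sigma = 0.2
G_observed = G_expected * rng.lognormal(mean=0.0, sigma=sigma,
                                        size=G_expected.shape)
```

Swapping the last line for `G_expected + rng.normal(0, sigma, G_expected.shape)` gives the additive normal alternative, which can produce negative widths.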
To address differing-contemporaneous-growth-rate bias, and thus modern sample bias, we need to examine its cause. Many researchers have observed that during the process of chronology construction, there are persistent differences in growth rates between trees (Brienen et al., 2006; Zuidema et al., 2010). Furthermore, Melvin (2004) observed that the ratio between these series and the common signal (either the regional curve or standardized chronology) is approximately constant. This ratio was termed "error" (later discussed as the ratio between multiple regional curves in Melvin et al., 2012a). In the growth model framework, it can be considered the effect of each individual tree (I_i), and thought of as the inherent productivity of tree i. Extending our model to incorporate this concept:

G_{i,t,a} = I_i · T_t · A_a + ε_{i,t,a}

From this perspective, it is clear that differing-contemporaneous-growth-rate bias (and thus modern sample bias) is an omitted variable bias! Note that in this case G becomes a three-dimensional array, recording the tree i, age a and year t for which each data point was recorded. I is a vector much like T and A but, unlike the others, lacks a natural ordering.
The model in Eq. (1) fails because the assumption that the error in each series is unbiased by which tree the series belongs to is not supported by the data. As series are added to and removed from the chronology, the mean value of I in the chronology changes, biasing the observed chronology downwards if the trees were less productive than average and upwards if the trees were more productive than average. Because the magnitude of this effect only changes when the series present in the chronology change, the bias observed takes the form of a step function by time and age and is typically stronger near the ends of the chronology (Bowman et al., 2013). Seen from this perspective, I is a traditional nuisance parameter (as was A originally), whose effects must be accounted for to obtain a reliable estimate of T. If I is not identically and independently distributed across time and age (as in modern sample bias scenarios), the resulting chronology is skewed.
The choice of mean (arithmetic/geometric) corresponds to the choice of probability density function (normal/log-normal). In many cases, dendrochronologists choose to subtract, rather than divide, the estimated effect vector from the growth data. This corresponds to an additive growth model, similar to:

G_{i,t,a} = I_i + T_t + A_a + ε_{i,t,a}

For ring width data, this form of model is almost certainly inappropriate. Growth data is by definition positive and typically multiplicative; the use of additive models can result in negative predicted growth. Note, of course, that the logarithmic transformation of a multiplicative growth model with log-normal noise is equivalent to an additive model with normal noise.
These models can still be fit and examined using the techniques discussed throughout the paper. In particular, they can be compared to their multiplicative counterparts using model selection tools (see Sect. 5 for an exploration of this on real data).
When using real data, it is almost impossible to find a tree-ring measurement for each unique combination of age and year and, as a result, the tree-ring array is almost never full. The observed array of growth values G can be described using a weighting array W, which records the number of data points present at each location. By definition, values can only be filled along a single t-a diagonal for each position of i, as the tree ages by one year for each calendar year. Similarly, if only complete (pith to bark) tree-ring series are used, the upper-right t-a triangle is always empty; if a tree is sufficiently old at the appropriate point in the past, it extends the chronology backwards in time a corresponding number of years, leaving the new upper triangle empty again. As discussed in Sect. 4, this unbalanced design complicates analysis and accounts for the improvement of signal-free standardization over traditional regional curve standardization.
These models have a final interesting property: for each solution to a tree-ring array, there exists an infinite family of equivalent solutions. The system is singular: scaling one of the effect vectors (I, T or A) by a constant does not change the predicted growth G as long as it is counteracted by the reciprocal scaling of a different vector, producing equally likely models with different coefficients. To address this, we fix the geometric mean of the elements of I and the elements of T to 1 by convention and scale A to compensate. This allows, as is traditional, A to map directly to the typical ring-width increment of a tree at a given age (as in the regional curve) and I and T to represent an index of deviation from this expected growth. The chronologies in Melvin et al. (2012a) are scaled in a slightly different fashion to reach the same end: the simple comparison of estimates produced by different techniques.
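This rescaling convention can be sketched in a few lines (the fitted effect vectors below are hypothetical; any member of the equivalent family yields the same predicted growth):

```python
import numpy as np

# Hypothetical fitted effect vectors from one member of the equivalent family.
I = np.array([0.5, 1.0, 2.0])
T = np.array([2.0, 2.0, 2.0, 2.0])
A = np.array([3.0, 2.0, 1.0])

def gmean(v):
    # Geometric mean via the log scale.
    return np.exp(np.log(v).mean())

# Fix the geometric means of I and T to 1; A absorbs the scale, so it keeps
# the units of ring width while I and T become dimensionless indices.
sI, sT = gmean(I), gmean(T)
I, T, A = I / sI, T / sT, A * sI * sT
```

Every product I_i · T_t · A_a is unchanged by this transformation; only the labelling of the solution changes.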

Ecology of modern sample bias
It has long been known that the life expectancies of trees tend to be negatively correlated with their growth rate, both between and within species of trees (Huntington, 1913; Schulman, 1954; Black et al., 2008; Johnson and Abrams, 2009). Brienen et al. (2012b) explain how this causes modern sample bias via productivity-survivorship bias (in that paper termed "slow-grower survivorship bias"): if slow-growing trees are more likely to survive, they will be over-represented in the oldest sections of the standardized chronology and regional curve, producing a positive skew. When trying to reproduce this effect, it is helpful to think of the problem in terms of survivorship curves. First, assume each tree follows a particular survivorship curve conditional on its productivity. Formally, the survivorship curve is the probability of surviving at a given age and productivity level and can be written as P(S | a, I).
Using Bayes' theorem, we can look at the distribution of surviving trees of a given age for different values of I by considering the joint probability of this survivorship curve and the initial distribution of productivity at birth, P(I):

P(I | a, S) ∝ P(S | a, I) · P(I)

This expression describes the typical productivity of trees of that age. The fluctuation in the expected value of this expression results in modern sample bias. When I is not accounted for, ages that are less likely to contain productive trees will be underestimated by the regional curve (Fig. 2). The effect this has on the final standardized chronology depends on the arrangement of the data, but when all series are complete and alive at the time of sampling (modern) the effect will be precisely opposite that on the regional curve.
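This calculation is easy to sketch numerically. Assuming a hypothetical slow-biased survivorship curve and a discretized prior P(I), the posterior productivity of survivors at a given age follows directly from Bayes' theorem:

```python
import numpy as np

# Discretized productivity levels and a hypothetical prior at birth, P(I).
I_levels = np.array([0.5, 1.0, 1.5])
P_I = np.array([0.25, 0.50, 0.25])

# Hypothetical slow-biased survivorship curve: the mortality hazard rises
# with productivity, so fast growers die sooner.
def p_survive(age, I):
    return np.exp(-0.02 * age * I)

age = 100
posterior = p_survive(age, I_levels) * P_I   # P(S | a, I) * P(I)
posterior /= posterior.sum()                 # normalize to get P(I | a, S)

mean_I_birth = float((I_levels * P_I).sum())           # 1.0 by construction
mean_I_survivors = float((I_levels * posterior).sum())  # pulled below 1.0
```

With these assumptions the expected productivity of 100 yr old survivors falls well below the population mean at birth, which is exactly the fluctuation that produces modern sample bias.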
For our purposes, survivorship curves are discussed as fast-biased, in which case fast-growing trees are more likely to survive, slow-biased (the reverse) or unbiased.
It is of course possible for a survivorship curve to be fast-biased for some ages and slow-biased at others depending on the ecological effects at play. We suggest that there are four broad drivers of productivity-survivorship bias: competitive dominance, the patchy resource effect, ecophysiological limitation and biased disturbances. Many of these effects are both poorly quantified and complex, limiting our ability to predict the direction or magnitude of productivity-survivorship bias in general.
Competitive dominance is the basis of natural self-thinning. It has long been observed that trees suffer increased mortality rates when growing more slowly than their neighbours due to competitive exclusion (Peet and Christensen, 1987). Tree diameter is both allometrically linked to tree height, an important determinant of light competition, and suffers directly when resources are limited by competition. As a result, competitive dominance will increase the typical productivity of the population as competition occurs and slow-growing trees are removed from the population. As Brienen et al. (2012b) suggested, this effect is likely strongest in young trees, especially when the species is shade-intolerant.
The patchy resource effect acts on a larger scale, that of environmental variability in fertility. Resources in a forest ecosystem (water, microclimate and nutrients in this context) are inherently patchy, leading to low and high fertility sites and microsites. Increased site fertility is directly linked to increased competition and accelerated stand closure; as soil nutrients (or microclimate) improve, more resources can be allocated to above-ground biomass and light competition per year (Vanninen and Makela, 1999). Research into self-thinning empirically substantiates this claim and consistently shows accelerated stand dynamics with increasing stand fertility (Elfving, 2010). To understand the effects of this, we consider two stands with different fertility levels. Before canopy closure occurs, the productivity of trees is positively correlated with the site fertility as expected and no bias occurs. What happens next depends on the sampling protocol selected. The number of trees in the fast-growing population drops much more rapidly than that of the slow-growing population, producing a decline in the typical productivity of the metapopulation as the weightings shift. Thus, increases in the patch-scale variability in fertility tend to increase the bias towards lower productivity trees, producing an artificial positive trend in the time signal.
Ecophysiological limitation driving tree mortality is widespread and has led some to suggest that senescence and physiological limitations may be delayed in slow-growing trees or those on unproductive sites (Chao et al., 2008; Briffa and Melvin, 2011; Brienen et al., 2012b). Stephenson et al. (2011) present an excellent overview of the major effects of this type by dividing them into four relevant hypotheses: the enemies hypothesis, the growth-defense hypothesis, the growth-hydraulic safety hypothesis and the shade tolerance hypothesis. In the enemies hypothesis, natural enemies are more common in highly productive trees due to their higher energy and nutrient concentrations. The growth-defense hypothesis uses the idea of resource limitation; if a tree is using resources to produce growth (especially above-ground biomass), it cannot spend them on resource diversification or natural defenses and is thus more likely to die from disturbances or environmental stress. The third hypothesis, the growth-hydraulic safety hypothesis, similarly suggests that resistance to hydraulic failure is costly due to increased resistance to water transport, and thus trees which invest in proper hydraulic architecture will survive longer than their peers, at least after competitive effects have subsided. Finally, the shade tolerance hypothesis relies on the common trend of reduced growth potential in shade-tolerant and shade-grown trees due to ontogenetic choices in leaf anatomy and biochemistry. While these effects are commonly discussed in terms of between-species differences, there is some evidence to suggest that the genetic variability and phenotypic plasticity present within a species is sufficient to create small effects of this sort (Rötheli et al., 2011).

Ecological disturbances are the final effect shaping survivorship curves, although their effects seem even more context-sensitive. Insect damage, disease, drought and frost mortality, windthrow, herbivory, floods, landslides, fire and harvest are all examples. These events are stochastic and significantly more challenging to model, and their preference for trees of different ages and growth rates varies by disturbance type. Some of these events, such as herbivory and ground fires, are fast-biased due to their disproportionate effect on small trees. Others, principally windthrow, are more likely to affect larger trees, biasing the survivorship curve towards slow-growing trees. Still other events, catastrophic ones such as landslides, stand-replacing fire, floods and land use conversion, do not discriminate at all in terms of growth rate, size or age. Finally, the frequency of disturbance, especially harvesting, may increase with site productivity, limiting the availability of fast-growing old trees.

Big-tree selection bias
In this framework, big-tree selection bias is quite simple to explain. Large trees are commonly selected for in dendrochronological sampling, in hopes of sampling old trees with long records and to avoid the logistical difficulties associated with coring very small trees. In the case of a minimum diameter cutoff D_min (ignoring the effects of temporal forcing), a tree of age a is only sampled if its accumulated growth satisfies:

I_i ≥ D_min / (2 · Σ_{a'=1}^{a} A_{a'})

As the tree grows with age, the required minimum value for I becomes less stringent, leading to higher I values observed for young trees. Even without a strict minimum diameter, there is a bias towards the selection of highly productive young trees (or simply old trees) as long as large total diameter is desired. Because diameter depends on the inherent productivity of a given tree, high I values are still more important for young trees than old ones, assuming that some fraction of local trees are sampled at each site or subsite.
Big-tree selection bias is not, per se, a concern in even-aged stands. Even though only the largest and most productive trees are selected, there is no bias by age as the experimenter can only choose between trees of the same age at any given point in time. As Brienen et al. (2012b) stated, big-tree selection bias can be eliminated through careful experimental design. However, this is only a small part of the larger modern sample bias problem, which must be corrected by accounting for the productivity of each individual tree.

Estimating growth models
The models introduced in Sect. 2 are quite simple. In all cases, the observed growth data is the product of some number of effects (individual, time or age driven). Each effect vector is a latent categorical variable, with a separate coefficient for each unique tree, year or age. There are 8 possible models, each with a unique combination of effects, increasing in complexity from the null model (no effects) to the full three-effect model. The simplest approach is to find the maximum likelihood solution to the growth model of choice. The error term corresponds to the probability density function used during this process. Once an optimal family is found (using simulated annealing, for example), the effect vectors are rescaled to the desired form.
The other approach is to find an approximately optimal solution using the method of moments. The simplest case is a model with only a single effect vector (say T). For a normally distributed error term, we can estimate the coefficient T_t at each year by taking the arithmetic mean of the observed growth values for that year. If we assume a log-normal multiplicative error term, we need to use the geometric mean instead.
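A quick sketch of this distinction, using a single hypothetical year with multiplicative log-normal noise around a known coefficient:

```python
import numpy as np

rng = np.random.default_rng(1)

# One-effect model: 5000 rings from the same year, true coefficient
# T_t = 1.5, multiplicative log-normal noise (sigma = 0.4 on the log scale).
obs = 1.5 * rng.lognormal(mean=0.0, sigma=0.4, size=5000)

arith = float(obs.mean())                 # moment estimate under normal error
geo = float(np.exp(np.log(obs).mean()))   # moment estimate under log-normal error

# Under log-normal noise the geometric mean targets T_t itself, while the
# arithmetic mean is inflated by roughly exp(sigma**2 / 2) ~ 1.08.
```

The geometric mean lands near 1.5 while the arithmetic mean sits systematically higher, which is why the error family and the choice of mean must match.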
The problem of estimating growth models from tree ring data (standardization) can be framed as a regression problem. For the additive normal error term, this is a nonlinear regression with multiplicative categorical variables. In the case of the multiplicative log-normal error term, this is a generalized linear model with a log link function and no intercept (equivalent to log-transforming the growth model and using traditional linear regression). The "rescaling" that we discussed at the end of Sect. 2 is a dummy variable trap. Because we have multiple latent categorical variables, we can shift the (log-)intercept of each effect as long as it is counterbalanced by a shift in another variable. The choice to rescale the observed coefficients such that the typical values of I and T are 1 corresponds to a special type of contrast. Because this is a regression problem, estimating the coefficients by the method of moments, least squares or maximum likelihood will all give similar results.

A regression modelling perspective on tree-ring standardization immediately suggests some helpful metrics. If we want to understand the goodness of fit, we can look at the likelihood or R² of the model fit. To examine the noise level more directly, we can examine the estimated variance parameter (σ) in our error term. This is equivalent to the root-mean-square error (RMSE) for the additive normal error term. Comparing R² or σ between error families (probability density functions) is not directly meaningful. Instead, the most appropriate way to decide on the probability density function is to examine the residuals of the model. Using histograms, kernel density estimators or quantile-quantile plots, the residuals should match the desired probability density function. In most cases, visual inspection is the simplest and most reliable approach.

The final, and perhaps most important, decision we need to make is which of our 8 models (16 if we need to choose an error family as well) is the most appropriate
for our data. Simple measurements of goodness of fit, such as likelihood, RMSE or R², are not suited to this task. Because the models are nested, adding an additional term will always improve the goodness of fit, so the use of these metrics for model selection leads to overfitting. Instead, we need to penalize the use of additional degrees of freedom, as adjusted R² and information criteria do. Adjusted R² has a familiar interpretation. It has the same behaviour as R², in that it increases to a limit of 1 as the model explains all of the observed variability, but includes a term penalising the use of additional linear predictors. Adjusted R² however obscures the relative strength of evidence for each competing model (Burnham and Anderson, 2002, p. 94-96). Information criteria (mostly Akaike's information criteria and Bayesian information criteria) use an information-theoretic approach to model selection, balancing the likelihood of each model against the risk of overfitting given a certain number of parameters. Information criteria can be converted into model weights, which accurately convey the level of support for each competing model and should be used to understand the relative support of each model.
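One common way to make this comparison concrete is via Akaike weights; a sketch with hypothetical log-likelihoods and parameter counts (not taken from any chronology):

```python
import math

# Hypothetical fits: (maximized log-likelihood, number of free parameters)
# for a two-effect model (T, A) and the three-effect model (I, T, A).
models = {"T*A": (-520.4, 40), "I*T*A": (-480.1, 55)}

# AIC = 2k - 2*logL for each model.
aic = {name: 2 * k - 2 * ll for name, (ll, k) in models.items()}
best = min(aic.values())

# Akaike weights: normalized relative support for each competing model.
raw = {name: math.exp(-(a - best) / 2) for name, a in aic.items()}
total = sum(raw.values())
weights = {name: r / total for name, r in raw.items()}
```

Here the extra 15 parameters of the three-effect model are more than paid for by the likelihood gain, so essentially all of the weight falls on it; with a smaller likelihood gain the weights would split more evenly.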
In some cases, it is not feasible to manually examine the residuals of each of the fit models (as in the case of massive meta-analyses). We can use likelihood (and by extension information criteria) to compare models drawn from competing probability density functions. A model with the true probability density function will tend to have a higher likelihood than the corresponding model with an incorrectly specified probability density function. The use of information criteria will reflect these differences in likelihood, choosing the correctly specified model over competitors. In most cases though, this approach should complement, rather than replace, the visual inspection of residuals. Likelihood approaches to error family selection do not reveal subtler distinctions such as skew or heteroscedasticity which may suggest the use of more sophisticated probability density functions.
The final benefit of these models is the ease of obtaining confidence intervals for the effects estimated. Frequentist confidence intervals, or likelihood-based support intervals, can be easily extracted from various premade regression packages. One element of each of the effect vectors (typically the first) will not have an estimated confidence interval due to the degree of freedom lost by rescaling the effects. In terms of traditional categorical regression, it is the baseline level. In the case of a normal error term, these intervals need to be scaled with the effects vector as they are converted to their standard form.
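The regression framing above can be sketched end to end with ordinary least squares on log-transformed data. The example below simulates a two-effect (T, A) chronology; the first year is the baseline level dropped to avoid the dummy-variable trap, and the result is rescaled to the standard form afterwards:

```python
import numpy as np

rng = np.random.default_rng(2)
n_t, n_a = 6, 4

# Simulated two-effect data: G[t, a] = T[t] * A[a] * log-normal noise.
T_true = rng.lognormal(0.0, 0.3, n_t)
T_true /= np.exp(np.log(T_true).mean())        # geometric mean 1, by convention
A_true = rng.lognormal(0.5, 0.3, n_a)
G = np.outer(T_true, A_true) * rng.lognormal(0.0, 0.05, (n_t, n_a))

# Dummy coding: one column per year (minus the baseline year) and one per age.
X, y = [], []
for t in range(n_t):
    for a in range(n_a):
        row = np.zeros(n_t - 1 + n_a)
        if t > 0:
            row[t - 1] = 1.0
        row[n_t - 1 + a] = 1.0
        X.append(row)
        y.append(np.log(G[t, a]))
beta, *_ = np.linalg.lstsq(np.array(X), np.array(y), rcond=None)

# Back-transform, then rescale so the geometric mean of T_hat is 1.
T_hat = np.exp(np.concatenate([[0.0], beta[:n_t - 1]]))
A_hat = np.exp(beta[n_t - 1:])
g = np.exp(np.log(T_hat).mean())
T_hat, A_hat = T_hat / g, A_hat * g
```

The recovered chronology and regional curve match the simulated effects up to the small noise level, and any regression package that reports standard errors for `beta` supplies the confidence intervals discussed above.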

Regional curve standardization and flat detrending
As intended, regional curve standardization fits nicely into this growth model framework. When regional curve standardization is performed, we estimate the model (using an additive form and normal error term to demonstrate):

G_{t,a} = A_a + ε_{t,a}

Similarly, the raw chronology is constructed by fitting the following model:

G_{t,a} = T_t + ε_{t,a}

Less obviously, fixed effects standardization is also an extension of flat detrending techniques (Cook and Kairiukstis, 1990). If we divide by the mean of each series (the flat detrending line) and then construct the standardized chronology, we are estimating the individual tree and time effects.
The use of a normal error term implies the use of an arithmetic mean. If we want to assume a log-normal error term, a geometric mean should be used instead. But if, for example, regional curve standardization follows a growth model, why must we estimate the age effect first and then the time effect? Do we obtain the same result if the process is reversed? Trivially, the answer is no, as they will be scaled differently. But even when we account for that, the answer is typically no; the order in which the effects are estimated determines the family of effect vectors that is produced.
The reason for this is that standardization, by and large, is done sequentially, rather than simultaneously. In the case of a balanced design (the weighting array W is constant), sequential estimation of the effects works properly (Fig. 3). Removing an estimated effect does not bias our estimation of the other effects because the changes are symmetric across all levels of the other effects (see Appendix A). When the design is balanced, sequential estimation of the effect vectors by taking the mean at each index is an unbiased estimate of the true effect vectors. The family of solutions found is the same regardless of the order in which the effects are estimated.
Unfortunately, this is virtually never the case when dealing with real tree ring data.
In order to have a balanced design (for regional curve standardization), a chronology would need to have a completely uniform sample depth by age and time. Unbalanced designs are a fact of life in dendrochronology. When analyzing an unbalanced design sequentially (as in flat detrending or regional curve standardization), changes in one effect vector (such as age) result in changes in the estimate of every other effect vector (such as time). The signals are confounded and cannot be fully separated.
In fact, this very problem was identified in the work on signal-free standardization, where it is referred to as "trend-in-signal bias" (Melvin, 2004; Melvin and Briffa, 2008; Briffa and Melvin, 2011). The solution they proposed was signal-free standardization. By repeating the standardization process, they eventually produced stable estimates of the effects that did not suffer from trend-in-signal bias and better retained low-frequency variability. With slight modifications, signal-free standardization can be expanded to work with models with more than two effects, estimating each in sequence repeatedly (Appendix A). When we do so, we can prove that signal-free standardization converges to an unbiased estimate of the growth model and properly handles unbalanced designs (Fig. 3).
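A minimal sketch of this iterated sequential estimation, on a noise-free two-effect example with a hypothetical unbalanced weighting array (working on the log scale, so plain means stand in for geometric means):

```python
import numpy as np

rng = np.random.default_rng(3)
n_t, n_a = 8, 5
T_true = rng.lognormal(0.0, 0.3, n_t)
T_true /= np.exp(np.log(T_true).mean())   # geometric mean 1, by convention
A_true = rng.lognormal(0.5, 0.4, n_a)
logG = np.log(np.outer(T_true, A_true))   # noise-free two-effect data

# Unbalanced design: some (year, age) cells are unobserved, as when all
# series are complete and alive at sampling time.
W = np.ones((n_t, n_a), dtype=bool)
W[0, 3:] = False
W[1, 4:] = False
W[-1, :2] = False

# Repeat the sequential (age, then time) estimation until it stabilizes.
logT = np.zeros(n_t)
for _ in range(200):
    logA = np.array([(logG - logT[:, None])[W[:, a], a].mean()
                     for a in range(n_a)])
    logT = np.array([(logG - logA[None, :])[t, W[t, :]].mean()
                     for t in range(n_t)])

# Fix the geometric mean of T to 1; A absorbs the shift.
logA += logT.mean()
logT -= logT.mean()
T_hat, A_hat = np.exp(logT), np.exp(logA)
```

A single sequential pass over this design leaves the effects entangled, but the iteration converges to the unbiased effect vectors, mirroring the claimed equivalence with simultaneous least-squares estimation.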
The power of signal-free standardization comes from its effectively simultaneous estimation of the effects of interest. Because it converges on an unbiased solution to the growth model, the results of signal-free standardization are approximately equivalent to growth models estimated using more conventional optimization techniques.

Smooth and parametric age effects
Conspicuously absent from all the preceding discussion is the common practice of smoothing the regional curve. In part, this was for convenience: the solutions are much simpler and the symmetry more obvious when age is treated as a categorical variable like time and individuals. But in truth, it is because a much more elegant solution exists. Rather than fit a smooth curve (parametric or nonparametric) to the raw regional curve each time it is estimated, we can simply include a smooth age effect in our model directly (see Bontemps and Esper, 2011 for an example of this principle). A modified negative exponential regional curve (Fritts, 1976), for example, would be included by replacing the categorical age effect A_a in the growth model with the parametric form c·e^(-λa) + d and fitting the resulting model to the data. In the case of a log-normal probability density function, a smooth, flexible age trend can instead be fit using generalized additive models (see Beck and Jackman, 1998 for a gentle introduction). Generalized additive models use either splines or kernels to ensure that the fitted curve is locally smooth; the corresponding model replaces A_a with s(a), where s(a) is a smooth function of age. As before, we can distinguish between these competing models of standardization using model selection criteria such as AIC. Breadth of imagination and biological plausibility are the only constraints, so long as an appropriate optimization algorithm can be found.
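As an illustrative sketch (in Python rather than the R used in our analysis, and with invented parameter values and simulated data), a modified negative exponential age trend can be fit directly to ring widths with a nonlinear least-squares routine such as `scipy.optimize.curve_fit`:

```python
import numpy as np
from scipy.optimize import curve_fit

# Modified negative exponential age trend (Fritts, 1976):
# expected ring width = c * exp(-lam * age) + d
def neg_exp(age, c, lam, d):
    return c * np.exp(-lam * age) + d

# Simulated ring widths with multiplicative (log-normal) noise.
rng = np.random.default_rng(0)
age = np.arange(300, dtype=float)
width = neg_exp(age, 2.0, 0.05, 0.5) * rng.lognormal(0.0, 0.05, age.size)

# Fit the parametric curve directly, instead of smoothing an
# empirical regional curve after the fact.
popt, _ = curve_fit(neg_exp, age, width, p0=(1.0, 0.01, 1.0))
```

With the small noise level assumed here, the fit recovers parameters close to the generating values (c = 2.0, λ = 0.05, d = 0.5).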

Likelihood ridges in three-effect models
A final complication arises when we attempt to estimate the three-effect model (individual, time and age effects) for real tree-ring data. Realistic tree-ring data is arranged along time-age diagonals in the tree-ring array: each year that the tree ages, the calendar year advances by one. Because of this structure, a peculiar ridge emerges in the likelihood of three-effect models (Appendix B). For each potential set of parameters (I, T and A) there exists a log-linearly related family of solutions (Ĩ, T̃ and Ã) that produces the same set of expected values (G̃ = G), and hence has the same likelihood. When all three effects are included, each element of an effect vector is scaled by a constant m raised to a power related to the age (a), year (t) or birth year (b) of the tree. Our tree-ring data is stored in a tree-ring array, so we can think of this in terms of columns (C, corresponding to age) and rows (R, corresponding to time).
By making a priori decisions about the expected distribution of the effects, we can distinguish between the "equivalent" solutions by their likelihood under those distributions. In practice, this post hoc correction seems to work quite well, but further investigation and parametrization of plausible effect distributions is still needed.

Explored published ring-width data
The techniques above leave us with four major questions as to their real-world impact:

1. What growth model is most appropriate for tree-ring width data?

2. Which effects (individual, time and age), form (additive or multiplicative) and error term (normal or log-normal) should be used?

3. What is the typical effect of modern sample bias?

4. How much variability is there?
To answer these questions, we turn to the extensive International Tree-Ring Data Bank, version 702 (Grissino-Mayer and Fritts, 1997). 6997 tree-ring width data sets were available, of which 5609 (80.2 %) could be read into R. From these 5609 data sets, 1200 were selected at random for analysis. Two of these were excluded from the analysis due to computational constraints (not enough RAM to fit GAM models on these extremely large data sets). In total, these data sets contained 33 657 series and 6 905 445 unique measurements, and spanned 476-2008 AD, with the bulk of the data occurring after 1500 (see Fig. 4 for an illustration of sample depth across time). Information on the data sets themselves, the code used and individual results is available on request.
From our discussion above, we can already rule out sequential standardization techniques, as they are strictly worse than their signal-free (or maximum likelihood or least squares) equivalents. Furthermore, we know that an additive model suggests a normal error term, while a multiplicative model suggests a log-normal error term. In fact, fitting a model with a poor match of form and error term often results in a computational error, such as taking the log of a negative value! Thus, we are left with 16 candidate models (2³ combinations of effects times 2 model forms), ranging from the null model to the three-effect growth model.
Each of these data sets was analysed using fixed effects standardization under each of the 16 competing growth models. As pith offset data was missing for most of the data sets, it was assumed that the first year recorded for each series was the pith of the tree. Because of the resulting inaccuracy in the ages of each ring-width record, this analysis presents an overly pessimistic outlook on the value of including an age effect in tree-ring standardization and may impart an additional negative trend in the time effects of models that do not include an individual effect (Esper et al., 2003; Briffa and Melvin, 2011).
The average goodness-of-fit and model selection criteria were computed for each model and are listed in Table 1. Goodness-of-fit was fairly high, with an average R² for the best model for a particular chronology of 0.63 ± 0.10, dropping to 0.57 ± 0.11 for adjusted R². Noise (in terms of the residual standard deviation σ) was around 0.39 ± 0.12 for the best model. The competing model selection criteria (σ, R², adjusted R², likelihood, AIC, AICc and BIC) were somewhat consistent, with σ, R² and likelihood preferring the most complex models and BIC preferring less complex models; adjusted R², AIC and AICc were intermediate (Fig. 5). By and large, it seems that σ, R² and likelihood are too liberal (as they carry no penalty for added complexity), while BIC may be too conservative in many cases. Of the remaining three choices, adjusted R² is neither as powerful nor as interpretable for model selection purposes as AIC, and AICc is simply a more correct version of AIC. R², adjusted R² and σ are also inconsistent across model forms, as they are calculated in different ways depending on the assumed probability density function for the error term. In general, we recommend that model selection in tree-ring standardization be carried out using AICc or BIC (or AIC, as the correction is negligible for typical sample sizes) to allow for the use of Akaike model weights. R² and σ are still useful and interesting descriptive statistics, though, and should be reported for each model as well.
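For concreteness, the criteria compared here can be computed directly from a model's maximized log-likelihood; the Python helper below (with invented log-likelihood values, purely for illustration) also shows how Akaike model weights follow from AICc:

```python
import math

def aic(loglik, k):
    """Akaike's Information Criterion."""
    return 2 * k - 2 * loglik

def aicc(loglik, k, n):
    """Small-sample corrected AIC; converges to AIC as n grows."""
    return aic(loglik, k) + 2 * k * (k + 1) / (n - k - 1)

def bic(loglik, k, n):
    """Bayesian Information Criterion; penalizes complexity more heavily."""
    return k * math.log(n) - 2 * loglik

def akaike_weights(aic_values):
    """Relative support for each model in a candidate set (sums to 1)."""
    best = min(aic_values)
    rel = [math.exp(-0.5 * (a - best)) for a in aic_values]
    total = sum(rel)
    return [r / total for r in rel]

# Hypothetical fits: (log-likelihood, number of parameters), n = 1000.
fits = [(-1500.0, 3), (-1460.0, 40), (-1455.0, 120)]
aiccs = [aicc(ll, k, 1000) for ll, k in fits]
weights = akaike_weights(aiccs)
```

On these invented numbers, the intermediate model wins: its extra parameters buy enough likelihood to pay their penalty, while the most complex model's do not.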
For tree-ring width data, it appears that a multiplicative model form with a log-normal error term is far more realistic than the additive normal equivalent. The fair likelihood-based model selection criteria preferred a multiplicative model over the additive equivalent 100 % of the time. On a smaller scale, this fact is easily confirmed by viewing histograms and quantile-quantile plots of the residuals of the fitted standardization model. The choice of probability density function is essential when significance tests, confidence or support intervals are desired.
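A quick way to see why the error term matters, sketched in Python with simulated (not real) ring widths: on multiplicative data, a fitted log-normal error model attains a far higher likelihood than a fitted normal one:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Simulated multiplicative (log-normal) ring widths.
widths = rng.lognormal(mean=0.0, sigma=0.5, size=2000)

# Log-likelihood under a fitted normal error model ...
mu_n, sd_n = widths.mean(), widths.std()
ll_normal = stats.norm.logpdf(widths, mu_n, sd_n).sum()

# ... and under a fitted log-normal model: a log-normal density is a
# normal density on log(x) with a 1/x Jacobian term.
logw = np.log(widths)
ll_lognormal = stats.norm.logpdf(logw, logw.mean(), logw.std()).sum() - logw.sum()
```

The gap between the two log-likelihoods grows linearly with sample size, so likelihood-based criteria separate the forms decisively.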
Within these log-normal models, the flat detrending model was the most popular choice (G = IT, P(AICc) = 83.5 %, P(BIC) = 53.8 %), followed by the full model (G = ITA, P(AICc) = 14.4 %, P(BIC) = 0.4 %) and a productivity-only model (G = I, P(AICc) = 2 %, P(BIC) = 45 %). The traditional regional curve standardization model, G = TA, was selected only once by AICc, and never by BIC. BIC also selected the unstandardized chronology (G = T) 5 times and a strange time-insensitive model (G = IA) 10 times. The null model was never selected by any model selection criterion, suggesting that null hypothesis tests of tree-ring standardization are almost certainly a foregone conclusion. Overall, the individual effect had the strongest support, followed by the time effect, with the age effect being weakest across the chronologies as a whole.

To investigate the effects of modern sample bias, we generated two competing chronologies. In the first (uncorrected) case, the growth model accounted for the effects of time and age using a smoothed age effect and a log-normal error term (a smoothed G = TA model).
The second (corrected) model adds an effect for the individual productivity of each tree (a smoothed G = ITA model). This model was estimated using a generalized additive model with a gamma family and a log link in mgcv (Wood, 2001). Smoothing was performed using thin-plate basis splines, with stiffness automatically selected by generalized cross-validation.
The corrected model was preferred 99.9 % of the time by AICc, with even BIC preferring it 99.3 % of the time (Fig. 6). Model fit statistics were substantially improved for the corrected three-effect model (Table 2).
We can detect the effect of modern sample bias by comparing the ratio of the uncorrected to the corrected time effects. For the chronologies sampled, it appears that the sign of modern sample bias varies by chronology. For 355 chronologies (29.5 %), the ratio between the uncorrected and corrected time effects had a positive trend (by Spearman's ρ test for trend), reflecting a bias towards increasing growth by time. The remaining 847 chronologies (70.5 %) displayed a negative trend, indicating a systematic underestimate of growth rates in more modern years (Fig. 7). Plotting the typical values of this ratio by time confirms this, revealing a slow, persistent negative bias in uncorrected chronologies over time, accelerating after about 1700 (Fig. 8).
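The trend test used here can be sketched in Python (with a simulated ratio series standing in for a real chronology); `scipy.stats.spearmanr` returns both ρ and a p-value:

```python
import numpy as np
from scipy import stats

# Hypothetical ratio of uncorrected to corrected time effects by year,
# with a gentle negative trend plus noise (invented values).
years = np.arange(1700, 2000)
rng = np.random.default_rng(2)
ratio = 1.0 - 0.001 * (years - 1700) + rng.normal(0.0, 0.01, years.size)

# Spearman's rho tests for a monotone trend in the ratio; a significant
# negative rho indicates modern sample bias depressing recent growth
# estimates, a positive rho the opposite.
rho, pvalue = stats.spearmanr(years, ratio)
```

Because the test is rank-based, it is insensitive to the arbitrary scale of the ratio and to monotone transformations of it.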

Discussion and conclusions
Modern sample bias has been demonstrated before (Briffa and Melvin, 2011; Voelker, 2011; Cooper et al., 2012; Melvin et al., 2012b), but the techniques developed in this paper allow a more detailed and comprehensive examination and correction. Contrary to prevailing opinion (Brienen et al., 2012a), modern sample bias does not always impart a positive bias on the standardized chronology; its sign depends instead on the complex ecological interactions dictating survival and the vagaries of sampling. In fact, in 70.5 % of the chronologies analysed, it had a negative effect instead. The ecology and sampling design behind these patterns is interesting in its own right and deserves much finer scale, context-specific study than given here.
D'Arrigo et al. (2008) suggest that modern sample bias may be responsible for the "divergence problem" in dendroclimatology: the widespread reduction in temperature sensitivity of tree-ring chronologies in recent decades. The generally negative trend induced by modern sample bias in recent years certainly suggests that it may be at least part of the problem.
More generally, the theoretical results of this paper clarify, simplify and extend regional curve standardization. Regional curve standardization is a biased implementation of signal-free standardization, while signal-free standardization is itself equivalent to the new fixed effects regression standardization. Working within a regression framework improves the transparency of the standardization process, allows investigators to use classical regression tools such as AIC and, as demonstrated, facilitates investigation of alternate underlying models of tree growth.
The estimates of I are interesting in their own right, as measurements of tree-level productivity independent of the effects of tree age and time period. This metric possesses several large advantages over traditional metrics of forest productivity, such as site index or direct measurements of annual net primary productivity. While tree-ring data may be more laborious to obtain than simple DBH and height measurements, it is still relatively cheap and simple to collect and analyze. Most importantly, it is free of confounding age- and time-related effects, a major challenge in most attempts to quantify site or tree quality. Integration of dendrochronological data into more traditional investigations of forest productivity (cf. Pokharel and Dech, 2012) could help control for these confounding effects. Intriguingly, I describes the productivity of individual trees. As a result, questions that are cost-prohibitive or impossible to answer using alternate productivity indexes are now feasible. The spatial structure of tree productivity, the effects of microtopography, or the success of tree breeding programs could all be readily quantified from tree-ring data using the techniques outlined in this paper. More extensive empirical validation and testing of the assumptions discussed in this paper is still needed and requires the detailed investigation of many real chronologies.
To this end, R source code is available as a Supplement and is planned for inclusion shortly in dplR, the main dendrochronology package in R (Bunn, 2008). Much of the existing work surrounding variance stabilization and chronology construction (Boreux et al., 2009; Cook and Peters, 1997; Nicault et al., 2010; Bontemps and Esper, 2011) may be significantly clearer in this model-based likelihood framework.
The tools presented in this paper may be useful in other areas of dendrochronology. Tree-specific differences in isotope values have been observed (Hangartner et al., 2012), and while maximum latewood density and other tree-ring proxies are relatively stable with age (Melvin et al., 2012a), microclimatic differences between trees may still drive persistent trends. The growth model presented here suggests that these effects may be separated, even when using measurements other than ring width, so long as there is reason to believe that persistent effects due to individual trees, calendar year and age are present. Hangartner et al. (2012) encountered this problem, although it was not recognized as such, when attempting to combine partially overlapping tree-ring isotope series of differing contemporaneous magnitude, observing characteristic "jumps" in the final chronology when series began or ended. The approaches examined in that paper are not entirely applicable here; the first three suffer from the "segment-length curse" (Cook et al., 1995), masking much of the low-frequency variability, and the fourth is difficult to extend to more than two overlapping series. Brienen et al. (2012a) show that modern sample bias presents a real challenge to the use of modern tree-ring chronologies in climate reconstruction and the detection of long-term shifts in growth, such as those that may result from climate change or carbon fertilization. Differences in productivity are at the root of these problems, and the techniques introduced here allow us to control for this heterogeneity and thus eliminate this bias.
Fixed effects standardization builds on a long history of dendrochronological standardization techniques but is at once simpler and more flexible. Sequential estimation techniques (classical regional curve standardization or individual series detrending) are grossly and needlessly inefficient and should be replaced by their signal-free, maximum likelihood or least-squares equivalents in every case. We recommend the use of fixed effects standardization that accounts for differences in productivity in place of classical regional curve standardization in virtually all cases, removing trend-in-signal bias, differing-contemporaneous-growth-rate bias and modern sample bias.

In the additive case, the estimated effects will be centered around 0, but in the multiplicative case they will be (multiplicatively) centered around 1 instead. We can find the estimated time effect by substituting the updated growth matrix into Eq. (A4), remembering that the weight matrix is a constant (C) in this scenario.
Because we can never know the true values of the effect vectors, we take the total (and thus average) error over the entire growth array (N_ita) to be 0.
The estimate of the time effect is again centered and looks very similar to the estimate of the age effect. It is related to the true time effect vector by an unknowable constant offset and contains noise from the data in the corresponding year. As above for the regional curve, the standardized chronology produced using regional curve standardization is an unbiased estimate of the time effect if and only if the weight matrix is balanced. Furthermore, in this case switching the order in which the effects are estimated makes no difference to the family of effect vectors estimated. We can confirm that we have obtained reasonable estimates of the effect vectors by substituting them back into the growth model.

Classical signal-free standardization (for regional curve standardization) proceeds as follows:

1. Divide the original growth data by the current signal-free chronology (initially 1 everywhere) and find the mean values at each age to produce the regional curve.

2. Divide the raw growth data by the regional curve at the same age to produce signal-free series.
3. Find the mean values at each year to produce the signal-free chronology.
4. Repeat steps 1 through 3 until convergence is reached (defined as an approximately zero-variance signal-free chronology).
5. Use this final signal-free regional curve to produce the standardized chronology by dividing the original growth data by the final regional curve.
Instead, we argue that signal-free standardization (as a whole) can be performed using this simpler algorithm:

1. Initialize the working growth data as the original growth data and the estimated effects as the null value (0 for additive models, 1 for multiplicative models).
2. Estimate an effect (via method of moments (taking the mean), least squares or maximum likelihood) and combine it with the previously estimated effect of that type (by adding for additive models and multiplying for multiplicative models).
3. Remove the change in the effect from the working growth data (by subtracting or dividing it from the growth data).
4. Repeat steps 2 and 3 for each orthogonal effect (orthogonal effects have independent sets of predictor variables).
5. Repeat steps 2-4 until convergence is reached, as defined by a zero-signal working growth data (all values of growth are ≈ 0 for additive models or ≈ 1 for multiplicative models).
Rather than retaining only a single estimated effect at a time (the regional curve/age effect above), we can store our updated estimates of each of the effects and skip the final step where we derive the second effect from the first estimated effect and the growth data. In this way, we can extend signal-free standardization to include more than two orthogonal effects.
For an additive regional curve standardization growth model with normal noise, we can write the algorithm more concretely.
1. Initialize the working growth data as the original growth data and the estimated effects as 0. Start the step counter at 0.
2. Estimate the age effect by taking the average of the working growth data by age and add it to the previously estimated age effect.

3. Remove the change in the age effect from the working growth data by subtraction. Increment the step counter by 1.
4. Estimate the time effect by taking the average of the working growth data by time and add it to the previously estimated time effect.

5. Remove the change in the time effect from the working growth data by subtraction. Increment the step counter by 1.
6. Repeat steps 2-5 until convergence is reached, as defined by the working growth data reaching 0 in every location.
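The concrete algorithm above can be sketched in Python (the analysis in this paper used R; the unbalanced toy data set here is our own construction):

```python
# Additive two-effect model g[t, a] = T[t] + A[a] on an unbalanced design.
# Generated with T = [0, 1, 2] and A = [5, 3].
obs = {(0, 0): 5.0, (1, 0): 6.0, (1, 1): 4.0, (2, 1): 5.0}
T_hat = {t: 0.0 for t in (0, 1, 2)}     # estimated time effect
A_hat = {a: 0.0 for a in (0, 1)}        # estimated age effect
work = dict(obs)                        # working growth data

def mean_by(data, axis):
    """Mean of the working data at each level of one index (0=time, 1=age)."""
    levels = {k[axis] for k in data}
    return {lev: sum(v for k, v in data.items() if k[axis] == lev) /
                 sum(1 for k in data if k[axis] == lev)
            for lev in levels}

for _ in range(200):
    dA = mean_by(work, 1)               # estimate the age effect ...
    for (t, a) in work:
        work[(t, a)] -= dA[a]           # ... and remove the change
    for a, v in dA.items():
        A_hat[a] += v
    dT = mean_by(work, 0)               # estimate the time effect ...
    for (t, a) in work:
        work[(t, a)] -= dT[t]           # ... and remove the change
    for t, v in dT.items():
        T_hat[t] += v
    # Convergence: working growth data is approximately 0 everywhere.
    if max(abs(v) for v in work.values()) < 1e-10:
        break

# T_hat approaches [-1, 0, 1] and A_hat approaches [6, 4]: the true
# effects up to the expected (unknowable) constant offset.
```

On this toy data the working growth data shrinks geometrically to 0, and the estimated time effect recovers the full trend (T_hat[2] − T_hat[0] = 2), unlike the single sequential pass.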
Sequential standardization (regional curve standardization, flat detrending) simply stops this process after one loop through. At each step, the working growth data and the most recent effects combine to produce the original growth data: all information is retained at each step. We can confirm that the effect vectors stabilize at our convergence condition by substituting G = 0 into the updating equations.

The next question is whether the proposed algorithm actually converges to the stopping point. Unfortunately, we have not managed to prove this to date, and we welcome proofs to this effect. In practice, the algorithm converges to a stable family of solutions (as determined by comparing the residuals at each step) relatively quickly. Small models and data sets typically take only 2-5 iterations to stabilize, while large models and particularly sparse data sets can take up to 20. Model fit statistics confirm our intuition that the modified signal-free algorithm is a good estimator of the growth model, producing values of likelihood and R² extremely close to those obtained via maximum likelihood or least squares optimizers.
Finally, by Eq. (A35), we know that if the working growth data is 0, the estimated effect vectors are unbiased estimators of the true effect vectors (up to an unknowable constant offset).
Note that the family of solutions found does not depend on the order in which the effects are estimated, unlike in sequential standardization. Because this process is guaranteed to converge (at least for categorical effects), signal-free standardization results in an unbiased least-squares estimate of the original growth model, even in cases where the weight matrix is unbalanced. Signal-free standardization produces a least-squares estimate for first one effect and then the next, finally converging on a least-squares solution for the full model.

Likelihood ridges in three-effect models
As we saw before, the fixed effects growth models are singular: there is no single best solution, but instead a family of them. We can deal with this problem by rescaling the coefficients into a standard form. But a similar, less trivial difficulty arises when fitting three-effect (individual, time and age) models to real tree-ring data.
Because each sequential tree ring is both one year older and one year later in time than the preceding one, the tree-ring data for each tree follows a single diagonal along time and age. As a result, the estimated effect vectors can shift away from the true effect vectors in a peculiar way. In this section, we'll use an additive model with no noise for clarity.
We can think of the slice of the tree-ring array in which a single tree is found as a matrix, with calendar years as rows (R) and ages as columns (C). Each tree was born in a particular year, determining the diagonal (k) in which its data is found. The oldest tree will be found along the main diagonal, starting at (t_1, a_1), while younger trees will be found in the lower left triangle. The upper right triangle will always be empty unless the chronology contains trees that were born before the oldest complete tree but are missing data near the pith. The relative birth year of tree i (b_i) can be related to the year and age (counting from the beginning of the chronology) in which any particular ring is found.
The diagonal of the data is clearly related to the rows and columns in which the data is found: b_i = t − a.
And so we can link the two perspectives. We can change the coefficient of I_i without affecting any of the other diagonals, the coefficient of T_t without affecting any of the other rows, and the coefficient of A_a without affecting any of the other columns. Suppose there is a demonic intrusion (Hurlbert, 1984) that results in each of our estimates of the effect vectors being linearly offset from the best estimate by an amount related to the birth year, calendar year or age it represents. Let us describe these perturbed effect vectors as Ĩ, T̃ and Ã for individuals, time and age respectively; they can of course be converted into the matrix perspective as well. We can detect this intrusion using least squares, maximum likelihood or similar if it results in a poor model fit. We can compare the pure predicted growth values (G) to the corrupted predicted growth values (G̃) by looking at the difference between them.
In this difference, the unperturbed effect vectors cancel, so we can detect the perturbation whenever the true and corrupted predictions of growth differ: the corrupted effect vectors will (by definition) provide a worse fit to the data. Now, if the intruding demon were particularly malicious, it would add precisely the right amount to each element of the effect vectors such that the predictions of the two model fits were indistinguishable in every cell of the growth data. In that case, the difference between the predictions (G̃ − G) would be 0.
It turns out that if you set the slopes of the perturbations up just so, the intrusion is undetectable.
Setting m to be the arbitrary slope of the perturbation as a whole, we can confirm that the predicted values are identical.
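This cancellation can be confirmed numerically. A Python sketch with invented effect vectors and one consistent choice of perturbation signs (Ĩ_i = I_i + m·b_i, T̃_t = T_t − m·t, Ã_a = A_a + m·a, which cancel because b = t − a; other sign conventions are equivalent):

```python
# Additive three-effect model: rings lie on time-age diagonals, since
# a ring formed by tree i at age a falls in year t = b[i] + a.
m = 0.7                                   # arbitrary slope of the perturbation
b = [0, 2, 5]                             # birth years of three hypothetical trees
I = [1.0, -0.5, 0.3]                      # individual effects (invented)
T = [0.1 * t for t in range(15)]          # time effects (invented)
A = [2.0 - 0.1 * a for a in range(10)]    # age effects (invented)

max_diff = 0.0
for i, bi in enumerate(b):
    for a in range(len(A)):
        t = bi + a                        # diagonal structure: b = t - a
        g_true = I[i] + T[t] + A[a]
        # Perturbed effects with the slopes set "just so":
        g_pert = (I[i] + m * bi) + (T[t] - m * t) + (A[a] + m * a)
        max_diff = max(max_diff, abs(g_true - g_pert))

print(max_diff)   # -> 0 (up to floating point rounding)
```

Every observed cell satisfies b = t − a, so the three perturbation terms sum to m(b − t + a) = 0 and the predictions are identical everywhere the data exists.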
Solutions to the three-effect growth model of the form given in Eq. (B18) all have the same likelihood, regardless of the value of m chosen. This creates a likelihood ridge in our solution space. Any residual-based optimizer we choose cannot distinguish between the "true" and "corrupted" solutions, leaving us unable to interpret the reported effects. Residual-driven metrics of model fit for these models are, however, perfectly valid: the R², likelihood, AIC and BIC for these malformed models are all correct, allowing us to perform model selection even in the face of demonic intrusions.
Similar models, including flexible GAM standardizations and many parametric alternatives (including constant basal area increment and negative exponential age trends), fall prey to the same problem, as they must ultimately generate predictions at the same points. Resolving this problem is vexing, but we can use the sheer implausibility of the reported outcomes in our favour.
In many cases, we can assume that the true effect vectors themselves arise from a specified distribution (in this case normal). The demonic intrusion dramatically skews these distributions, leaving clear proof of its visit. The three effect vectors are each generated through a different process and thus deserve separate random processes, with different levels of variability. From the corrupted effect vectors, we can obtain candidate corrected effect vectors (Î, T̂, Â) for any hypothesized level of corruption (m̂).
When m̂ = m, we will have successfully removed the effects of the demonic intrusion, even though the likelihoods of all these models are equivalent. For a given value of m̂, we can estimate the likelihood of a single corrected effect coefficient according to the distribution of the class it belongs to. Allowing the mean of each normal distribution to be flexible with the estimated effect lets us accommodate the arbitrary scale of any one effect vector. We can compute the overall likelihood of the model as the product of the likelihoods of each of the components of the effect vectors. By maximizing this quantity, or rather its logarithm, we can obtain reasonable values for each of our effect vectors that match our distributional expectations. At this point, all that remains is a fairly trivial one-dimensional optimization.
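A Python sketch of this one-dimensional optimization (with simulated effect vectors and an assumed corruption slope of 0.3; for normal likelihoods with free means and standard deviations, maximizing the likelihood is equivalent to minimizing the summed log standard deviations of the corrected vectors):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 300
# Birth years, calendar years and ages, simplified here to plain indices.
b = np.arange(n, dtype=float)
t = np.arange(n, dtype=float)
a = np.arange(n, dtype=float)

# True effects drawn from normal distributions with different spreads ...
I_true = rng.normal(0.0, 1.0, n)
T_true = rng.normal(0.0, 0.5, n)
A_true = rng.normal(0.0, 2.0, n)

# ... then corrupted along the likelihood ridge with slope m = 0.3.
m = 0.3
I_c = I_true + m * b
T_c = T_true - m * t
A_c = A_true + m * a

def neg_loglik(m_hat):
    """Profile negative log-likelihood of the corrected effect vectors.

    For normals with free mean and sd, the maximized log-likelihood of a
    vector depends only on its sample sd, so we minimize n*log(sd)."""
    corrected = (I_c - m_hat * b, T_c + m_hat * t, A_c - m_hat * a)
    return sum(n * np.log(np.std(v)) for v in corrected)

grid = np.linspace(-1.0, 1.0, 2001)
m_best = grid[np.argmin([neg_loglik(g) for g in grid])]
```

On this simulated data the grid search recovers a slope very close to the true corruption (0.3): detrending each corrupted vector by the right slope minimizes its variance, and hence maximizes the assumed normal likelihood.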
In practice, this post hoc correction appears to work fairly well. It performs admirably on data sets in which the true effect vectors follow the expected distribution. Its application to real data is rather more approximate; the use of more specialized or empirical distributions for the individual, time and age effects would help to ensure its reliability.

Fig. 8. The sign of this ratio corresponds to the direction of bias induced by modern sample bias in each year. The black line shows the median value, while the dark and light grey bands show the 40-60 % and 20-80 % quantiles respectively. This captures the typical values and their spread in a way that is consistent as the number of chronologies changes.
L(Î_i) ∝ P(Î_i | N(µ_I, σ_I)) (B23)

L(T̂_t) ∝ P(T̂_t | N(µ_T, σ_T)) (B24)

L(Â_a) ∝ P(Â_a | N(µ_A, σ_A)) (B25)

Fig. 1. Tree-ring data is traditionally stored in a two-dimensional table (top left), with each column representing a different core and each row a different year. When dendrochronologists want to look at the values by age, as when constructing the regional curve, the series need to be realigned so that the rows match the ages of the rings (bottom left). Instead, we can store information about both the time (calendar year) and age of the data in a three-dimensional tree-ring array, here shown in slices by core (right).

Fig. 4. Sample depth of the analysis across time (number of series analysed in each year).

Fig. 6. Frequency of model selection over the analysed data set for the corrected (G = ITA) and uncorrected (G = TA) GAM standardization models by various model selection criteria. The y-axis shows which terms are included in the growth model.

Table 2. Average model fit statistics for the GAM models fit to the analysed chronologies. ∆*IC values are calculated relative to the full multiplicative model.