Application and evaluation of the dendroclimatic process-based model MAIDEN during the last century in Canada and Europe

Tree-ring archives are one of the main sources of information to reconstruct climate variations over the last millennium with annual resolution. The links between treering proxies and climate have usually been estimated using statistical approaches, assuming linear and stationary relationships. Both assumptions may be inadequate, but this issue can be overcome by ecophysiological modelling based on mechanistic understanding. In this respect, the model MAIDEN (Modeling and Analysis In DENdroecology) simulating tree-ring growth from daily temperature and precipitation, considering carbon assimilation and allocation in forest stands, may constitute a valuable tool. However, the lack of local meteorological data and the limited characterization of tree species traits can complicate the calibration and validation of such a complex model, which may hamper palaeoclimate applications. The goal of this study is to test the applicability of the MAIDEN model in a palaeoclimate context using as a test case tree-ring observations covering the 20th century from 21 Eastern Canadian taiga sites and 3 European sites. More specifically, we investigate the model sensitivity to parameter calibration and to the quality of climatic inputs, and we evaluate the model performance using a validation procedure. We also examine the added value of using MAIDEN in palaeoclimate applications compared to a simpler tree-growth model, i.e. VS-Lite. A Bayesian calibration of the most sensitive model parameters provides good results at most of the selected sites with high correlations between simulated and observed tree growth. Although MAIDEN is found to be sensitive to the quality of the climatic inputs, simple bias correction and downscaling techniques of these data improve significantly the performance of the model. The split-sample validation of MAIDEN gives encouraging results but requires long tree ring and meteorological series to give robust results. We also highlight a risk of overfitting in the calibration of model parameters that increases with short series. Finally, MAIDEN has shown higher calibration and validation correlations in most cases compared to VS-Lite. Nevertheless, this latter model turns out to be more stable over calibration and validation periods. Our results provide a protocol for the application of MAIDEN to potentially any site with tree-ring width data in the extratropical region.

offer a longer-term perspective. In this context, dendroclimatology, defined as the science that allows for the inference of past climates from tree rings, enables climate reconstructions at high temporal resolution (annual) over several centuries or millennia (Fritts, 1976;Hughes et al., 2011). Thanks to the availability of tree-ring observations in many regions, they represent the main data source in most largescale hemispheric reconstructions covering the last millennium (e.g. Cook et al., 1999;Jones et al., 2009;Mann et al., 2009;Wilson et al., 2016;Anchukaitis et al., 2017;PAGES 2k Consortium, 2017;Esper et al., 2018;St. George and Esper, 2019).
Reconstructing past climate on the basis of tree rings first requires us to establish a relationship between measured variables, such as tree ring width or density, and climate. This has been classically done using statistical approaches (Fritts, 1976;Cook and Kairiukstis, 1990) and often reducing the problem to empirical linear relationships. Consequently, numerous temperature reconstructions are based on multiple linear regressions, calibrated using temperature during the instrumental period (e.g. Fritts, 1991;Jones et al., 1998;Mann et al., 1999Mann et al., , 2008. When using those statistical models for the entire period covered by dendroclimatic data, we assume both linear and stationary relationships , while those assumptions may be inadequate for many records Wilson and Elling, 2004;Wilson et al., 2007;D'Arrigo et al., 2008).
Process-based tree-growth models are able to overcome those limitations of statistical models by explicitly representing the processes at the origin of the recorded signal . They are also one kind of a larger group of models called proxy system models (PSMs). PSMs simulate the development of measured variables (here in tree rings) based on climatic variables as inputs. They integrate a simplified representation of the mechanisms governing the relationship between climate and observations used to capture palaeoclimatic information . These models can be applied in an inverse mode to estimate the climatic conditions that gave rise to the measured characteristics Boucher et al., 2014). Alternatively, they can be forced by climate model results (direct mode), thereby allowing us to compare model results with indirect climate records and without the need to reconstruct the climate from these observations Dee et al., 2016). In addition to major advantages for model-data comparisons, proxy system models can facilitate the assimilation of proxy data in long climate model runs (Dee et al., 2016;Goosse, 2016). In palaeoclimatology, the objective of data assimilation is to optimally combine the results of one climate model and the observations to obtain an estimate of the state of the climate system as accurate as possible (Kalnay, 2003). This technique is now used regularly to obtain reanalyses, providing estimates of different climatic variables, such as temperature, precipitation, and atmospheric and ocean circulation for the last decades. Similar procedures are being developed in palaeoclimatology (e.g. Goosse et al., 2012;Franke et al., 2017;Tardif et al., 2018). However, so far, physically based tree-ring PSMs have not been used in published reconstructions based on data assimilation using actual data. This implies additional uncertainties when reconstructing temperatures.
Several models developed to simulate tree growth have been applied in dendroclimatology . Among them, the VS-Lite model is a deterministic numerical model that simulates the primary response of ring width to climate based on the principle of limiting climatic factors (i.e. temperature and soil moisture; Tolwinski-Ward et al., 2011). Because of its simplicity and the small number of inputs required, it has been used in a wide range of conditions in a large number of palaeoclimate studies (e.g. Breitenmoser et al., 2014;Lavergne et al., 2015;Dee et al., 2016;Steiger and Smerdon, 2017;Seftigen et al., 2018;Fang and Li, 2019). However, VS-Lite is not able to reproduce treegrowth observations for numerous sites, particularly when the dependence on climatic conditions is complex (Breitenmoser et al., 2014). More comprehensive models such as the full Vaganov-Shashkin model (Vaganov et al., 2006) or MAIDEN (Modeling and Analysis In DENdroecology; Misson, 2004) could be more adapted to those conditions. One of the strengths of the MAIDEN model is to include the influence of atmospheric CO 2 concentration on growth. This is essential when we know that the atmospheric concentration of CO 2 increased by 30 % during the last 50 years (Myhre et al., 2013;Boucher et al., 2014). As models are calibrated over this recent period, not taking into account CO 2 concentration could potentially induce stationarity problems, which can, ultimately, have an impact on the calibration of parameters, such as the ones related to temperature or water use efficiency that can covary with CO 2 . Unfortunately, those more comprehensive models including explicitly complex biological processes such as photosynthesis and carbon allocation may need careful initialization and calibration for each set. They may thus require specific information on the sites that may not be available. This may then hamper a systematic application of the model to a large number of sites as done for instance with VS-Lite (Breitenmoser et al., 2014).
Before applying a mechanistic model to a wide range of tree-ring records covering the past centuries, testing its applicability over the 20th century when data allow for an estimation of the model skill appears necessary, which is the goal of our study. For a specific study site, local meteorological data and measurements of several ecophysiological variables allow for a precise calibration of many individual processes included in the model. However, this is a rare case and likely one of the main limitations in the application of the model to a wide range of sites and soil conditions or when driven by climate model results that have known biases (Flato et al., 2013). We first present in Sect. 2.1 the two dendroclimatic models that are compared in this study, namely the complex model MAIDEN and the more simple model VS-Lite. MAIDEN and VS-Lite are applied to selected sites of the Northern Hemisphere (described in Sect. 2.2), covering a range of environmental conditions and tree species. A first set of data consists of a large number of sites from the same region with similar environmental conditions but with low in situ replication, while a second set only contains a few sites but with good replication. In this way, we test the applicability of MAIDEN to two datasets contrasted in terms of site documentation. This allows us to evaluate the extent to which MAIDEN can be applied. We compare the calibration methods adopted for VS-Lite  and MAIDEN (Hartig et al., 2019) in Sect. 2.3. Different strategies to select the value for the most sensitive parameters of the MAIDEN model as well as the sensitivity of parameter calibration to the quality of climatic inputs are tested in Sect. 3.1, 3.2 and 3.3. Finally, we compare calibration and validation statistics of both models and discuss their applicability to a wide range of sites and species in Sect. 3.4 and 3.5.

Material and methods
2.1 Tree-growth models

The MAIDEN model
The dendroclimatic model MAIDEN has initially been developed by Misson (2004). It explicitly includes biological processes, namely photosynthesis and carbon allocation to different tree compartments, to simulate an annual tree-growth increment. The model uses daily climatic inputs (i.e. CO 2 atmospheric concentration, precipitation, and minimum and maximum air temperature). Up to now, MAIDEN has been applied in the Mediterranean (Gea-Izquierdo et al., 2015) and temperate regions (Misson, 2004;Boucher et al., 2014), in the Eastern Canadian taiga , and in Argentina (Lavergne et al., 2017). Currently, there are two versions of the model: from Gea-Izquierdo et al. (2015), developed for the Mediterranean forests, and , for boreal tree species. A unified version from those two versions has also been developed by Fabio Gennaretti (unpublished). In this study, all tests have been conducted using the unified version of MAIDEN. This unified version gives the opportunity to choose between the version from  or from Gea-Izquierdo et al. (2015) and, if needed, to test equations from both versions to evaluate their impact. However, here, only the version from  is used as it is the most adapted to the selected sites.
MAIDEN simulates photosynthesis on a daily basis and allocates the daily available carbon from photosynthesis and stored non-structural carbohydrates to different pools (leaves, roots, stem and storage). The allocation is based on functional rules defined following the ongoing phenological phase (five phases per year: winter 1 with no accumulation of growing degree-days (GDD), winter 2 with active GDD accumulation, budburst, summer and fall). At the end of the year, the model sums all the daily carbon inputs allocated to the stem to get an annual tree-growth increment (yearly Dstem, hereafter Dstem, in grams of carbon per square metre of stand per year). Dstem is assumed to be proportional to tree-ring growth so that we can build simulated tree-ring index time series and compare it with tree-ring width (hereafter TRW) observations (Sect. 2.3.1) (Gea-Izquierdo et al., 2015;. The structure of the MAIDEN model is provided online (https://figshare.com/articles/ MAIDEN_ecophysiological_forest_model/5446435/1, last access: 17 November 2019; , and its modules are available upon request. Tree-ring observation site and climate station (corresponding to a single location or grid cell as a function of the climatic dataset) constants of the MAIDEN model (Table S1) are derived from observations, as far as possible. For practical reasons, we were not able to retrieve slope and aspect information from a digital elevation model, for example, because it requires field knowledge of the site and for each sample, which we cannot systematically obtain given our global-scale goals. Thus, slope and aspect constants are set to zero. The soil is divided into four layers (1-15; 15-30; 30-65; 65-100 cm). Clay and sand fractions are extracted from the Harmonized World Soil Database (hereafter HWSD) v1.2 at 30 arcsec resolution (FAO/IIASA/ISRIC/ISSCAS/JRC, 2012) at the nearest cell with observed value, which is always at a distance smaller than 100 km to the site and assigned as follows: 1-30 cm parameters from the HWSD for the two first soil layers in MAIDEN; 30-100 cm parameters from the HWSD for the two deepest soil layers in MAIDEN. Soil layer thickness is fixed at the same value for all sites, as for fine root fractions.  (Vaganov et al., 2006). The model reproduces the primary response of ring width to climate using an approach based on the limiting factors principle (i.e. temperature and soil moisture) and on threshold growth response functions. It does not model any biological processes explicitly so it cannot be considered fully mechanistic. The model needs monthly climate data (cumulated precipitations and average temperature) as input as well as latitude of the study site. The main output of VS-Lite used here is a unitless annual tree-growth increment (Tolwinski-Ward et al., 2011).

Study sites
A network of tree-ring width chronologies of Picea mariana collected in similar conditions is available for the J. Rezsöhazy et al.: Application and evaluation of the dendroclimatic process-based model MAIDEN Eastern Canadian taiga (Nicault et al., 2014;Boucher et al., 2017; http://dendro-qc-lab.ca/trw.html, last access: 30 March 2019). We use here the tree-ring series directly derived from this compilation, without any modification. The chronologies have been previously standardized using the age-band regional curve standardization (or RCS) method proposed by Briffa et al. (2001) and further applied to a similar boreal dataset by Nicault et al. (2014). We also use the Eastern Canadian taiga chronology for Picea mariana from  (hereafter QC_taiga), standardized using a site-specific RCS (Gennaretti et al., 2014b). The latter is highly replicated (Gennaretti et al., 2014b) compared to the other Eastern Canadian sites from Nicault et al. (2014) and Boucher et al. (2017), which cover a broader spatial range, and provides additional observations to calibrate the model. From this network, we have only selected sites from Nicault et al. (2014) and Boucher et al. (2017) ending at least in 2000, with an expressed population signal (defined as the amount of variance of a population chronology infinitely replicated explained by a finite subsample; Buras, 2017) equal to or above 0.8, and replication equal to or above 15. We have also kept the site from  as a control site. At the end of the selection process, we get 21 sites (Fig. 1a). In order to increase replication, the Canadian sites from Nicault et al. (2014) and Boucher et al. (2017) are aggregated based on a 1 • grid by averaging tree-ring width chronologies (Fig. 1b). From this, we get five aggregated sites ( Table 1). Note that QC_taiga is not included into the aggregation process to keep it as a reference. The aggregation allows us to get relatively good intersite correlations inside the same 1 • grid, ranging from 0.442 to 0.732 with an average of 0.558. This observational network represents an archetypal example of a singular species that covers an important hydroclimatic gradient. Sites located along the western (near James Bay, WNFLV1, Fig. 1a) and eastern (near Labrador sea, WL32, Fig. 1a) margins of the study area present the warmest growing seasons in the network (864 growing degree-days above 5 • C for the 1976-2005 period; Hutchinson et al., 2009). Sites located in the centre of the Quebec-Labrador peninsula (WHM2, Fig. 1a) present a much shorter growing season (692 growing degree-days above 5 • C), much like the sites located further north (WLECA, Fig. 1a, 573 growing degreedays above 5 • C). Annual precipitation increases from west to east, passing from 668 mm (WNFLV1, Fig. 1a) to 907 mm (WL32, Fig. 1a), and significantly decreases with latitude, reaching only 567 mm at WLECA (Fig. 1a) for the 1976-2005 period (Hutchinson et al., 2009). This makes this network a relevant candidate for our calibration and validation exercises.
Three additional tree-ring width chronologies (hereafter European sites) are used to perform tests on sites with good replication, especially at the European Alps site, and long nearby series from meteorological stations ( Fig. 2): EALP (47 • N, 10.7 • E; 2050 m; European Alps; Pinus cembra and Larix decidua; Büntgen et al., 2011; processed data available in the PAGES 2k database; PAGES 2k Consortium, 2017); SWIT179 (46.77 • N, 9.82 • E; 1800 m; Picea abies; standardized with a cubic-smoothing spline with a 50 % frequency cut-off at 35 years; Seftigen et al., 2018; unprocessed data archived at the International Tree Ring Data Bank, https:// www.ncdc.noaa.gov/data-access/paleoclimatology-data, last access: 12 January 2019) and FINL045 (68.07 • N, 27.2 • E; Pinus sylvestris; standardized using a spline with a 50 % frequency cut-off response at 32 years; Babst et al., 2013; processed data available in the Supplement of Babst et al., 2013). Similarly to the Eastern Canadian taiga chronologies, the tree-ring series were not modified here. Those three European sites exemplify a situation where we only have access to individual sites with different species and from different environmental conditions that are not part of a larger network of tree-ring width observations.

Climate data
Daily climatic inputs are needed to run MAIDEN (Sect. 2.1.1). Monthly climatic inputs for VS-Lite are computed from those daily data. Note that monthly-average temperature has been computed by averaging daily maximum and minimum temperatures, which could lead to a small bias. Three daily climatic datasets with different spatial resolution (Table 2) were selected for our analysis on the Eastern Canadian taiga network ( Fig. 1a and b). First, a dataset at a high spatial resolution of 5 min from the gridded interpolated Canadian database of daily minimum-maximum temperature and precipitation (Hutchinson et al., 2009, hereafter NRCAN). The Global Meteorological Forcing Dataset for land surface modeling (v1) (http://hydrology.princeton.edu/ data.php, last access: 4 January 2019; Sheffield et al., 2006) at 1 • resolution is used as a mid-resolution climatic dataset (hereafter GMF). The NOAA-CIRES 20th Century Reanalysis V2c (https://www.esrl.noaa.gov/psd/data/gridded/ data.20thC_ReanV2c.monolevel.html, last access: 4 January 2019) at 2 • resolution is used as a low-resolution dataset (hereafter 20CRv2c). Finally, the 20CRv2c dataset was modified to match the monthly-mean seasonal cycle of the highresolution dataset NRCAN (hereafter 20CRv2c corr.). This simple bias correction and downscaling to the location of the site is done by removing the difference between the monthly-mean seasonal cycle of 20CRv2c (2 • ) and NRCAN (5 arcmin) from the maximum and minimum temperature data. In order to avoid negative values, daily precipitations are multiplied by the ratio between the monthly-mean seasonal cycle of NRCAN (5 arcmin) and 20CRv2c (2 • ). The time series are extracted from the grid cells nearest to the studied individual sites. The climatic data are averaged over the individual site data for the aggregated Eastern Canadian sites (Table 1).
The Global Historical Climate Network Daily (Table 2;  see Table S2 for details on selected stations; Menne et al.,   Fig. 2). Daily atmospheric CO 2 concentration data are linearly interpolated from the annual data from Sato and Schmidt (https://data.giss.nasa.gov/modelforce/ghgases/, last access: 3 December 2018).

The MAIDEN model
We have developed a protocol to systematically and automatically calibrate the model through a Bayesian procedure with Markov Chain Monte Carlo sampling carried out using the DREAMzs algorithm (Hartig et al., 2019). The calibration procedure focusses on the most sensitive parameters of the model identified in : 6 parameters influencing the simulated stand growth primary production and 12 parameters involved in the modelling of the daily quantity of carbon allocated to different tree compartments (Table S3). Those 6 + 12 parameters are calibrated by comparison between simulated Dstem and tree-ring width observations. The comparison relies on the computation of the model likelihood defined as the sum of the logarithms of the normal probability densities of the residuals between the model simulation and the observations. The prior distributions of the 6 + 12 parameters are assumed to be uniform over an acceptable range, as in . The calibration procedure is made up of three steps. During the first step, we calibrate the 12 carbon allocation parameters, while fixing the 6 photosynthesis parameters to arbitrary values in their acceptable ranges. We run three Markov chains of 10 000 iterations with a five-iteration thinning (i.e. we only consider one random sample out of five) to calibrate the parameters. During the second step, we fix the 12 carbon allocation parameters  Menne et al. (2012aMenne et al. ( , b) 1909Menne et al. ( -1944Menne et al. ( or 1910Menne et al. ( -19491950-2000  at the values obtained from the first step. We calibrate the 6 photosynthesis parameters by also running three Markov chains of 10 000 iterations with a five-iteration thinning. Finally, during the third step, the 6 photosynthesis parameters are fixed at the values obtained from the second step, and the 12 carbon allocation parameters are calibrated, by running three Markov chains of 30 000 iterations, with a fiveiteration thinning as well. Each of those nine chains starts from random initial values of the parameters in their acceptable ranges. At the end of each calibration step, we select the set of parameters with the highest posterior (maximum a pos-teriori value or MAP, Hartig et al., 2019) from all iterations considering a burn-in period (i.e. the number of initial iterations of a chain that are not considered in the calibration) of 1000 iterations (first and second steps) and 3000 iterations (third step). At the end of the calibration process, we thus have 6 calibrated parameters from the second calibration step and 12 carbon allocation parameters from the third one. The calibration method has been tested for convergence of Markov chains with Gelman-Rubin statistical indicators (Hartig et al., 2019). The MAIDEN model was calibrated at the 21 Eastern Canadian taiga sites and at the 5 aggregated sites over the 1950-2000 time period using the high-(NRCAN), mid-(GMF) and low-resolution (20CRv2c) datasets as inputs, as well as the bias-corrected low-resolution dataset (20CRv2c corr.), and over the 1900-2000 time period using the 20CRv2c and 20CRv2c corr. datasets as climatic inputs. MAIDEN was also calibrated at the three European sites using GHCN station data over 1950-2000EALP;SWIT179), 1909-1944(FINL045) and 1910-1949. Calibrated parameters values for the 1950-2000 time period are available in Tables S4-S7. Parameter posterior frequency distributions for the NRCAN (5 arcmin) high-resolution climatic dataset are available in Figs. S1-S58. Pearson correlation coefficients between observed TRW and simulated Dstem were computed, as well as the corresponding confidence level. To compare observed and simulated tree-ring growth data after the optimization of the model parameters, both observed tree-ring width series and simulated time series have been normalized to unitless indexes. Ideally, an exhaustive quantitative evaluation of MAIDEN would require a comparison of the variable simulated by MAIDEN to represent tree-growth directly with observations. However, this would imply the use of other tree-growth observations such as tree-ring density measurements, while tree-ring width represents the most widely available tree-growth observations, which makes it a relevant candidate given our global-scale goals. The disadvantage is that this normalization forbids us to assess error in the variance. This is why we only analyse the correlations for simplicity as using other metrics like the RMSE would not help us in this aspect.

The VS-Lite model
The VS-Lite parameters are calibrated at each location following a Bayesian approach described in Tolwinski-Ward et al. (2013). In this study, four VS-Lite parameters, corresponding to the lower and upper temperature (respectively  Tables S8-S11. Pearson correlation coefficients between TRW observations and simulated tree-growth indexes were also computed. Observed time series have been normalized to unitless indexes as well. Running MAIDEN takes around 2.5 s on one CPU for a 50-year time span, while running VS-Lite takes around 0.30 s. Currently, calibrating MAIDEN with our method takes around 18 h on one CPU for a site due to the high number of iterations and calibrated parameters, while the calibration method used for VS-Lite and developed by Tolwinski-Ward et al. (2013) takes only a few seconds.

Validation
Split-sample validation are performed by dividing the available data into two subperiods: one for calibration and one for validation, and vice versa. In order to test the influence of time series length, we validate the two models for both short (1950-1974 and 1975-2000) and long (1909-1944 and 1950-2000 or 1910-1949 and 1950-2000) time periods. For each validation experiment, Pearson correlation coefficients between observed TRW and simulated tree-growth indexes were computed, as well as the corresponding confidence level.
Split-sample validation was preferred over other validation methods such as h-block jackknife, which are computationally intensive. Additionally, removing years may be inappropriate for the validation because of the autocorrelation characterizing yearly TRW observations. Similar problems arise from a bootstrap technique .

Results and discussion
Our results and discussion are structured into five sections that allow us to fulfil our objective of testing the applicability of MAIDEN over the 20th century (Table 3). At first, we want to determine the best set of parameters for MAIDEN at our study sites and test the sensitivity of calibration to the quality of climatic inputs (Sect. 3.1, 3.2 and 3.3). In a context of palaeoclimate model-data comparison, where MAIDEN will be driven by climate model outputs at low resolution, this is a crucial point of our analysis. For example, bias correction and downscaling techniques could be good options to improve the robustness of the model calibration if the model is sensitive to the quality of climatic inputs.
We first test the possibility of using calibrated parameters from a well-documented site at other similar sites in terms of environment (here the Eastern Canadian taiga) and tree species (here Picea mariana) in Sect. 3.1. Another option is to calibrate each site individually, as in Sect. 3.2 following the calibration protocol detailed in Sect. 2.3.1. We thirdly test in Sect. 3.3 an alternative calibration method which consists of calibrating the MAIDEN model over the mean of a treering width observations network with similar environmental conditions and then applying the resulting calibrated parameters to the individual sites. From another perspective, this experiment could also be seen as an alternative method for the validation of the MAIDEN model when the climate and/or tree-ring width observation time series are too short for a split-sample validation. In this case, the individual sites are considered nearly independent validation data. To test the sensitivity of the model to the quality of climatic inputs, we have selected four climatic datasets at different spatial resolutions (Sect. 2.2.2, Table 2) that will be used in Sect. 3.2 to drive MAIDEN at the Eastern Canadian taiga sites. As a second sensitivity experiment, we have applied the parameters calibrated with MAIDEN using the high-resolution climatic data (NRCAN) to the Eastern Canadian taiga sites driven by the low-resolution data without or with bias correction (20CRv2c and 20CRv2c corr.).
The validation of MAIDEN in Sect. 3.4 is essential to evaluate the robustness of the calibration. The last section of our study consists of comparing the performance of the complex model MAIDEN with the performance of the simple model VS-Lite so as to assess the benefits of using a complex treegrowth model as MAIDEN for past climate reconstruction compared to a simple one (Sect. 3.5).

Application of prior MAIDEN parameters to all Canadian sites
At first, the QC_taiga parameters as calibrated by   1950-1974/1975-20001909-1944or 1910-1949/1950-2000 We validate our calibration procedure for MAIDEN using a split-sample method.
ble 2) over the 1950-2000 time period. Correlations between observations and simulations with MAIDEN using QC_taiga calibrated parameters (Fig. 3) are low and non-significant at most sites. Several reasons can explain the low skill of MAIDEN using those parameters. These results could be linked to the lower replication level at the sites from Nicault et al. (2014) and Boucher et al. (2017) -even when aggregated -compared to the site from  that weakens the climatic signal in the series. This may also be due to a high sensitivity of parameter calibration to an unstable climate-species relationship among sites that are different from each other in many aspects (such as soil type, vegetation, nutrient availability). Additionally, the long-term trends of forest growth in the Eastern Canadian taiga mostly depend on the past fire history (e.g. Payette et al., 2008;Gennaretti et al., 2014a;Erni et al., 2017). This represents the main natural disturbance factor that has shaped the North American boreal ecosystem by determining forest structure and composition as well as carbon stocks and interacting with climate on a long timescale. Yet, MAIDEN does not account for disturbances. To evaluate the effect of those disturbances on our experiment, the long-term decadal trends have been removed in both observations and simulations following   (Fig. S59). With only the highfrequency signal, the agreement between TRW observations and simulations with MAIDEN using QC_taiga calibrated parameters is far better for most individual and aggregated sites.

Site-specific calibration of the MAIDEN parameters and sensitivity to the quality of climatic inputs
A second option is to calibrate each of the 21 Eastern Canadian taiga sites as well as the 5 aggregated Eastern Canadian taiga sites (Fig. 1) using the calibration procedure detailed in Sect. 2.3.1. Correlations between tree-growth observations and simulations with MAIDEN for the 1950-2000 calibration period at the Eastern Canadian taiga sites are good and significant for all the climatic datasets (Fig. 4a). Correlations are in general slightly higher for the higher-resolution datasets (NRCAN (5 arcmin) and GMF (1 • ) datasets, with an average correlation of 0.62 and 0.65, respectively, compared with 0.57 for 20CRv2c (2 • ) and 0.61 for 20CRv2c corr. (2 • )). At the aggregated sites (Fig. 5a), correlations for each dataset increase a little bit compared to the average of individual correlations, but the general picture is the same. The bias correction (20CRv2c corr. (2 • )) can slightly improve correlations for the 20CRv2c (2 • ) climatic dataset in some cases (e.g. WL42 and WROZM). Consequently, those results do not indicate that using higher-resolution datasets effectively increase correlations. This is likely due to the calibration procedure that might be able to compensate for specific biases in each climatic dataset. This implies large variations of cali- Many potential biases of tree-ring observations due to the specific physiology of selected trees -that may not be representative of forest processes -and the chronology building process exist that may dampen the comparison with what MAIDEN simulates, i.e. forest carbon accumulation and not forest demographic processes (Johnson and Abrams, 2009;Duchesne et al., 2019). Ideally, considering those biases, we should find a better way to transform tree-ring data in time series with meaningful units to improve model-data comparisons. For example, Gennaretti et al. (2018) compute a wood biomass production index, which is closer to what MAIDEN simulates. This implies that we have access to both tree-ring width and density measurements.
Pearson correlation coefficients between TRW observations and tree-growth index simulations by MAIDEN for the 1900-2000 calibration period (Fig. 4b) are in most cases lower than those of the 1950-2000 calibration period. The bias correction can slightly improve correlations in some cases, but the latter stay smaller. At the aggregated sites (Fig. 5b), correlations for each dataset decrease slightly compared to the mean of individual correlations. The low correlation for the whole 20th century can be explained by the large uncertainty of the 20CRv2c (2 • ) climatic dataset before 1950 there, as measured by the large spread of the 20CRv2c ensemble spread at that time (Fig. S62).
When applying the parameters calibrated using the highest resolution dataset, NRCAN (5 arcmin), as climatic inputs to the Eastern Canadian taiga sites driven by the 20CRv2c (2 • ) dataset (Fig. 6b, in red), correlations are on average much lower. Mean correlation is 0.17 in that case compared to 0.57 when the parameters are calibrated using 20CRv2c (2 • ) as climatic inputs. With the 20CRv2c corr. (2 • ) dataset as climatic inputs -i.e. the low-resolution dataset corrected for bias in the mean seasonal cycle - (Fig. 6a, in red) we see that the performance of the MAIDEN model when applying NRCAN (5 arcmin) parameters is less good compared to the case when the parameters are calibrated using 20CRv2c corr.
(2 • ) as climatic inputs (in black). Nevertheless, correlations are far better than with 20CRv2c (2 • ) (Fig. 6b, in red). Indeed, the mean correlation is 0.36 when applying NRCAN (5 arcmin) parameters and 0.61 when applying 20CRv2c corr. (2 • ) parameters. Consequently, the bias correction of the 20CRv2c (2 • ) increases the robustness of the calibration of the MAIDEN parameters. Additionally, this shows that the MAIDEN model parameter calibration is highly sensitive to the quality of the climatic dataset used as inputs.
At the aggregated sites (Fig. 7), the general picture is the same but with far lower correlations. The mean correlations are 0.07 when applying the parameters calibrated using NR-CAN (5 arcmin) to the aggregated sites driven by 20CRv2c (2 • ) dataset and 0.56 when the parameters are calibrated using 20CRv2c (2 • ). With the 20CRv2c corr. (2 • ) dataset as climatic inputs, mean correlations are respectively 0.18 and 0.61 with NRCAN (5 arcmin) and 20CRv2c corr. (2 • ) parameters. Those results would require a case-by-case analysis as it seems that higher replication does not provide better performance in this specific experiment.

Regional calibration of MAIDEN
At last, we apply the parameters calibrated against the mean of TRW observations from the 20 Eastern Canadian taiga  (Fig. 1a) with MAIDEN using the different climatic datasets described in Table 2 (Fig. 1b), green circles), and mean and range of correlations (individual Eastern Canadian taiga sites used in aggregation ( Fig. 1a and b), in black) between tree-growth observations and simulations with MAIDEN using the different climatic datasets described in Table 2 as inputs for the 1950-2000 (a) and 1900-2000 (b) calibration periods. sites (Fig. 8) to the five aggregated sites (Fig. 8b) and to the individual sites used in the aggregation procedure (Fig. 8a). For this experiment, we use the NRCAN (5 arcmin) climate data (Sect. 2.2.2, Table 2) averaged over individual sites for each aggregated site (Table 1). The main parameters linked to site conditions and control parameters (Table S1) are fixed to their mode (soil parameters), mean (site latitude, elevation and isohyet, station elevation and isohyet) or common value (exp_site, slope and aspect parameters). Overall, correlations between TRW observations and simulations by MAIDEN with parameters calibrated based on the mean of the observed TRW time series are low and non-significant for the individual sites (Fig. 8a). At the more replicated aggregated sites (Fig. 8b), correlations between TRW observations and simulations get better with three significant correlations out of five sites. However, this result should be viewed in parallel with the individual correlations (Fig. 8a) and sites implied in the aggregation (Table 1). Indeed, aggregated sites   (Table 2). White inner circles stand for non-significant correlations (p value > 0.05).
with higher correlations are made up of individual sites with higher correlations as well. It means that probably not only higher replication is at the origin of higher correlations for most aggregated sites but also the specific conditions at each individual site, as well as site ecological history, as previously mentioned (Sect. 3.1).

Split-sample validation of MAIDEN calibration
Depending on the available years, we have selected different time periods at the European sites (Table 4) and at the aggregated Eastern Canadian taiga sites (Table 5), using each period once for the calibration and once for the validation. At the European sites, 25 years is clearly a period of time that is too short to get robust results, while the validation is generally successful for the longer period as indicated by the significant correlations -except in one case - (Table 4). Similarly, at the aggregated Eastern Canadian sites -where we only have 50 years of reliable climate data (see Sect. 3.2) -a 25-year subperiod is not enough for a robust calibration and validation (Table 5). However, even on the long time period (Table 4), we can see a clue of some overfitting, especially at the SWIT179 site, where the correlation for the validation period is far lower compared to the correlation for the calibration period. Those results show that because of the large number of parameters, the validation of MAIDEN is difficult. It requires long observation series, but the skill of the model still decreases significantly for the validation period.

Comparison with VS-Lite
On average, over the 1950-2000 calibration period at the individual Eastern Canadian taiga sites, VS-Lite has lower correlations for the highest-resolution dataset (NRCAN) compared with MAIDEN, i.e. 0.106 and 0.62 mean correlations for VS-Lite and MAIDEN, respectively (Fig. 9). Results for  (Fig. 1a and b) with MAIDEN using the NRCAN (5 arcmin) climatic dataset (Table 2) with site-specific calibration of the parameters (Orig. calib., in red) and with parameters calibrated based on the mean of the observed TRW time series (Mean calib.) for the 1950-2000 period. White inner circles stand for non-significant correlations (p value > 0.05). Table 4. Pearson correlation coefficients between tree-growth observations and simulations at the European sites ( Fig. 2) with MAIDEN and VS-Lite using GHCN as climatic inputs (Table 2) for the 1950-1974and 1975-2000and for the 1910-1949(EALP, SWIT179) or 1909-1944(FINL045) and 1950-2000 143 1910-1949 or 1909-1944 1950-2000 (Fig. S63). As for split-sample vali-dation over the long time period, the performance of VS-Lite is more stable (less fall of validation from calibration correlation) compared with MAIDEN (Table 4) even if correlations are, except for SWIT179, lower than MAIDEN. Similarly, over the short time period, the performance of VS-Lite is less Table 5. Pearson correlation coefficients between tree-growth observations and simulations at the aggregated Eastern Canadian sites ( Fig. 1b) with MAIDEN using NRCAN (5 arcmin) as climatic inputs (Table 2) for the respectively 1950-1974 and 1975-2000 calibration and validation periods and vice versa. 1950-1974 1975-2000

Figure 9.
Pearson correlation coefficients between tree-growth observations and simulations at the Eastern Canadian taiga sites ( Fig. 1a) with VS-Lite (in red) and MAIDEN (in black) using NR-CAN (5 arcmin) as climatic inputs (Table 2) for the 1950-2000 calibration period. White inner circles stand for non-significant correlations (p value > 0.05).
good than over the long time period but still more stable than MAIDEN (Table 4). Compared to VS-Lite, MAIDEN has shown lower skill over short-time-period validation, which indicates that we should only use MAIDEN when a long enough period is available for validation. As for a long validation period, MAIDEN has shown a stronger decrease in correlations compared to VS-Lite but still with higher correlations than VS-lite on average. This would indicate that the MAIDEN calibration is not always prone to overfitting. As our objective is to provide a first test of our calibration methodology using only a few sets of tree-ring sites, the obtained results only give an incomplete view of the MAIDEN model performance and its comparison with VS-Lite, focussing over a limited range of climate regimes. More experiments in different conditions are required in the future to exhaustively evaluate and compare the performance of both models.

Conclusions
In this paper we have tested the applicability of the ecophysiological tree-growth model MAIDEN for potential dendroclimatological applications during the 20th century at 21 Eastern Canadian taiga sites and 3 European sites using treering width observations. Our results provide a protocol for the application of MAIDEN to potentially any site with treering width data in the extratropical region, i.e. from climatic data selection to validation, through automatized Bayesian calibration of the most sensitive parameters. As the ultimate goal is to use MAIDEN in a context of palaeoclimatic reconstruction, forced by low-resolution climate models outputs, we also analysed the sensitivity of the model to parameter calibration and to the quality of climatic inputs. The performance of MAIDEN was compared to the one of a simple tree-growth model, VS-Lite, to evaluate the advantages of using a complex tree-growth model for past climate reconstruction.
Different strategies have been tested to select the value for the most sensitive parameters of the MAIDEN model. When applying calibrated parameters from a well-documented site at other sites with the same species and similar environmental conditions, very low correlations between tree-ring width observations and simulations by the MAIDEN model are found. However, when removing the long-term trend to account for the past disturbance history of these sites that is not represented in MAIDEN, correlations get higher. In the future, this strategy can be used by selecting sites carefully to avoid disturbances. At our study sites, the Bayesian calibration of the most sensitive parameters of the model can provide good and significant correlations between tree-growth observations and simulations.
Secondly, sensitivity of the MAIDEN model parameter calibration to the quality of the climatic data used as inputs has been highlighted. In a context of palaeoclimatic applications, where MAIDEN will be driven by climate model out-puts at low resolution, bias-correction and downscaling techniques could be good options to improve climate inputs and calibration quality, thereby leading to reasonable correlations with observed tree-ring width.
Our split-sample validation experiments are encouraging. However, when a calibration interval of only a few decades is available, the calibration displays large overfitting for individual sites as indicated by the very low correlation with observations over the validation period. Similar split-sample experiments on longer series show much better results, with potentially some overfitting but still with relatively high and generally significant correlations over the validation period. When working with a network of similar sites, the alternative validation technique, i.e. applying calibrated parameters from the mean of a network of tree-ring width observation series with the same species and environmental conditions to the individual sites, should be preferred if not enough data (climate and TRW observations) are available for splitsample validation.
Lastly, at our study sites, MAIDEN has shown higher calibration and validation correlations in most cases compared to VS-Lite. VS-Lite correlations over the calibration period are especially far lower for sites with low replication (i.e. the Eastern Canadian taiga sites from Nicault et al., 2014, andBoucher et al., 2017). However, VS-Lite stays more stable over both calibration and validation periods. Consequently, VS-Lite has a lower ability to reproduce tree growth at our sites but is less prone to overfitting than MAIDEN. Most importantly, we have shown that to limit overfitting, MAIDEN should not be used with short and low-replicated tree-ring width observation time series. VS-Lite is less risky to use in such situations as there is potentially less overfitting in the calibration and probably easier to apply over a large network of tree-ring width time series. However, VS-Lite does not include CO 2 nor biological processes and may thus not be able to take into account changes in conditions between the recent calibration period and the more distant past.
In the future, MAIDEN will be applied at a larger spatial scale in a systematic way, using the protocol that has been developed here, by selecting hundreds of sites from the commonly used databases in palaeoclimate reconstruction based on tree-ring proxies, covering a wide range of environmental conditions and tree species, such as PAGES 2k (PAGES 2k Consortium, 2017) and NTREND (Wilson et al., 2016;Anchukaitis et al., 2017). This broader analysis will allow us to refine the protocol developed here in order to identify the sites where MAIDEN can be successfully applied and estimate the uncertainty associated with the use of MAIDEN for many more different sites.
Although some limitations could remain in our calibration protocol, we have shown the ability of MAIDEN to simulate tree-growth index time series that can fit robustly tree-ring width observations under certain conditions (well-replicated tree-ring width observation time series, high-resolution or downscaled climate data, long time period), as well as its po-tential to be used as a complex mechanistic proxy system model in palaeoclimatic applications and more specifically in data assimilation. Data availability. The structure of MAIDEN (Misson, 2004;Gea-Izquierdo et al., 2015; is provided online (https://figshare.com/articles/MAIDEN_ecophysiological_ forest_model/5446435/1, last access: 17 November 2019;  and its modules are available upon request. The VS-Lite model code is available at the National Oceanic and Atmospheric Administration's Paleoclimatology World Data Center (https://www.ncdc.noaa.gov/paleo-search/reports/ all?dataTypeId=59&search=true, last access: 3 October 2018). The Eastern Canadian taiga tree-ring width data from Nicault et al. (2014) and Boucher et al. (2017) can be downloaded from http://dendro-qc-lab.ca/trw.html (last access: 30 March 2019). The European chronologies are also available online: the EALP tree-ring width data (Büntgen et al., 2011) can be accessed through the PAGES 2k database (https://doi.org/10.1038/sdata.2017.88, PAGES 2k Consortium, 2017); unprocessed SWIT179 tree-ring width data are archived at the International Tree Ring Data Bank (https://www.ncdc.noaa.gov/data-access/paleoclimatology-data, last access: 12 January 2019); the FINL045 tree-ring width data are available in the supplementary materials of Babst et al. (2013). The Global Meteorological Forcing Dataset for land surface modeling (v1) (Sheffield et al., 2006) can be downloaded from http://hydrology.princeton.edu/data.php (last access: 4 January 2019). The NOAA-CIRES 20th Century Reanalysis V2c can be downloaded from https: //psl.noaa.gov/data/gridded/data.20thC_ReanV2c.monolevel.html (last access: 4 January 2019). The gridded interpolated Canadian database of daily minimum-maximum temperature and precipitation (Hutchinson et al., 2009) are available upon request. The Global Historical Climate Network daily station data (Menne et al., 2012a, b) are available online (https://www.ncdc.noaa.gov/ghcnd-data-access, last access: 10 January 2019). Annual atmospheric CO 2 concentration data from Sato and Schmidt can be downloaded from https://data.giss.nasa.gov/modelforce/ghgases/ (last access: 3 December 2018). All results from this paper are available upon request.
Author contributions. This study is part of JR's thesis under the supervision of HG and JG. JR, HG and JG designed the study. JR performed the analysis, wrote the paper and made the figures. FG provided scripts and expertise that have made the work with MAIDEN possible. EB and FG provided the Eastern Canadian taiga tree-ring width data. HG and JG supervised the statistical analyses. FG and EB provided advice and expertise on the preparation and interpretation of the Eastern Canadian taiga sites experiments. FA and MJ contributed to the calibration of the MAIDEN model by sharing their codes and expertise. All authors have contributed to the improvement of the paper.