Estimating 750 years of temperature variations and uncertainties in the Pyrenees by tree-ring reconstructions and climate simulations

. Past temperature variations are usually inferred from proxy data or estimated using general circulation models. Comparisons between climate estimations derived from proxy records and from model simulations help to better un-derstand mechanisms driving climate variations, and also offer the possibility to identify deﬁciencies in both approaches. This paper presents regional temperature reconstructions based on tree-ring maximum density series in the Pyrenees, and compares them with the output of global


Introduction
Estimations of future climate change indicate that variations at regional scales may be larger than the global average (IPCC, 2007). However, these regional projections are hampered by the limited knowledge of the mechanisms that give rise to variability at multi-decadal and longer time scales. One reason is the short length of the available instrumental records. The relative abundance of proxy records during the last millennium (e.g. Jones et al., 2009) offers the possibility to reconstruct climate parameters such as temperature and precipitation, improving our understanding of past regional climate variability at interdecadal and longer time scales (Jansen et al., 2007). Currently, two main approaches are being used to advance this line of research: (1) creating new or extending the existing climate reconstructions back in time using documentary evidence or environmental proxies such as tree-ring parameters (width, density, isotopic content), corals, speleothems (Jones et al., 2009); and (2) simulations of natural and forced climate variability using climate models (e.g. Zorita et al., 2005;González-Rouco et al., 2009;Junglcaus et al., 2010).
Tree-rings may contain useful information on past environmental conditions, which can be used to reconstruct past climate using statistical relationships to a target climate variable (Fritts, 1976;Cook and Kairiukstis, 1990). In this respect, tree-ring width and density are the most widely used proxies for the last millennium, and their links to seasonal temperatures have been extensively studied (Briffa et al., 2002(Briffa et al., , 2004Büntgen et al., 2006Büntgen et al., , 2007Büntgen et al., , 2008Esper et al., 2002Esper et al., , 2005aGrudd, 2008). Although tree-ring measurements are accurately annually dated, their ability to encode centennial to multi-centennial climate variability is often limited by the standardization technique used to remove agerelated trends. The climate signal at these time scales may be distorted by the technique used to remove non-climatic variations in tree-ring series (Cook et al., 1995;Esper et al., 2002Esper et al., , 2003Briffa and Melvin, 2008). In addition to the standardization issue, it is desirable that the climate reconstructions based on proxies represent a climate signal that is not locally confined. This is necessary in order to bridge the gap in the spatial scale between the coarse resolution of climate models and the local proxy-based climate reconstructions. Thus, individual or site chronologies are ideally aggregated into one regional chronology that captures a regional climate signal, the same spatial resolution that climate models are intended to represent. Furthermore, the technique applied to transform tree-ring parameters to meteorological units also implies a challenge: whereas simple linear regression methods may underestimate low-frequency variability, variance matching, in the following referred to as scaling, shows better performance in retaining large-scale variations but at the expense of inflated error estimates (Esper et al., 2005b;Zorita et al., 2010;McCarrol et al., 2011).
A variety of climate models of increasing complexity is available for palaeoclimate simulation experiments: from energy balance models (EBM; e.g. Hegerl et al., 2006) to threedimensional coupled ocean-atmosphere general circulation models (GCMs; e.g. Zorita et al., 2005). GCMs have a global coverage, although their spatial resolution is still limited by computing power requirements. Additionally, the output of climate models is burdened by an important amount of uncertainty arising from limitations in the models themselves, as well as from the prescribed external forcings used to drive the simulations, which are mostly estimated from proxy data (Jansen et al., 2007). Thus, the validation of the models is necessary prior to using their simulated outputs. This validation is based on the assessment of their skill to reproduce the general circulation and main climatic features of the current observed climate, although this does not guarantee their reliability to project future or past climate changes.
The model ECHO-G was part of the Climate Model Intercomparison Project CMIP3, which was widely used in the Fourth Assessment Report of the Intergovernmental Panel on Climate Change (IPCC). The ECHO-G model was identified as one of the best performing models in the suite when benchmarked against recent temperature, precipitation and sea level pressure data (Gleckler et al., 2008). Alternatively, regional climate models (RCM) simulate a limited area, thus reducing the computational requirements and allowing simulations at higher spatial resolutions. Simulations with this kind of model over the last millennium have recently become available for specific regions (e.g. Gómez-Navarro et al., 2011. The development of RCM palaeoclimate simulations depends on the existence of GCM runs where boundary conditions are taken from. However, the higher RCM resolution improves several aspects of the climate simulation with coarser GCMs, such as a better representation of the interactions between the large-scale circulation and the orography, which plays an important role in regional climates. Additionally, the RCM employed here implements a set of physical parameterizations of the sub-grid processes that has been demonstrated to reproduce realistically the heterogeneous climate over the Iberian Peninsula (Gómez-Navarro et al., 2011). Thus, these model data allow in general for more realistic model-data comparisons.
Validations based on cross-comparisons between reconstructions and model estimates are still needed, especially at regional scales, since such comparisons offer the possibility to identify deficiencies in both approaches and thus to improve proxy-based reconstructions, climate models and forcing estimates (e.g. D' Arrigo et al., 1999;Collins et al., 2002;Tan et al., 2009;González-Rouco et al., 2009;Zorita et al., 2010;Gómez-Navarro et al., 2011).
In the climate context in the Iberian Peninsula, the Pyrenees range constitutes an interesting area for palaeoclimate reconstruction and simulations due to its location in the transition zones between the Eurosiberian and Mediterranean ecoclimatic regions, where future climate change is expected to have large effects (IPCC, 2007). Furthermore, networks of tree-ring data suitable for developing temperature reconstructions  as well as global (González-Rouco et al., 2009) and regional (Gómez-Navarro et al., 2011 scale climate simulations are available for this region. By using both proxy and model data, this work aims at two main objectives: (1) to generate an improved version of the tree-ring based reconstruction of past temperatures in the Pyrenees by using a larger dataset and evaluating the uncertainties related to the usage of several methodological variants; (2) to assess the realism of GCM and RCM simulations and their differences in reproducing available evidence of past climate variability over the Pyrenees area by comparing them to the tree-ring based climate reconstructions developed here.
With this purpose, in the first part of the study, we test different methods to develop the tree-ring based climate reconstructions, considering the most usual approaches for the standardization of the individual series, for the aggregation into a regional chronology and for the calibrationreconstruction technique applied. The resulting reconstruction variants are compared to a previously existing maximum temperature reconstruction based on tree-rings developed by Büntgen et al. (2008). In the second part, we compare the climate variability of the reconstruction based on the tree-ring records with that simulated by a GCM and a RCM over the Pyrenees area. We focus on identifying and discussing the possible causes for the different estimations of past climate changes.

Material and methods
A dataset of 804 individual maximum density (MXD) treering series, grouped into 22 chronologies covering the Pyrenees area, was available for the present study. The network includes the two chronologies previously used to develop the first tree-ring based regional reconstruction of past maximum temperatures in the Pyrenees (Büntgen et al., 2008): additional data published in Büntgen et al. (2010), and a newly developed 700-yr long MXD chronology. Compared to the previous reconstruction of maximum temperature, the new dataset covers a wider area and uses a larger replication, not only in most recent centuries, but also in the earliest part of the period (1260-1350 AD).
The location, elevation and tree species at each site are indicated in Fig. 1. The species sampled were Mountain pine (Pinus uncinata Ramond ex DC. in Lam. et DC.), a shadeintolerant and long-lived conifer that grows under temperate subalpine climate conditions and dominates at elevations above 1800 m a.s.l., and Silver fir (Abies alba Mill.) that usually grows at humid sites on north facing, shady slopes and at elevations above 1200 m a.s.l. (Macias et al., 2006). Büntgen et al. (2008Büntgen et al. ( , 2010, to obtain the MXD measurements from tree-rings, all cores were sanded and visually cross-dated following dendrochronological procedures described by Stokes and Smiley (1968). The accuracies of the visual cross-dating and measurements were verified using the computer program COFECHA (Holmes, 1983). For the density measurements, lathes perpendicular to the wood fibres were extracted from 804 cores and analysed following X-ray microdensitometric techniques developed by Polge (1965). The output is a density profile of points taken at 0.01 mm intervals in the radial direction across each ring. The MXD along this direction was estimated for each year.

Site chronologies
In order to construct the MXD chronologies for each of the 22 sites, density series were processed according to standard dendrochronological techniques (Cook and Kairiukstis, 1990). The individual series were standardized to remove age trends in the mean by fitting a stiff function, which not only emphasizes inter-annual variations, but also preserves multi-decadal scale variations in the final chronology (Cook et al., 1995). To assess the uncertainty caused by the application of different standardizations, the individual series underwent different standardization procedures with ARSTAN (Cook and Holmes, 1986). Two detrending techniques were used to remove age trends: regional curve standardization (RCS; Esper et al., 2003) and a 300-yr cubic smoothing spline with a 50 % frequency response cut-off (300sp; e.g. Büntgen et al., 2010). In addition to these two detrending techniques, a power transformation (PT) for the stabilization of the variance in the individual series was alternately conducted (Cook and Peters, 1997). Although raw MXD series are often assumed to be homoscedastic, a conservative position was adopted, since the existence of long-term trends in the serial variance cannot be entirely rejected. Thus, a set of four different standardization procedures was applied considering the possible existence of trends in the variance: (1) RCS; (2) RCS preceded by PT (RCSPT); (3) 300sp; and (4) 300sp preceded by PT (300spPT). Finally, for each site, chronologies were constructed by using a bi-weight robust mean (Cook and Kairiukstis, 1990), which reduces the possible bias caused by extreme values. The final length of the chronologies was established by truncating the final chronologies when less than five contributing samples were available (Esper et al., 2003;Büntgen et al., 2008), leading to a maximum reliable time span of 1260-2005 AD (Fig. 2).

Regional Pyrenees chronologies
MXD individual series and site chronologies were standardized and aggregated into regional chronologies using three different approaches: (1) computing a simple arithmetic average of the 22 individual MXD chronologies (RCS mean, RCSPT mean, 300sp mean, 300spPT mean); (2) computing a single chronology including all 804 individual series in one run (RCS chrono, RCSPT chrono, 300sp chrono, 300spPT chrono); and (3) conducting principal component analyses (Peters et al., 1981), computed separately for each time segment with a constant number of MXD chronologies to produce nested PCA regional chronologies (PCs RCS; PCs RCSPT; PCs 300sp; PCs 300spPT). The combination of the four different procedures for standardization and the three different procedures for aggregation yields a total of 12 different regional chronologies for the Pyrenees region.
Due to the different time spans covered by the 22 chronologies (Fig. 2), a total number of 25 PCA runs were performed for each of the four standardization methods (RCS, RCSPT, 300sp and 300spPT) in order to obtain the corresponding scores of the principal components (PCs) for each of the time segments. In addition, the spatial loadings, sometimes denoted as empirical orthogonal functions (EOFs), were also analyzed to assess the similarities and differences between their spatial patterns among the four standardization methods. For the EOF maps we use the loading that corresponds to the run5 (time span 1928-1977 AD), which includes all 22 MXD chronologies.
Simple correlations were used to assess the climate information contained in the regional chronologies. In mountainous regions such as the Pyrenees, temperature series have shown stronger inter-site relationships between distant sites than precipitation data, which display stronger variability at local spatial scales (Agusti-Panareda et al., 2000). Temperature is known to be the dominant climate factor controlling tree-growth at high-elevation sites in the Pyrenees (Macias et al., 2006;Büntgen et al., 2008Büntgen et al., , 2010. A regional record consisting of the data from six stations: Pamplona, San Sebastián, Zaragoza, Barcelona, Burgos and Huesca (Brunet et al., 2007(Brunet et al., , 2008 was available as a representative homogenized and high-quality temperature record of the area for the last 100 yr. All regional chronologies, together with the leading principal component PC1 of the PCA, were correlated with instrumental regional maximum, minimum and mean temperature from Brunet et al. (2007Brunet et al. ( , 2008 at monthly and seasonal time scales (results not shown). Mayto-September mean temperature (T MJJAS ) was identified as the most influencing parameter for tree growth.

Pyrenees reconstructions
Pyrenees temperature reconstructions were obtained using both scaling and regression methods (Esper et al., 2005b) applied to the different regional chronologies developed. Thus, from the collection of 12 regional MXD chronologies built using the three different methodological approaches detailed above, 24 regional reconstructions were developed: 12 using scaling and 12 using regression techniques (Table 1).
Each chronology and the instrumental records were high and low-pass filtered with a 20-yr centred moving average and then correlated to assess the similarity at high and low frequencies, respectively. Calibration/verification tests against the overlapping T MJJAS regional instrumental temperature record were performed in split periods (Snee, 1977). The tree-ring model was calibrated against the first half of the instrumental temperature record (1900( -1952 and validated against the second half (1953, and viceversa. Quality control statistics such as the coefficient of determination (r 2 ), Pearson's correlation coefficient between observed and predicted values (r obs-pred ) and the reduction of error (RE; Cook et al., 1994) were calculated to test the validity of the reconstructions, derived using both methodologies. Usually, RE values above zero indicate reconstruction skills, but since RE has no lower boundary and no test for its statistical significance exists, it should be interpreted carefully. Reconstruction uncertainties related to the differences of the precursor chronologies and the calibration models used for reconstruction were assessed.
Additionally, the maximum temperature reconstruction for the Pyrenees of Büntgen et al. (2008) (PYR) was modified to be fully comparable to the newly developed set of regional mean temperature reconstructions. The original chronology used in Büntgen et al. (2008) was calibrated against May-September mean temperatures to build a reconstruction called "mPYR".
The resulting set of temperature reconstructions was compared with climate simulations performed with global and regional climate models, spanning the last millennium in the target region.

Model simulations
The global GCM simulations used here were produced with the ECHO-G (Legutke and Voss, 1999) climate model, consisting of the ECHAM4 atmospheric component and the HOPE-G ocean model. ECHAM4 is used with a T30 horizontal resolution (ca. 3.75 • lat × long) and HOPE-G with a horizontal resolution of approx. 2.8 • lat × long, with a finer resolution in the tropics increasing towards the Equator where it reaches a minimum meridional grid point separation of 0.5 • . The model is run using flux adjustments to avoid climate drift. This work makes use of two existing model simulations (Erik1 and Erik2) produced by driving the model with estimations of the evolution of some natural and anthropogenic forcings during the last millennium. Both simulations were produced starting from different initial conditions and driven by identical forcings from year 1000 AD onwards. Erik1 (Erik2) started from a relatively warm (cold) state and was integrated for 100 yr according to the forcing levels of year 1000 AD, with the forcing values then changed yearly according to the prescribed estimates. Erik1 presents warmer values in the first centuries of the simulation. Goosse et al. (2005) and Osborn et al. (2006) suggest that this simulation is unusually warm in comparison with other model simulations due to the warmer initial state. Erik2 shows similar amplitudes in the temperature changes through the last millennium in comparison to other model runs (e.g. Zorita et al., 2007).
The external forcings used as boundary conditions for the simulations include estimations of changes in solar irradiance and the parameterized effect of stratospheric volcanic aerosols, both adapted from Crowley (2000), as well as the concentrations of greenhouse gases (CO 2 , CH 4 and N 2 O) provided by Etheridge et al. (1996Etheridge et al. ( , 1998 and Battle et al. (1996). For a more in-depth description of the simulations and the external forcings applied as well as model details, the reader should refer to Zorita et al. (2005) and González-Rouco et al. (2006.
The RCM simulations used herein were produced with a climate version of the model MM5, developed by NCAR and the Pennsylvania State University (Dudhia, 1993;Grell et al., 1994). The model domain is double-nested with resolutions of 90 km and 30 km, respectively. The high-resolution domain covers the area of the Iberian Peninsula. Two 1000yr-long simulations were run. Each simulation (MM5-Erik1 and MM5-Erik2) was produced by driving the MM5 model with boundary conditions obtained from the ECHO-G Erik1 and Erik2 runs, respectively. The boundary conditions are introduced into the outer domain of the MM5 model through a blending area of five grid points at the boundaries. Specifications for the model, simulation experiments and the physics configuration are also provided in Gómez-Navarro et al. (2011.

Regional Pyrenees MXD chronologies and climate signal
The PCA performed on the set of MXD chronologies reveals that the first principal component (PC1) is dominant and www.clim-past.net/8/919/2012/ Table 1. Statistics of the calibration verification for the 24 regional chronologies developed. r indicates correlation between instrumental and predicted May-to-September temperatures (T MJJAS ): for the entire period 1900-2005 AD (r); for the entire period in series low-pass filtered with a 20 yr moving average (r lf ) and for the correlations for the two periods of the calibrations (r 1900−1952 and r 1953−2005 , respectively). RE: reduction of error for the two split periods. (**) indicates significant correlation at 99 % level.

Regional
Regional Split chronology type reconstruction code r r lf period r (1900−1952)  captures on average 60 % of the variance (Fig. 3). The high percentage of variance explained by PC1 shows slight variations across the different PCA runs, displaying higher percentages of explained variance for PCAs containing a smaller number of MXD chronologies (run1, run2, and from run21 to run25). The leading PC derived from chronologies standardized with 300sp and 300spPT, rather than with RCS or RC-SPT, also explains slightly more common variance, although the difference is very small. The first empirical orthogonal function (EOF) describes a coherent spatial pattern of the signal regardless of the standardization method used (Fig. 4). The individual points (MXD chronologies) present the same sign and similar magnitude in the maps of the first EOF for each of the four standardization methods. This suggests that spatially coherent modes of tree-ring variation exist, which may reflect regional-scale modes of climate parameters driving tree growth. The second and subsequent PCA modes, which are similar regardless of the standardization method used, become less spatially and seasonally coherent, with the sign and magnitude of the relationship, occasionally differing between individual points, likely in response to local or site influences. The leading mode (PC1) shows high correlations with the seasonal May-to-September temperature and has positive loadings across all the Pyrenees sites (Fig. 5). The highest correlations with climate are for the PC that includes the highest number of MXD chronologies (i.e. run4 and run5) and decreasing with decreasing number of series. The small differences observed in the variance captured by PC1 among the different standardization methods do not have a significant effect on the correlation between PC1 with Mayto September-temperature (T MJJAS ). The coefficients among the different standardization methods are almost identical. In summary, the mean correlation of PC1 and T MJJAS is 0.64, and it is slightly higher for the standardization method RCS and RCSPT (mean r = 0.65) than for 300sp and 300spPT (mean r = 0.63).
The regional chronologies computed as arithmetic means of the 22 MXD site chronologies ("mean") display generally higher correlations with T MJJAS , when the standardization methods RCS and RCSPT (RCS mean and RCSPT mean respectively) were used rather than 300sp and 300spPT (300sp mean and 300sp mean respectively) (Fig. 5). In comparison to the dominant mode of PCA, only the PC1s of run4 and run5 display larger correlation coefficients (mean r = 0.78 and mean r = 0.76, respectively) than the RCS mean and RCSPT mean (r = 0.71 and r = 0.72, respectively). In contrast, among the regional series computed as single chronologies ("chrono"), RCS chrono and RCSPT chrono overall display the lowest correlations with T MJJAS (below 0.4), while 300sp chrono and 300spPT chrono show coefficients of a similar order of magnitude to those displayed by the PCs.

May-to-September regional reconstruction: scaling and regression
A total of 24 regional reconstructions for the Pyrenees were developed using both scaling and regression techniques applied to each of the 12 regional chronologies developed. The correlations between the reconstructions based on nested PCs and the instrumental T MJJAS record are high for both high (<20 yr) and low-frequency (>20 yr) variations (Table 1). In contrast, the correlations of reconstructions from the group "chrono" are low and they reveal diverging lowfrequency information when using RCS and RCSPT as standardization methods (RCS chrono R, RCSPT chrono R, RCS chrono S, RCSPT chrono S). Reconstructions based on "mean" regional chronologies display high correlations with the instrumental T MJJAS ; however, the correlations are slightly lower in the low frequency domain compared to the reconstructions based on nested PCs, similarly to the mPYR. The calibration verification test conducted against the target T MJJAS in two periods (1900( -19521953 generally shows a coherent relationship between observed and predicted values for the 24 regional reconstructions; however, large differences in the reduction of error statistic (RE) appear (Table 1) Similarly, REs are also high (well above 0) for the two verifications. In contrast, even though correlation values are fairly high for the two periods, RE values decline to large negative values when the reconstructed values are obtained by linear regression of single regional chronologies, especially reconstructions from the group "chrono". Single regional chronologies computed as means display higher RE values when the reconstruction is performed using scaling rather than regression (RCS mean S, RCSPT mean S, 300sp mean S, 300spPT mean S), The set of reconstructions based on RCS chrono and RCSPT chrono displays RE values lower than zero regardless of the technique used for the reconstructions.
The set of May-to-September temperature reconstructions (T MJJAS ) spans the period from 1260 to 2006 AD. While most of the reconstructions display no clear long-term trend, the reconstructions RCSPT chrono S, RCS chrono S and mPYR clearly show an increasing trend starting in 1260 and lasting until around 1850 AD (Fig. 6). Despite this difference regarding the trends, all reconstructions display similar intermediate to low-frequency variations (multi-decadal to centennial time scales), which are, however, less visible in the case of RCSPT chrono R and RCS chrono R due to their reduced amplitude of the variations. Periods with persistent climate anomalies such as the Medieval Climate Anomaly (Bradley, 2003) and the Little Ice Age (Fagan, 2000) are not pronounced in the set of reconstructions. However, interdecadal variations clearly register the cold periods coincident with solar grand minima, such as those of Dalton (1800-1830 AD), Maunder (1680-1715 AD), Spörer (1480-1520 AD) and Wolf (1340-1380 AD) (Bard et al., 2000). In addition, a minimum in the modern age between 1950 and 1980 AD is shown by all 25 reconstructions including the mPYR. On the other hand, conspicuous maxima also appear during 1300-1330, 1360-1460 and 1710-1740 AD.
The total amplitudes of the 24 reconstructions and the mPYR range from the largest amplitude of RCSPT mean S (from −2.49 to +0.59 • C) to the smallest of PCs 300sp R (from −0.32 to +0.39 • C) at interdecadal time scales. Similarly, changes in the amplitude of variations do not differ much if only the preindustrial period is considered (1260-1900 AD).

Cross-comparison of model simulation and proxybased climate reconstructions
For the calibration period (1900, the two global (Erik1 and Erik2) and the regional MM5-Erik2 simulations display a marked positive trend in agreement with the instrumental series (Fig. 7). The global simulations display a mean increase of about +1 • C in this period, while for the regional models this increase is less pronounced, i.e. +0.4 • C for MM5-Erik1 and +0.7 • C for MM5-Erik2. The instrumental record displays an increase of approximately +0.7 • C for the same period. However, a pronounced negative temperature anomaly is exhibited in the later part of the 20th century, which is only reproduced by the regional simulation MM5-Erik1.
On the other hand, model simulations and proxy reconstructions disagree to some extent on the magnitude of the warming trend over the 20th century (1900 (Fig. 7). The reconstructions, including mPYR, mostly display negative trends over the same period, ranging from −1 • C (RCS chrono S) to −0.1 • C (PCs RCS S; PCs RCSPT S). Some reconstructions show trends close to zero (PCs RCS R; PCs 300sp R; PCs 300sp R), and only PCs RCSPT R displays a minor increase (+0.05 • C).
When comparing the full length of global and regional simulations and the collection of May-to-September temperature reconstructions, the large interdecadal changes are in agreement (Fig. 7b). A high synchrony is found among the simulations and the tree-ring based reconstructions in periods with large negative temperature anomalies observed at the end of the 18th century, during the Dalton and Spörer solar grand minima, and during periods of positive anomalies as well, e.g. during the middle of the 15th century, the middle to end of the 16th century and during the middle of (1 to 25) and the regional chronologies "mean" and "chrono" and May-to-September mean temperature (T MJJAS ) for each standardization method (RCS, RCSPT, 300sp, 300spPT). the 18th century. Some simulations lack pronounced negative periods such as the Late Maunder solar minimum, which are present in the reconstructions. The simulations Erik1 and MM5-Erik1 display identical periods of negative anomalies corresponding to the Maunder solar minimum, in agreement with the negative anomalies displayed by the set of reconstructions. In the simulations Erik2 and MM5-Erik2, this negative period is delayed.
The largest amplitude of the past temperature variations at interdecadal time scales during the total study period 1260-1990 AD (Fig. 7c) is identified for MM5-Erik2 (from −1.8 to +0.5 • C), among the simulations, and for RCSPT chrono S (from −2.49 to +0.6 • C), among the reconstructions. However, the amplitude of the model simulations is generally larger (mean ranging from −1.59 to +0.55 • C) than the amplitude of the reconstructions (mean ranging from −0.83 to +0.43 • C).
If only pre-industrial interdecadal temperature variations (1260( -1900 are considered, the model simulations display a mean range of amplitudes from −1.6 to −0.07 • C, while for the set of temperature reconstructions the amplitude ranges from −0.81 to +0.38 • C. In this case, MM5-Erik1 now displays the largest amplitude among the model simulations (from −1.7 to +0.2 • C), while for the set of reconstructions RCSPT chrono S still displays the largest amplitude (from −2.49 to +0.6 • C).

Discussion
How the individual chronologies should be standardized while preserving as much climate information as possible and how they should be aggregated into regional composites to reduce local influences are key challenges in dendroclimatology. In the context of low frequency preservation, the RCS has been suggested as an alternative to the more traditional standardization procedures such as 300-yr splines or negative exponential functions (Esper et al., 2003). RCS does not preserve decadal or annual frequencies with better efficiency than other techniques (Bunn et al., 2004), but it is assumed to retain more low frequency variability (Esper et al., 2002(Esper et al., , 2004. On the other hand, the development of regional reconstructions from a set of several individual chronologies (from the same and/or different variables) has so far been achieved by PCA (Peters et al., 1981;Enright, 1984;D'Arrigo et al., 1999), arithmetic means (Briffa et al., 2004;Tang et al., 2009) or regional chronologies compiled into a single dataset including all the individual measurements Büntgen et al., 2008).
The standardization techniques used in the present study have contributed to visible differences among various reconstructions, as reported by other authors (Esper et al., 2002;Helama et al., 2004). However, our results suggest that these differences may be enhanced when deriving a regional chronology from the individual series rather than by aggregation of the individual site chronologies. Hence, we could not find large differences among regional reconstructions when they were calculated as an average of the 22 MXD site chronologies ("mean") or when using nested PCs (PCs), regardless of the standardization procedures (300-yr spline or RCS, with or without PT) applied to the individual chronologies. Differences due to the standardization arose in the regional reconstructions, which were derived from regional chronologies computed as a single chronology including all 804 individual series in one run (the group named "chrono"). Within this group, regional chronologies standardized using RCS (RCSPT chrono and RCS chrono) display the largest differences, whereas chronologies standardized with a 300yr spline (300spPT chrono and 300sp chrono) produced results more similar to the results of the "mean" and "PCs" regional chronologies.
Correlations between T MJJAS and RCS chrono and RC-SPT chrono are consistently lower compared to the correlations displayed by other regional chronologies, pointing to a less well-preserved climate signal. Furthermore, the resulting reconstructions based on these regional chronologies display either a pronounced positive trend spanning the entire series lengths (RCSPT chrono S and RCS chrono S) or an extremely reduced amplitude of past climate variations (RCSPT chrono R and RCS chrono R). Similarly, PYR and mPYR are also based on a regional chronology computed using RCS and display similar but less pronounced positive Fig. 6. May-to-September mean temperature (T MJJAS ) reconstructions. (a) Calibration-verification for the 24 different regional reconstructions performed by regression (grey lines) and scaling (blue lines), the mPYR (black line) and the instrumental T MJJAS record (red line). The bottom panel displays low-pass filtered series (20 yr moving average). Mean correlations (r m ) for the two split periods (1900( -1952( and 1953 are also shown; (b) comparison between the instrumental series, the 24 Pyrenees regional reconstructions and mPYR for the entire reliable period; (c) 20 yr low-pass filtered reconstructions and instrumental record. The dashed black line represents the original May-to-September maximum temperature reconstruction from Büntgen et al. (2008) (PYR); (d-e) solar and volcanic forcing derived from Crowley (2000) and used to drive the climate models. Shaded areas indicate periods with solar grand minima: Dalton minimum (DM), Maunder minimum (MM) and Spörer minimum (SP).
trend. These trends are in disagreement with all other newly developed reconstructions, including the reconstructions derived from 300sp chrono and 300spPT chrono.
Therefore, it becomes obvious that the differences stem from the computation of the regional chronology, not only by using a single file merging all the individual measurements together, but also from the application of the RCS as the standardization method.
RCS requires sampling loads structured in age classes with similar replication (Esper et al., 2002), which limits its application in practice. The set of samples from the Pyrenees, as well as the set used for PYR and mPYR, does not seem to adequately fulfil the RCS requirements: the sample depth diagram shows an extremely high number of samples concentrated in the most recent period (from 1800 to 2005 AD, approximately), whereas the replication in the rest of the period (from 1260 to 1800 AD) is much lower. Such anomalously high and low replications can introduce a bias into the final chronology (Cook and Peters, 1997;Esper et al., 2003;Helama et al., 2004) and therefore in the resulting reconstruction. Thus, it seems likely that the trend of RCSPT chrono S and RCS chrono S may be an artefact due to a suboptimal dataset used for the regional chronology constructed using RCS. Such a trend was removed in the case of RC-SPT chrono R and RCS chrono R due to their poor calibration, which led to a subsequent underestimation of the variability imparted by the regression technique, and to reconstructions with reduced amplitude (Cook et al., 1994).
Based on the calibration verification tests, regional reconstructions derived from nested PCs performed better than others, regardless of the standardization procedure applied to the individual chronologies, or irrespective of the reconstruction technique used, which indicates that this approach emphasizes the leading mode of co-variation between the tree-ring series (Peters et al., 1981). The amplitudes of our reconstructions are also larger when performed with scaling than with regression; however, the RE values become low or negative due to inflated error estimates, as previously reported by Esper et al. (2005b) and McCarrol et al. (2011). However, for series with generally good calibration, the RE values were similar for the reconstructions based on scaling as well as on regression. Thus, differences in RE between both reconstruction techniques are reduced, as the proxy better records the climate signal.
Despite showing high correlations with the instrumental record at low frequencies, the 24 regional reconstructions and the mPYR graphically display general overestimations of past temperatures until 1940 and underestimations from 1980 AD onwards. This systematic misfit between instrumental measurements and proxy estimates has been previously described and the possible causes discussed . The tree-ring data from the Pyrenees display a strong and stable relationship with temperature, and therefore, a loss of tree sensitivity to climate seems unlikely ("the divergence problem"; Briffa et al., 1998;D'Arrigo et al., 2008). Thus, this mismatch may be due to the effect of the standardization methods on the low-frequency climate information encoded in tree-ring series (Cook et al., 1995;Esper et al., 2002) or alternatively related to the instrumental data used for calibration Esper et al., 2005bEsper et al., , 2010. The selected instrumental data for the Pyrenees and the homogenization and adjustments applied to these data also have a determinant effect on the resulting proxy-based reconstructions . In our case, no clear conclusion can be drawn without further specific work directed to answer this particular question.
The comparison between the output of the model simulations and the range established by the different reconstructions shows synchronized variability at the multi-decadal time scales. This synchronism indicates that the external forcings applied to the model efficiently reproduce the timing of the main periods of anomalous past temperature variations in the Pyrenees region. Indeed, the spread among the four simulated temperatures is mostly within the range estimated from the different standardization and reconstruction methods applied over the last 750 yr. These results confirm that past multi-decadal temperature variations in this region are driven to a great extent by solar activity and volcanic eruptions.
Model simulations, however, also show some clear discrepancies in comparison to the tree-ring based reconstructions, such as the duration and magnitude of warming or cooling events of particular climatic episodes (e.g. in the 15th and 16th century). A likely reason for such differences is that at higher frequencies, i.e. decadal time scales, the agreement between model simulations and reconstructions is lower, because the climate variability at these time scales is mostly internal and not driven by the external forcing (Zorita et al., 2007;Gómez-Navarro et al., 2012).
However, the most relevant differences between the model simulations and proxy-based reconstructions are the positive trends from 1850 AD-present displayed by the simulations but not reproduced by the reconstructions, and the differences in reproducing temperature variability within the 20th century.
On the one hand, the positive trend shown by the simulations is likely due to the external forcings prescribed. The increasing trends in regional and global temperatures described by the simulations may be ascribed to the increase in solar irradiance and greenhouse gas forcing. However, since the model runs incorporate both natural and anthropogenic forcings, it is difficult to distinguish the effect of the individual forcings in the resulting simulation and especially in the resulting trend. Thus, it is not possible to know whether the observed difference between these simulations and reconstructions is due to an underestimation by the proxy-based reconstructions or to an overestimation by the models.
On the other hand, instrumental series and proxy-based reconstructions show a pronounced cooling from 1950 until approximately 1980 AD, which does not agree with the increasing warming trend replicated by simulations for the second part of the 20th century. It is important to note that cooling factors, such as aerosols and land-use or vegetation changes (Hansen et al., 2002;Betts et al., 2007), are not included as external forcing in the simulations. The Pyrenees range has experienced an increase of forest coverage of about 15 % since the 1950s until present due to land-use changes (Garcia-Ruiz and Lasanta, 1990) and because of reforestation politics (Ortigosa et al., 1990). Deforestation increases surface albedo, which may result in a cooling effect (Betts et al., 2007). After the minimum, the progressive increase in forest coverage may produce a reduction in albedo, and thus may have contributed to the increase in temperatures observed from mid-1970s onwards.
Aerosols in the troposphere also tend to produce surface cooling since they intercept incoming sunlight and reduce the energy flux arriving to the surface (Charlson et al., 1992). In addition to this direct effect, there are several "indirect", cloud-mediated effects of aerosols, which all result in cooling (Andreae et al., 2005), although its magnitude is still very uncertain. Whether land-use changes or aerosols or a combination of both can produce the observed negative temperature anomaly in the second half of the 20th century remains open to discussion, since their role in surface temperatures in this region is not well-known. Further simulations, combining different kinds of forcings and taking into account explicitly the increase in aerosols and land-use changes during the industrial era (i.e. Jungclaus et al., 2010), which are not considered here, could elucidate this uncertainty.
Finally, although internal variability can be claimed to be responsible for part of these differences, the analysis of two regional simulations, both driven by the same forcing used here, indicates that the role of internal variability in the long-term trends of the simulated surface temperature over the Iberian Peninsula is small (Gómez-Navarro et al., 2012). However, this does not rule out that the real internal variability, which is part of the observed record, may indeed explain part of the mismatch. Thus, it may also be possible that the described regional cooling is just due to internal variability without the need to involve vegetation changes or aerosol forcing.
Based on the differences in trend and magnitude of the 20th century warming described above, it is not surprising that, if we consider the whole common period (1260, the amplitudes of the temperature variations are considerably larger in the model simulations than in the most reliable reconstructions. If the 20th century is not considered, the amplitude of the annual temperature variations is still wider for the model simulations, but the differences with the tree-ring based reconstructions are reduced. It is worth mentioning that, among the model simulations, the finer resolution of regional models seems to enhance the interdecadal amplitude of the temperature variations.

Conclusions
We provide a set of newly developed May-to-September mean temperature reconstructions for the Pyrenees that represent an improvement in comparison to the reconstruction by Büntgen et al. (2008), although our reconstructions still do not perfectly match the low-frequency variations of the instrumental record. Similarities between the interdecadal temperature variations obtained for reconstruction variants indicate the good performance of all standardization procedures for retaining inter-annual to interdecadal climate information. However, when the regionally representative tree-ring series are constructed as a single chronology (including all single measurements into a single file) and standardized with the RCS method, unreliable results may be obtained if the requirements of the RCS method regarding sample replication and age categorization are not fulfilled. Overall, nested PCs have demonstrated to be the most appropriate approach to build a regional chronology based on a set of local chronologies, regardless of the standardization procedure used for the individual site chronologies. Additionally, no large differences were found between reconstruction techniques (scaling and regression) when using chronologies based on nested PCs. Thus, the quality of the final reconstruction seems to depend more on the standardization performed and the method applied to construct the regional chronology, rather than on the reconstruction method employed.
On the other hand, the good agreements on interdecadal scales between tree-ring based reconstructions and global and regional model simulations point to the combination of solar variability and volcanic activity as the main factors driving summer temperature variations over the last millennium in the Pyrenees. As we have developed regional reconstructions, it is difficult to conclude whether the more realistic topography and land-use categories of the regional model would yield a more accurate replication of temperature variations at local scales. However, the regional model seems to yield larger amplitudes of temperature variations compared to the global models. This is expected because variability is subdued when larger spatial averaging is applied.
Cooling factors, such as aerosols and vegetation changes, are not included as external forcings in the simulations, and their potential contributions to the changes in the surface temperatures in this region are still unclear. Thus, the mismatch in the 20th century between the simulations and the observations discussed here could be related to shortcomings in the models, in the external forcings used to drive these simulations, or in both. Further simulations from the same area, including aerosols and land use changes as external forcings, could help to solve some of the questions that remain unanswered. This analysis will be addressed in future studies Supplementary material related to this article is available online at: http://www.clim-past.net/8/919/2012/ cp-8-919-2012-supplement.pdf.