Reconciling reconstructed and simulated features of the winter Pacific/North American pattern in the early 19th century

. Reconstructions of past climate behavior often describe prominent anomalous periods that are not necessarily captured in climate simulations. Here, we illustrate the contrast between an interdecadal strong positive phase of the winter Paciﬁc/North American pattern (PNA) in the early 19th century that is described by a PNA reconstruction based on tree rings from northwestern North America, and a slight tendency towards negative winter PNA anomalies during the same period in an ensemble of state-of-the-art


Introduction
The Pacific/North American pattern (PNA) is one of the dominant modes of interannual winter atmospheric variability of the northern extratropics (e.g., Barnston and Livezey, 1987;Wallace and Gutzler, 1981). It strongly affects the weather and the hydroclimate of the North American continent, and contributes to the atmospheric bridge linking Pacific and Atlantic climate variability (e.g., Raible et al., 2001;Pinto et al., 2011;Baxter and Nigam, 2013). The behavior of this large-scale atmospheric circulation pattern before the observational period and its sensitivity to natural external forcing are less understood compared to other dominant climate modes, partly due to the limited number and temporal coverage of available PNA reconstructions. In fact, only one major winter PNA reconstruction, based on tree rings from northwestern North America, is available and only goes back to AD 1725 (Trouet andTaylor, 2010, hereafter TT2010)  Sect. 2.1 for details). Here, we investigate the PNA features described by a multi-model ensemble of state-of-the-art climate simulations of the last millennium and use pseudoproxy experiments (e.g., Lehner et al., 2012;Smerdon, 2012) applied to the same ensemble to improve our understanding of the PNA's behavior during the pre-industrial period, especially during the early 19th century, and to investigate compatibility between climate simulations and reconstructions.
The PNA pattern consists of a wave train spanning from the subtropical northeastern Pacific to the Gulf of Alaska, northwestern North America and the southeastern United States through centers of action of alternating sign (Fig. 1). Accordingly, a classical definition of the PNA index is the sum of the differences between its positive and negative centers of action (Wallace and Gutzler, 1981). The PNA can be interpreted as an amplification and dampening of the climatological stationary wave characterizing the pattern of the polar jet across North America (e.g., Notaro et al., 2006), which explains its reduced importance during boreal summer. The positive phase of the PNA includes an anomalously deep Aleutian low and an enhanced ridge-trough pattern across North America. It produces above-average temperatures over northwestern North America due to the stronger ridge over the North American Rockies with associated northward diversion of the westerly flow, and below-average temperatures and drier conditions across the south-central and southeastern United States due to increased southward penetration of cold Arctic air masses. The signature is reversed for the negative phase of the PNA.
On sub-monthly timescales, the PNA variability and predictability are largely determined by internal dynamics of the mid-latitude atmosphere, while on longer timescales they are most prominently controlled by forcing from sea-surface temperature (SST) signals from the tropical Pacific (Younas and Tang, 2013). Horel and Wallace (1981) were the first to identify a connection between the PNA and the equatorial El Niño-Southern Oscillation (ENSO). Since then, observational and modeling studies have revealed that boundary conditions relevant for the PNA also include low-frequency SST signals in the extratropical North Pacific , remote forcing from the North Atlantic (Baxter and Nigam, 2013) and upstream conditions determined by the East Asian jet (e.g., Gong et al., 2007).
Climate simulations of the last millennium indicate increased likelihood of a significantly weaker Aleutian low after strong tropical volcanic eruptions, suggesting that the PNA can dynamically respond to volcanic forcing on interannual-to-decadal timescales (Zanchettin et al., 2012;Wang et al., 2012). A connection between PNA variability and natural forcing is also suggested by the TT2010 reconstruction, which shows a prolonged strong positive phase of the PNA during the early 19th century. Indeed, this period was characterized by a close succession of strong volcanic eruptions concomitant with a phase of weak solar activity, both contributing to exceptionally cold climate condi-tions (Cole-Dai et al., 2009). However, TT2010 attributed the early 19th-century anomalous PNA phase to the decreased solar irradiance during the Dalton Minimum of solar activity (ca. 1790-1830), without discussing possible implications from the concomitant volcanic cluster.
The selection of proxy locations is crucial for the robustness and reliability of reconstructions of large-scale circulation modes (Lehner et al., 2012). Decadal-scale shifts in the centers of action of atmospheric modes like the North Atlantic Oscillation (NAO) or the PNA are associated with non-stationarities in the imprint of such teleconnection patterns on local precipitation and temperature (Raible et al., 2006Coats et al., 2013;Moore et al., 2013). Accordingly, the ring-width response to atmospheric modes like the PNA and the NAO is spatially heterogeneous due to the complex causal chain linking climate modes, local environment and seasonal tree growth . Reflecting such heterogeneity, the prolonged positive PNA phase in the early 19th century becomes less prominent if proxies from other PNA-sensitive regions are considered, such as river catchments in west-central British Columbia (Starheim et al., 2013) or lakes from the northeastern United States (Hubeny et al., 2010).
Twentieth Century Reanalysis Project data suggest that the PNA centers of action are less variable in space than, e.g., the NAO . However, the risk of insufficient coverage and representation of its different centers of action -and hence a poor reconstruction -is considerable for the PNA due to the complexity of its pattern and the strong interdependencies with surrounding or even superposing modes of variability. For instance, over the last 6 decades and especially in winter the PNA has been practically indistinguishable from the inverted North Pacific Index (NPI) describing the sea-level pressure (SLP) variability in the Aleutian low region. However, there is no indication as of yet about whether the NPI and the PNA (and their signatures on regional temperature and precipitation) can become distinguishable over periods of decades or longer. Similarly, latewinter temperature reconstructions in western North America, i.e., a region where the PNA climatic imprint is strong, have been recently used to test whether strong tropical volcanic eruptions induce a preferred phasing of ENSO (Wahl et al., 2014). However, the simulated teleconnection between ENSO and the North American climate is nonstationary on multidecadal timescales (Coats et al., 2013). The possible superposition of regional climate signatures of different largescale modes (like ENSO, PNA and NPI) and their possible nonstationarity poses a challenge for any reconstruction attempt using climate proxies from affected regions, particularly for producing robust and unambiguous reconstructions. There is therefore a need to assess the robustness of proxybased PNA reconstructions: if the TT2010 reconstruction is found to accurately capture the past PNA behavior, the reconstructed prolonged strong positive PNA phase during the early 19th century would represent an important feature for Clim. Past, 11, 939-958, 2015 www.clim-past.net/11/939/2015/ Dots mark grid points where the correlation is not significant at 95 % confidence accounting for autocorrelation. The green boxes mark the areas used for the calculation of the PNA index. In panels (b-i), the numbers reported in the respective titles are the spatial correlations between observed and simulated patterns calculated for the domain north of 20 • N (to this end, NCAR data were regridded to the model grid).
addressing in coupled climate simulations. Its robust reproduction by climate simulations would be strong evidence for its externally driven nature, while the opposite would suggest two possibilities: either an episodic excitation consistent with internal variability or limited realism of climate models due to common deficiencies. Thus, this study aims at answering the following questions: is the prolonged strong positive PNA phase in the TT2010 reconstruction during the early 19th century reliable? What can we learn from available climate simulations of the last millennium about its attribution? To answer these questions we compare simulated and observed/reconstructed PNA features, and perform a series of pseudo-proxy experiments on a multi-model ensemble. We also extend our pseudo-proxy investigation to determine whether there is margin to substantially improve the PNA reconstruction.

The TT2010 PNA reconstruction
The TT2010 reconstruction of the winter PNA covers the period 1725-1999. It is based on a multiple regression model using three winter climate sensitive tree ring records from the western United States as predictors (TT2010). The predictors were sampled from three regions, whose locations are marked by the green boxes in Fig. 2 and that are hereafter referred to as "Alaska" (northern box, here defined as 60-70 • N, 120-160 • W), "Montana" (middle box, 50-60 • N, 115-135 • W) and "Wyoming" (southern box, 35-  Table 1 in TT2010). The Alaskan predictor is most sensitive to winter temperature; the predictor from Montana captures winter precipitation at a relatively high-elevation site, whereas the predictor from Wyoming captures both autumn/winter precipitation and summer temperature at a relatively low-elevation, semiarid site (TT2010). The latter two predictors show the opposite sensitivity to precipitation despite their close location, highlighting the role of regional topographical features in determining the relationship between the biological sensor and the local environmental conditions. The combination of the selected three tree ring series explains 49 % of the variance of the winter PNA index for the calibration period 1949-1999. The TT2010 PNA reconstruction shows a prolonged period of positive PNA, with a peak in 1800-1820. The early 19th-century positive PNA phase was interpreted as a response to the decreased solar irradiance since it coincides with the period of weak solar activity known as Dalton Minimum and since subsequent periods of weak solar activity similarly correspond to positive PNA anomalies (TT2010).
Radiatively forced warming of eastern tropical Pacific SST associated with cold SST anomalies in the Aleutian Low region -corresponding to in-phase interactions between positive anomalies of both ENSO and the Pacific Decadal Oscillation -was reported as a possible dynamic explanation for the connection between decreased solar irradiance and positive PNA phase. TT2010 did not discuss the possible implications of the strong volcanic eruptions during the early 19th century. Possible mechanisms underlying volcanically forced PNA variability include both tropical-extratropical coupling via volcanically forced changes of ENSO (e.g., Li et al., 2013) and extratropical processes via, e.g., sea-ice responses in the Gulf of Alaska and the Bering Strait region (e.g., Zanchettin et al., 2014).

Observational and simulated data
We use monthly mean data obtained from the NCEP reanalysis (Kalnay et al., 1996;Kistler et al., 2001) for the period 1948-2013 as reference data for the observational period. The data were provided by NOAA/OAR/ESRL PSD, Boul-Clim. Past, 11, 939-958, 2015 www.clim-past.net/11/939/2015/ der, Colorado, USA. The NCEP reanalysis data set suits our needs since it encompasses the calibration period used in the TT2010 reconstruction. The Twentieth Century Reanalysis Project (Compo et al., 2011) extends further back in time, but the ensemble teleconnection patterns over the Pacific from this data set are poorly constrained during the first half of the 20th century (e.g., Raible et al., 2014). We include outputs from "past1000" and follow-up historical climate simulations from seven coupled general circulation and Earth system models contributing to the third phase of the Paleoclimate Modelling Intercomparison Project (PMIP3, Braconnot et al., 2012). All simulations are full forcing simulations, i.e., they describe the combined effects of all major natural and anthropogenic external forcing factors acting during the last millennium (Schmidt et al., 2014). Two simulations are considered for the Goddard Institute for Space Studies E2-R model (hereafter referred to as GISS-E2-R24 and GISS-E2-R25), which differ in the considered external forcing inputs. Table 1 provides a summary of the main characteristics of the models and simulations considered. Bothe et al. (2013) provide an assessment of the probabilistic and climatological consistency of the PMIP3-past1000 simulations relative to proxy-based reconstructions under the paradigm of a statistically indistinguishable ensemble. They diagnose distributional inconsistencies between ensemble-simulated surface air temperatures and the global temperature field reconstruction of Mann et al. (2009) over large areas of the globe, including PNA-sensitive regions over North America (see their Fig. 1). These full period inconsistencies originate mainly from differences in multicentennial to millennial trends (Bothe et al., 2013). By contrast, the ensemble was found to be probabilistically consistent with the reconstructed annual temperatures for the North American southwest beginning in the year 1500 (Wahl and Smerdon, 2012).

Indices and definitions
The following indices and definitions are considered based on monthly mean data: -the PNA is calculated using the modified point-wise method currently adopted by the NOAA-CPC and applied to 500 hPa geopotential height (Z500) data. -the NPI is calculated using the definition from Trenberth and Hurrell (1994) applied to SLP data. The index is computed as the spatial SLP averaged over 30-65 • N; 160-220 • E, so that positive phases of the index indicate a weaker-than-normal Aleutian low and the opposite holds for the negative phases; -the Southern Oscillation Index (SOI) is calculated based on a modified version of the Tahiti-Darwin index. It is defined as the difference between the average SLP over the domains 20-15 • S; 147-152 • W and 15-10 • S; 128.5-133.5 • E. The SOI is here preferred to SST-based ENSO indices since we focus on the atmospheric component of ENSO.
Indices are not standardized by default in order to highlight inter-model differences in the climatology and in the amplitude of fluctuations associated with the indices.

Pseudo-proxy experiments
Pseudo-proxy experiments are conducted to test the robustness and potential skills of PNA reconstructions using solely geophysical predictors from northern North America. Millennium-scale transient simulations from climate models provide a long and physically consistent framework where paleoclimate reconstruction methods can be altered and evaluated systematically in absence of the spatial and temporal discontinuities of the real-world climate proxy networks (Smerdon, 2012). In particular, they may allow one to determine an upper limit to the accuracy of the reconstruction of large-scale modes under limited spatial sampling.
Our pseudo-proxy experimental approach is meant as a generalization of the method used in TT2010. Specifically, the reconstruction is based on a multi-linear least-square regression model of the general form y t = i=1:N a i × t i + y 0 + ε t , where y t is the reconstructed value at time step t, N is the number of predictors, a i is the regression coefficient of the ith predictor x i , y 0 is the intercept and ε t is the residual at time step t. All data are normalized based on the full period before the pseudo-proxy experiments are conducted.
Pseudo-reconstructions are performed as follows: first, the pool of candidate predictors including temperature and precipitation data is determined by defining three regions over North America. Then, an ensemble of predictor sets is built by iteratively (up to 1000 times) and randomly sampling data from one grid point from each region. Finally, an ensemble of PNA pseudo-reconstructions is obtained by using the built sets in a multi-linear regression. Thus, the robustness of PNA pseudo-reconstructions with a TT2010-like design is tested using predictor sets that mimic the quality of the TT2010 reconstruction. Accordingly, we consider only www.clim-past.net/11/939/2015/ Clim. Past, 11, 939-958, 2015 Table 1. Simulations considered in this study. Columns, from left: model and, in parentheses, simulation name; atmospheric and oceanic components (with resolution in parentheses); applied solar (S) and volcanic (V) external forcings; considered periods of the past1000 (P) and historical (H) integrations; references/sources of information. Names of models and simulations follow the acronyms adopted in the Coupled Model Intercomparison Project (CMIP) 5 repository. Full references for the applied solar and volcanic forcing are Vieira et al. (2011), Gao et al. (2008), Crowley (2000), Jones and Mann (2004) and Crowley et al. (2008).
Model/simulation Components and resolutions Natural forcing Time intervals References  (2004)  pseudo-reconstructions with R 2 skill metric in the range between 0.45 and 0.55 for the calibration period (R 2 c ); i.e., the selection is based on calibration skills instead of on a preliminary screening of climate proxies.
We follow a perfect model approach with noise-free predictors, and the considered range of R 2 c is meant to account for the possible effects of noise in the actual climate proxies. The inclusion of noise in the predictors and its influence on the results are briefly investigated with a series of pseudo-proxy experiments where predictors are artificially perturbed by different types and levels of noise (Sect. 4). Skill metrics calculated for such noise-free predictor sets and regression models, but using other climate indices as validation target instead of the PNA, clarify whether these pseudoreconstructions distinguish the PNA from other modes influencing North American regional climates. Additionally, the PNA pseudo-reconstructions pertaining to the upper quartile of R 2 c for each simulation provide a crude estimate of the quality of PNA reconstructions obtainable with a TT2010like method for the given set of sampling regions. An exemplary different set of regions is also considered in the same reconstruction approach to assess whether regions not included in the TT2010 design may allow for a notable improvement in the accuracy and robustness of PNA reconstructions.
Skill metrics include R 2 and coefficient of error (CE) (Cook et al., 1994). CE is defined as 1 − t=1:M (x t − y t ) 2 / t=1:M (x t − x mv ) 2 , where x t and y t are the observed and the predicted index in year t, respectively. x mv is the observed mean index over the validation period and M is the number of years in the validation period. R 2 values are also calculated for successive 30-year periods to highlight the ro-bustness of the pseudo-reconstructions over different interdecadal periods.
Unlike in TT2010, the predictors sampled herein from the "Montana" (middle) and "Wyoming" (southern) boxes are winter temperature and precipitation, respectively. This choice guarantees that our pseudo-reconstructions encompass the desired R 2 c range in all models, which is hardly achieved for pseudo-proxies from these regions following the original definition. In particular, reconstruction skills considerably degrade if the predictor for "Wyoming" is defined as summer temperature instead of winter temperature (see Sect. 2.1). This does not affect the generality of our conclusions, since we aim at testing PNA reconstructions based solely on local geophysical predictors from northwestern North America, not at replicating the linkage between biological sensors and the local environmental forcing at the basis of the TT2010 reconstruction.
All the following analyses are performed using winteraverage (DJF) data and using 1950-1999 as the calibration period, unless specified otherwise. Furthermore, unless specified otherwise, the validation period is defined as the period spanning from the beginning of the simulation to the last year before the calibration period. The use of unsmoothed data limits the effect of high autocorrelation leading to spurious high skill metrics (Macias-Fauria et al., 2012).

Simulated PNA during the observational period
First, we assess whether the employed models/simulations represent the observed/reanalyzed dominant large-scale circulation and associated surface air temperature and precipi-Clim. Past, 11, 939-958, 2015 www.clim-past.net/11/939/2015/ tation patterns with sufficient accuracy for the observational period. This comparison guarantees that they are suitable for the subsequent analyses. The four centers of the observed PNA wave pattern are generally well captured by the simulations (Fig. 1). A number of simulations, and most noticeably GISS-R24/-R25 (Fig. 1e, f), display a weaker tropical center in the Pacific suggesting a weaker connection between tropics and extratropics. Higher model resolution does not systematically improve the overall quality of the hemispheric pattern. For instance, the PNA imprint over the Arctic as well as the Pacific-Atlantic teleconnection are too strong in the highly resolved CCSM4 (Fig. 1c). The PNA pattern of the lowestresolution model (FGOALS-gl) has an overall weaker hemispheric imprint and the negative center over Florida is displaced westward over Mexico (Fig. 1d), possibly reflecting an inadequate representation of the Rocky Mountains.
A similar behavior is found for the simulated spatial patterns of NPI (see Supplement Fig. S1). Most noticeably, the NPI pattern in FGOALS-gl includes strong negative correlations over central North America, again pointing to low-resolution topographic issues. All simulations show a good representation of the NAO pattern over the North Atlantic/Europe and China, but often overestimate its signature over the North Pacific (Fig. S2). The simulation of the SOI pattern is a challenge for most of the models, especially concerning its signature over the extratropical North Pacific and North America (Fig. S3).
Our pseudo-reconstruction approach also requires that simulations produce reliable imprints of the PNA -as well as of NPI, SOI and NAO -on North American winter surface air temperature and precipitation (Figs. 2 and 3). The observed correlation pattern between PNA and continental temperature is characterized by an approximately meridional stretch of positive correlations along the western coast of North America, which extends eastward into continental regions at mid-to polar latitudes, and by a center of negative correlation over the Sargasso Sea/Florida (Fig. 2a). Simulations capture both features with varying quality (Fig. 2b-i). For instance, the Sargasso Sea/Florida center is displaced in BCC-CSM1-1 and FGOALS-gl, while it is underrepresented in GISS-E2-R25, IPSL-CM5A-LR and slightly so in MIROC-ESM. Overall, FGOALS-gl presents the worst representation of this correlation pattern possibly due to the deficiencies noticed above in the 500 hPa PNA pattern. Similar conclusions can be drawn about the NPI signature of North American winter temperatures (Fig. S4). Simulations and reanalyses consistently point to a limited imprint of NAO and SOI on North American winter temperatures (Figs. S5 and S6), which for both modes partly superposes PNA signals.
Similar considerations could be derived for winter precipitation, but correlation patterns between large-scale circulation modes and precipitation over land are patchier than for temperature. Overall, the quality of simulated precipitation patterns compared to reanalyses is clearly poorer than for temperature. Both reanalyses and simulations indicate locally significant negative correlations between PNA and precipitation in the mid-latitude United States (i.e., wetter conditions under negative PNA, and vice versa), but with substantial differences in the details of the pattern (Fig. 3). An important robust feature is that all simulations except GISS-E2-R25 indicate weak negative correlations over the central Rocky Mountains (Fig. 3b-i), a region where precipitationsensitive proxies were screened for the TT2010 reconstruction.
In summary, the correlation patterns reveal a marked heterogeneity between simulations in the quality of their representation of dominant large-scale circulation modes and associated imprint on the North American climate. Of course, the spatial patterns are derived from the chosen 50-year period within single transient simulations, and are therefore not necessarily representative of the quality of the different models. Still, some general features are recognizable: a coarsely resolved North American topography and a poor representation of tropical and extratropical Pacific interactions are likely two major challenges limiting the quality of the simulated PNA imprints. The most apparent issues concern FGOALS-gl, potentially due to its coarser resolution compared to the other models.

Simulated PNA features during the last millennium
The evolution of the winter PNA index throughout the last millennium shares little resemblance between the different simulations ( Fig. 4a), which is indicative of a limited effect of the external forcing since the latter is very similar across the ensemble. Decadal and interdecadal phases of strong positive or, similarly, strong negative PNA appear at different periods in different simulations, suggesting that, in general, the PNA is mostly determined by internal variability at these timescales. No simulation displays a prolonged strong positive PNA phase during the early 19th century as featured by the TT2010 reconstruction, but decadal-scale positive PNA anomalies of similar relative amplitude emerge sporadically in the ensemble during different periods (see dots in Fig. 4a). Such prolonged positive-PNA events are, however, rare. Unlike in the reconstruction, the early 19th century is characterized by predominant negative PNA trends in the simulations (the period of discrepancy is highlighted by a horizontal red bar in Fig. 4b).
Running-window correlations between the PNA index and the other indices provide a simple assessment of the variable strength of PNA teleconnections in the different simulations. Reanalysis data indicate that winter PNA and NPI are practically indistinguishable: the two indices are robustly highly anti-correlated (thick black line in Fig. 5a). Simulations consistently feature significant negative PNA-NPI correlations through the last millennium, although with considerable differences within the ensemble concerning strength and stationarity of the statistics (Fig. 5a)  the strongest and most robust correlations, which overlap with values from reanalyses, whereas FGOALS-gl produces the weakest and most time-varying correlations. The simulated winter PNA-NAO correlations are generally weak and negative during the last millennium, in agreement with the non-significant and strongly varying statistics from reanalysis data (Fig. 5b). Some simulations feature multidecadal periods when the negative correlation becomes statistically significant, suggestive of a temporarily strong atmospheric connection between the North Pacific and North Atlantic sectors. This is especially the case for CCSM4 (compare also superposing patterns in Figs. 1c and S2c). The negative PNA-NAO correlations represent periods when the atmospheric bridge linking Pacific and Atlantic climate variability is active (for a dynamical description see, e.g., Raible et al., 2001;Pinto et al., 2011;Baxter and Nigam, 2013). Decadal active phases of such a bridge in the form of persistent negative PNA/positive NAO pattern have been attributed to both internal variability (Pinto et al., 2011) and strong volcanic forcing (Zanchettin et al., 2012). The winter PNA-SOI correlation is significantly negative in the reanalyses, though not very strong (Fig. 5c). CCSM4 produces PNA-SOI correlations that remain robustly around this observed value throughout the last millennium, while the other simulations produce generally lower and more variable correlations (Fig. 5c). In BCC-CSM1-1, MPI-ESM-P and MIROC correlations between SOI and both PNA and NPI (the latter not shown) are only sporadically significant, meaning that these models feature a weak connection between the tropical and extratropical North Pacific. Note that these results do not qualitatively change if running-window correlations are calculated over longer periods. Changes in the relative importance of large-scale modes for North American winter climate variability during the last millennium are assessed by comparing the fractional variances of North American surface air temperature and precipitation that are explained by the different indices over sliding 30-year periods paced at 1-decade intervals. The winter PNA is the dominant mode of simulated North American winter temperature variability among the considered indices (PNA, NPI, ENSO), generally explaining around 20 % of the total variance (Fig. 6). Only for FGOALS-gl are there several pe-  riods when the NPI becomes more dominant than the PNA. The strength of all index signatures changes through time. The fraction of North American winter precipitation variability explained by the indices is generally below 10 %, and the dominance of PNA over the other indices is less clear than for temperature (Fig. S7).
In summary, internal variability is an important factor for the simulated PNA during the pre-industrial millennium. The ensemble markedly disagrees with the TT2010 reconstruction, whose strong positive phase in the early 19th century was interpreted as resulting from a strong PNA response to solar forcing (TT2010). Correlations between indices reveal substantial differences in the simulated representation of teleconnections both within the ensemble and in comparison to reanalyses. Among the considered indices, PNA generally explains the largest fraction of North American winter temperature variability. Only FGOALS-gl features prolonged periods when PNA and NPI explain comparable fractions of North American winter temperature variability. We are therefore confident that through proper sampling of precipitation and especially temperature proxies over North America, pseudo-reconstructions are able to express robust PNA signals rather than signals from other indices.

PNA pseudo-reconstructions
First, we validate the reconstruction approach in TT2010. This is done in a perfect model framework by testing whether a reconstruction design based solely on geophysical predictors from northwestern North America can provide meaningful pseudo-reconstructions of the PNA. We use only predictor sets that provide a calibration skill comparable to that of the actual TT2010 reconstruction (Fig. S8 summarizes the full ensemble of pseudo-reconstruction calibration skills).
The so-obtained PNA pseudo-reconstructions are generally skillful according to both employed full validation metrics (R 2 v and CE): only a few pseudo-reconstructions give CE values below 0, meaning they have no predictive skill and are hence unacceptable (Fig. 7a). R 2 v values can exceed the imposed R 2 c range (see columns of numbers in Fig. 7a). In some simulations, different performance between the validation and calibration periods can be related to the presence of significant local trends in North American winter temperatures during the latter period (Fig. S9) that are mostly due to strong anthropogenic forcing imposed in the second half of the 20th century. For some simulations and predictor sets, the validation-period R 2 values (R 2 v ) for the PNA overlap with those from the NPI (Fig. S10), meaning that in these cases the pseudo-reconstructions are hardly effective in distinguishing PNA-related features from other signals.
The robustness of a reconstruction through time is a major concern for its reliability. Figure 8 shows that for the same set of predictors used for Fig. 7a, the R 2 skill of the pseudoreconstructions changes remarkably through time: there are periods when the pseudo-reconstructions from one simulation are consistently poorer, other periods when they are consistently better and there is generally a large spread in the quality of the pseudo-reconstructions. The multidecadal and centennial variations of R 2 suggest that there is structural uncertainty on these timescales. Note that since metrics are calculated based on deviations from the 30-year average, the so-defined R 2 describes the reconstruction skills mostly regarding interannual-to-decadal variability. The CE metric, which accounts for the 30-year mean state, is characterized by non-stationarity and inter-model spread similar to R 2 (not shown). Running-window statistics further indicate that the models substantially differ concerning the comparative skills of reconstructed indices (top panel of Fig. 8): in some models, and most noticeably in GISS-E2-R, the skills for the PNA are often not better than for NPI; in other models, like MPI-ESM-P and BCC-CSM1-1, the skills for the PNA are better than for all other indices. We conclude that the approach is effective in distinguishing the reconstructed PNA from other indices only for models such as MPI-ESM-P and BCC-CSM1-1.
Another of our main objectives is to assess the robustness of the interdecadal strong positive PNA phase identified by the TT2010 reconstruction in the early 19th century. Figure 8 shows particularly large inter-model spread in the R 2 metric during the late 18th and early 19th centuries, with some simulations performing very well (IPSL-CM5A-LR, GISS-E2-R25 and MPI-ESM-P), while others, for those time periods, show some of the poorest skills in the entire millennium (e.g., BCC-CSM1-1 and GISS-E2-R25). It should be noted that the PNA pseudo-reconstructions are generally biased towards negative PNA values (black vertical lines in Fig. 9). This is mainly due to the fact that most simulations have strong local trends in 20th century temperatures (Fig. S9), which result in PNA pseudo-reconstructions roughly following a hockey-stick shape over the last millennium. As a consequence of this bias, our pseudo-reconstructions tend to underestimate interdecadal phases of very strong positive PNA that are detected throughout the simulations (see the negatively centered histogram in Fig. 9a). By contrast, interdecadal phases of strong positive PNA in the pseudoreconstructions seem to describe the actual PNA conditions more accurately, as shown by the rather symmetric, almost zero-centered probability distribution of ensemble residuals for these events (Fig. 9b). Thus, whereas actually simulated prolonged strong positive PNA phases may not be correctly captured by pseudo-reconstructions, prolonged strong positive phases emerging in the pseudo-reconstructions are generally "true".
There are further concerns about the capability of pseudoreconstructions to accurately capture the PNA low-frequency variability. We find that pseudo-reconstructions tend to overestimate the amplitude of multidecadal-to-centennial fluctuations (Fig. 10). In some simulations, like BCC-CSM1-1, CCSM4, FGOALS-gl and MPI-ESM-P, the spectra of the pseudo-reconstructions entail large-amplitude peaks in this frequency band that do not appear in the spectrum of the actual index. GISS-E2-R and MIROC, by contrast, produce pseudo-reconstructions whose spectra agree fairly well with that of the simulated PNA. The different agreement between pseudo-reconstruction spectra and the actual PNA spectrum implies that the models disagree about whether (i) the lowfrequency temperature and precipitation excursions captured by the pseudo-proxies are related to the PNA and/or whether (ii) the PNA reacts to the same low-frequency forcing as the temperature and precipitation pseudo-proxies do. Combining evidence from Figs. 8, 9 and 10, we conclude that the errors in the pseudo-reconstructed prolonged positive PNA phases and, more generally, the variable skills of pseudoreconstructions on multidecadal and centennial timescales reflect misrepresentation of low-frequency PNA variability by the pseudo-reconstructions. This casts doubt on the reliability of the early 19th-century PNA event identified by the TT2010 reconstruction.
Considering PNA pseudo-reconstructions instead of the actually simulated PNA indices does not solve the discrepancy between simulations and the TT2010 reconstruction during the early 19th century (Fig. 4c). In fact, none of the simulations display significant positive winter temperature anomalies over northwestern North America during the pe- riod 1800-1820 (Fig. S11), which would be consistent with a positive phase in the PNA pseudo-reconstructions following the current definition. Instead, the anomalous patterns are characterized by a marked heterogeneity, suggesting lack of a robust response to external forcing across the models, with no pattern resembling the typical PNA structure. Accepting the reconstructed PNA behavior during the early 19th century as accurate and the simulated climates as realistic, the apparent discrepancy can only be solved by interpreting the first as a particular event of internal climate variability, hence unlikely captured (in its temporal occurrence) in a small-size ensemble as the one at hand. Indeed, a similar discrepancy is found in the late 1940s for a reconstructed decadal-scale negative PNA phase (Fig. 4c), a period not characterized by prominent (inter)decadal forcing events.

Designing new PNA reconstructions
A natural question at this point is whether there is margin to improve reconstructions of the PNA. Potential for improvement may come, for instance, from the inclusion of new and/or better predictors over northwestern North America. Simulations disagree about whether the skills of the actual TT2010 PNA reconstruction and of its synthetic analogs represent the limit of this reconstruction approach: some simulations (e.g., BCC-CSM1-1, FGOALS-gl) indicate that the Clim. Past, 11, 939-958, 2015 www.clim-past.net/11/939/2015/ calibration skill of the TT2010 reconstructions is close to the expectation for the method (Fig. S8). In other simulations, including CCSM4 and IPSL-CM5A-LR, geophysical predictor sets from northwestern North America tend to produce R 2 c values above the 0.45-0.55 range, while this range is in the upper tail of the R 2 c distribution in MPI-ESM-P (Fig. S8). The subsets of predictors yielding the highest R 2 c values among the considered random sets delineate the upper bound of this reconstruction method with the inclusion of improved predictors. Accordingly, analogously to Fig. 7a,  Fig. 7b summarizes the skill metrics for the subset of pseudoreconstructions with R 2 c values in the upper quartile of the R 2 c distribution. BCC-CSM1-1 and CCSM4, and to a lesser extent, GISS-R24 and IPSL-CM5A-LR, indicate the upper potential for the TT2010 approach, especially in terms of R 2 v (compare panels (a) and (b) in Fig. 7). This is not the case for FGOALS-gl, GISS-E2-R25 and MPI-ESM-P, whose best skill scores indicate that pseudo-reconstructions from northwestern North American predictors are unlikely to be capable of explaining substantially larger variance than obtained in the actual TT2010 reconstruction. The simulations also disagree about whether an improved selection of predictors would lead to more distinguishable reconstruction skills between PNA and NPI (not shown).
The correlation patterns between the residuals of the PNA pseudo-reconstructions illustrated in Figs. 7a and 8 and North American winter temperatures (Fig. 11) indicate that, consistently among the simulations, an approach only using predictors from northwestern North America lacks important information from the southwestern and southeastern United States. Both regions correspond to characteristic regions for the PNA signature of North American winter temperatures (Fig. 2). The residual correlation patterns and their robustness reflect structural deficiencies, and suggest possible changes in the reconstruction design to improve PNA reconstructions. Inclusion of temperature information from the southeastern United States would, for instance, reduce the risk of erroneously interpreting periods of spatially uniform continental warming/cooling or moistening/drying over North America as positive/negative PNA phases.
Accordingly, Fig. 7c outlines the potential of the reconstruction method through an improved selection of proxy locations and extended calibration period. In this case, the predictor sets include temperature sampled from a box located over Florida instead of precipitation sampled from the southern box over the western United States, so that the model can capture information from the easternmost negative PNA center (Figs. 1 and 2). The potential quality of the PNA pseudo-reconstructions obtained with this design is greatly improved according to both considered skill metrics (compare panels (b) and (c) in Fig. 7). The quality of the pseudo-reconstructions still varies substantially through the last millennium (not shown), but the risk of periods of unskillful reconstructions (CE < 0) is much lower than for a design limited to northwestern North America. The pseudo-reconstructions further display improvement in the negative bias and a substantially better representation of lowfrequency PNA variability (Fig. S12). However, the models disagree about which factor (i.e., extended calibration period or inclusion of a temperature predictor for the southeastern United States) more strongly contributes to the improved results.
In summary, pseudo-proxy experiments appear to be instrumental in both the designing and the assessment of future PNA reconstructions. Of course, the exemplary design proposed here represents an ideal setting, and future applications of this tool for real-world reconstructions would require the pseudo-proxy experiments to be designed based on the quality and type of actually available proxies.

Discussion
In order to understand the implications of our results for real-world proxy reconstructions and for the interpretation of last-millennium climate simulations, our discussion concentrates on three aspects: limitations of the reconstruction methods and of our pseudo-reconstruction design in particular; weaknesses in the simulated representation of the PNA, and of its teleconnections and variability; and issues related to (regional) climate attribution before the observational period and uncertainties affecting the simulation of the early 19th-century climate.
Our pseudo-proxy investigation reveals the inherent limitations of a PNA reconstruction method solely relying on local geophysical predictors from northwestern North America. Assumptions of linearity and stationarity between local hydroclimate variability and the large-scale atmospheric circulation described by the PNA are further weaknesses of the approach. In our linear definition, the PNA robustly dominates North American winter climate variability (Fig. 6), which is an encouraging premise for reconstruction attempts. Nonetheless, the so-defined PNA index may not capture shifts in the location of the mode's centers of action and in the associated teleconnections . Furthermore, our pseudo-reconstructions can be affected by the non-stationarity of other climate variability modes' teleconnection pattern to North America (such as ENSO; Coats et al., 2013). Pseudo-reconstructions suggest that margins exist to substantially improve the quality of the reconstructed PNA based on a TT2010-like multi-linear regression method, for instance if the multiple PNA-sensitive regions over North America are more exhaustively represented in the predictors' set and if the calibration period is extended. However, including temperature-sensitive predictors from southeastern North America and extending the calibration period to the full 20th century, as in our pseudo-proxy experiment (Fig. 7c), may be difficult due to the nature of real-world climate proxies and limitation of observational data suitable for model calibration. First, as noted above, the ensemble teleconnection www.clim-past.net/11/939/2015/ Clim. Past, 11, 939-958, 2015   Figure 11. Correlation maps between the ensemble-average residuals (predicted value minus true value) from the winter (DJF) PNA pseudoreconstructions obtained following the TT2010 approach and illustrated in Figs. 7a and 8 and winter surface air temperature time series for the pre-industrial period up to 1849. Dots mark grid points where the correlation is not significant at 95 % confidence accounting for autocorrelation. The green boxes mark the areas used for the TT2010 reconstruction.
Clim. Past, 11, 939-958, 2015 www.clim-past.net/11/939/2015/ patterns in Twentieth Century Reanalysis Project data -the longest reanalysis product now available -are poorly constrained during the first half of the 20th century over the Pacific . Then, the Northern Hemisphere's ring-width network shows that the proxy responses to atmospheric modes like the PNA are determined by a complex causal chain linking large-scale circulation, local climate and seasonal tree growth . Accordingly, relatively few chronologies, mostly from the Pacific Northwest and northern Rockies, significantly respond to the winter PNA . More generally, real climate proxies can be critically affected by noise (von Storch et al., 2009), and may suffer from non-stationary climate proxy relationships that are neglected in our perfect model framework (e.g., Evans et al., 2013;D'Arrigo et al., 2008). There exist, however, long winter precipitation-sensitive and possibly also temperature-sensitive proxies across the southern United States (e.g., Stahle and Cleaveland, 1994;St. George and Ault, 2014), upon which future designs of PNA pseudo-reconstruction exercises could be based. As shown in Supplement Fig. S13, the skills of an ensemble of TT2010-like PNA pseudo-reconstructions progressively deteriorate for increasing levels of noise artificially introduced in the predictors. Skills depend more on the level of noise rather than on the type of noise, at least for low amounts of noise, in accordance with von Storch et al. (2009). For a signal-to-noise ratio of 1 (Fig. S13c, f) explained variances for the validation period never reach 0.5, and red noise generally produces unskillful reconstructions. Thus, our pseudoproxy investigation is only meant as an idealized example demonstrating the potential margins of improvement offered by the reconstruction method. Its application to a real-world PNA reconstruction requires the scrutiny of available data, which we defer to a dedicated follow-up study.
Poor modeling of the PNA-related dynamics is a straightforward explanation of the early 19th-century discrepancy between the reconstruction and the simulations. Furthermore, the realism of our PNA pseudo-reconstructions relies on the realism of simulated patterns, variability and teleconnections of the PNA as well as of other hemispheric modes acting upon on the North American climate. Accurate representation of observed dominant modes of climate variability and of their teleconnections still represents a challenge for coupled climate simulations (e.g., on ENSO see Guilyardi et al., 2012;Zou et al., 2014). Unrealistic simulated representation of large-scale atmospheric circulation modes can arise due to biased ocean-atmosphere coupling over remote regions: coupled climate models are still affected by considerable biases in regional SSTs and sea ice -especially in the North Atlantic Ocean -that are associated, in the Northern Hemisphere, with cold biases resembling the Northern Hemisphere's annular mode (Wang et al., 2014). This suggests that good model performance in simulating regional processes may be overridden by the effect of remote biases.
Our definition of the PNA index does not account for possible displacements of its centers of actions in simulated patterns compared to reanalyses. An alternative definition based on empirical orthogonal functions (EOF) results in PNA indices that share between half (MIROC-ESM) and almost the whole (CCSM4) total variance with the point-wise-based PNA indices over the observational period (see Supplement  Table S1). Spatial differences between simulated EOF-based and point-wise-based patterns also vary considerably across the ensemble (Table S1). It is not yet clear whether and how these uncertainties related to the index definition affect the details of the pseudo-reconstructions. The validity of our general conclusions clearly stands for the sub-ensemble including only models with the most consistent PNA indices across the two definitions (CCSM4, IPSL-CM5A-LR, MPI-ESM-P).
The marked inter-model differences in the PNAprecipitation correlation patterns over North America and their general disagreement with the observed pattern (Fig. 3) highlight the large uncertainties in the connection between large-scale circulation and local hydroclimates that still affect state-of-the-art coupled climate simulations. In this regard, topography largely determines the wave-like structure of the PNA and its surface signature. Its dominant role was already highlighted by TT2010 in describing the characteristics of their two precipitation-sensitive tree ring series from Montana and Wyoming. Poor model topography likely leads to biases in representing the PNA pattern and more visibly its climate fingerprints. This was exemplified here by the stark contrast between the low-resolution model FGOALS-gl and the high-resolution model CCSM4 (compare panels (c) and (d) in Figs. 1-3). With few exceptions, topography in the employed models misses critical plateau elevations that are crucial for the onset and sustenance of snow/ice-related feedbacks (Berdahl and Robock, 2013). These could be relevant for the reinforcement/dampening of the Canadian High during the development phase of a positive/negative PNA (Ge and Gong, 2009). Accordingly, these latter model deficiencies can partly explain why inclusion of precipitation over the Rockies as a predictor degrades the skills of our pseudoreconstructions and yields much weaker skills than the actual TT2010 reconstruction (as discussed in Sect. 2.4).
A possible solution to the discrepancy between the reconstruction and the simulations is to attribute the reconstructed early 19th-century positive PNA phase to internal variability. Supporting this hypothesis, interdecadal persistent positive PNA phases emerge in all simulations throughout the last millennium without consistent timing (Fig. 4a). However, simulated events are generally weaker than the reconstructed event and the number of those longer than 20 years is exiguous (see Supplement Fig. S14). Therefore, notwithstanding the caveats described above about the realism of simulated PNA dynamics and variability, the reconstructed early 19th-century positive PNA phase is compatible with an exceptional event of internal climate variability. A simiwww.clim-past.net/11/939/2015/ Clim. Past, 11, 939-958, 2015 lar interpretation has been recently proposed, with the support of climate simulations, for reconstructed multidecadal droughts in southwestern North America during the last millennium (Coats et al., 2015). Further supporting this hypothesis, the simulations ensemble does not point to coherent positive PNA anomalies during other periods of the last millennium with concomitant strong volcanic forcing and weak solar forcing, e.g., the mid-15th and the late 17th centuries (Fig. 4a).
Under the alternative hypothesis that the reconstructed early 19th-century positive PNA phase is externally driven, the discrepancy between the reconstruction and the simulations can be explained by common model deficiencies in the simulated dynamical response to natural forcing and/or by uncertainty in the (reconstructed) imposed external forcing. Supporting this hypothesis, state-of-the-art coupled climate models still suffer from a deficient representation of stratospheric and coupled stratosphere-troposphere dynamics (Kodera et al., 1996;Woollings et al., 2010), which affect the simulated response to volcanic (Driscoll et al., 2012;Charlton-Perez et al., 2013;Muthers et al., 2014) and solar (Gray et al., 2010;Anet et al., 2014) forcing. Furthermore, inter-model disagreement about post-eruption oceanic evolutions (e.g., Ding et al., 2014) shows that large uncertainties still exist about decadal-scale climate variability during periods of strong volcanic forcing and the role of the ocean in determining the surface air temperature response (Canty et al., 2013). Sensitivity simulations performed with a chemistry-climate model demonstrate the importance of the Dalton Minimum of solar activity for the persistence of the hemispheric cold temperature anomalies of the early 19th century . However, the cold winter temperature anomalies depicted by these simulations over Alaska during the period 1805-1825 do not match with the imprint of a positive PNA. Single-model ensemble climate simulations have shown that the 1815 Tambora eruption produces robust large-scale atmospheric circulation anomalies, roughly corresponding to a positive PNA phase, only in the absence of additional external disturbances, whereas under full forcing conditions such positive PNA-like features become hardly distinguishable (Zanchettin et al., 2013a). The same simulation ensembles have further demonstrated that internal climate variability can be a source of uncertainty for the simulated early 19th-century decadal climate evolution as important as the (reconstructed) imposed forcing (Zanchettin et al., 2013a). Moreover, although climate simulations depict an interannual-to-decadal PNA/NPI response to strong tropical volcanic eruptions (Zanchettin et al., 2012;Wang et al., 2012), responses on longer timescales may be damped by the resilience of the interdecadal component of the Pacific Decadal Oscillation to natural external forcing (Zanchettin et al., 2013b).
Uncertainty in the external forcing factors acting on the early 19th-century climate further complicates the attribution of reconstructed and simulated variability. For instance, re-constructed variations in total solar irradiance are affected by considerable uncertainties (e.g., Schmidt et al., 2011;Shapiro et al., 2011) as well as deficiencies in accounting for the spectrum variations for solar forcing and ozone response (Gray et al., 2010). There is ongoing debate about how changes in total solar irradiance affect the tropical oceans, with different observations and different simulations disagreeing about whether warming or cooling of the upper tropical Pacific is expected under enhanced solar activity (Misios and Schmidt, 2012). Moreover, the radiative impact of tropical volcanic eruptions is sensitive to the season of the eruption (Toohey et al., 2011;Froelicher et al., 2013), and the season of the 1809 tropical eruption is still insufficiently constrained (Cole-Dai et al., 2009).

Conclusions
Our results depict a discrepancy between reconstructed and simulated PNA behavior during the early 19th century, an exceptionally cold period in the Northern Hemisphere characterized by concomitant weak solar and strong volcanic forcing. According to our pseudo-proxy investigation, reconstructions based on northwestern North American geophysical predictors are potentially skillful in terms of two different metrics (coefficient of determination and coefficient of error). Such an approach following Trouet and Taylor (2010) is also likely capable of capturing strong interdecadal positive PNA phases, like the one reconstructed for the early 19th century. However, a number of sources of uncertainty and potential deficiencies are still present especially at multidecadal and centennial timescales. Furthermore, pseudoreconstructions based solely on predictors from northwestern North America often cannot distinguish between the PNA and the North Pacific Index describing the strength of the Aleutian Low.
The PMIP3-past1000 and historical simulations provide an overall satisfactory representation of the observed PNA spatial pattern and of its imprint on the North American climate. Simulated pre-industrial PNA evolutions show a predominance of internal variability over forced signals, which could be used as an argument to explain why simulations do not robustly exhibit the reconstructed positive PNA phase in the early 19th century. Shifting focus to the attribution of the reconstructed anomaly requires confidence that simulations do not suffer from common deficiencies in the response to natural forcing, in the applied reconstructed forcing and/or in the internally generated climate variability. We need therefore to better understand the relative role of externally forced and internal climate variability during the pre-industrial period.
A refined topography associated with high horizontal model resolution appears to be important for models to realistically capture the connection between the large-scale circulation and the local climatic/environmental conditions upon which a reliable PNA reconstruction depends. However, our pseudo-reconstructions also indicate that there is margin to substantially improve the available PNA reconstruction, in particular through a more exhaustive representation of the multiple PNA-sensitive regions over North America in the predictors' set. These results call for strengthened cooperation between the climate proxy and climate modeling communities in order to improve our knowledge about the early 19th-century PNA and to solve the related reconstruction-simulation discrepancy.
The Supplement related to this article is available online at doi:10.5194/cp-11-939-2015-supplement.