Reconstructing Late Holocene North Atlantic atmospheric circulation changes using functional paleoclimate networks

Obtaining reliable reconstructions of long-term atmospheric circulation changes in the North Atlantic region presents a persistent challenge to contemporary paleoclimate research, which has been addressed by a multitude of recent studies. In order to contribute a novel methodological aspect to this active field, we apply here evolving functional network analysis, a recently developed tool for studying temporal changes of the spatial co-variability structure of the Earth’s climate system, to a set of Late Holocene paleoclimate proxy records covering the last two millenia. The emerging patterns obtained by our analysis 5 are related to long-term changes in the dominant mode of atmospheric circulation in the region, the North Atlantic Oscillation (NAO). By comparing the time-dependent inter-regional linkage structures of the obtained functional paleoclimate network representations to a recent multi-centennial NAO reconstruction, we identify co-variability between Southern Greenland, Svalbard and Fennoscandia as being indicative of a positive NAO phase, while connections from Greenland and Fennoscandia to Central Europe are more pronounced during negative NAO phases. By drawing upon this correspondence, we use some key 10 parameters of the evolving network structure to obtain a qualitative reconstruction of the NAO long-term variability over the entire Common Era (last 2000 years) using a linear regression model trained upon the existing shorter reconstruction.

exhibit opposite characteristics. While the influence of the NAO phase is strongest during boreal winter, it also affects summer conditions (Ogi et al., 2003;Folland et al., 2009).
As the NAO is a key aspect of European climate variability, long-term changes in the dominant phase of this atmospheric variability mode should also be reflected in the co-variability structure of existing paleoclimate records. To this end, there are various types of archives available that could be utilized for reconstructing such changes. Specifically, most high-resolution 5 archives in the region are sensitive tracers of inter-annual temperature variability (commonly seasonal or annual mean values) and, hence, should have been influenced to a certain degree by the NAO. For example, ice core data from Southern Greenland mostly reflect winter temperatures and, thus, strongly follow winter NAO conditions (Appenzeller et al., 1998;Vinther et al., 2010). In turn, tree ring chronologies mainly trace summer temperatures, but can be additionally influenced by winter conditions, especially in terms of extreme precipitation anomalies (see, e.g., Lindholm et al., 2001;Linderholm and Chen, 10 2005; Vaganov et al., 1999, and references therein). While these facts have been utilized to reconstruct NAO indices prior to the instrumental period (Cook et al., 1998), caution has to be taken since the corresponding relationship is non-stationary and, thus, the principle of uniformitarianism can be violated (Schmutz et al., 2000;Zorita and González-Rouco, 2002;Lehner et al., 2012). Furthermore, the actual effect may not be the same at different locations. For example, a persistent positive phase of the NAO leads to higher winter precipitation in Northern Europe, which in turn has an indirect influence on tree growth during 15 summer. At the same time, positive NAO phases have been connected to low precipitation and even droughts in Southern Europe. Hence, we expect tree ring records from Central and Southern Europe to be more strongly affected by negative NAO phases than by positive ones.
Following upon these considerations, we anticipate that different types of terrestrial archives available in different parts of the North Atlantic region have in common that they all reflect the leading mode of regional climate variability at inter-annual to 20 multi-decadal time scales in one way or another. Hence, the main idea of this study is that by exploiting the temporary presence or absence of similar variability patterns between different paleoclimate records, especially in Southern Greenland versus the rest of the study region, one can draw conclusions about changing commonalities between the main atmospheric drivers in different sub-regions and, thus, the mean state of the atmospheric circulation in the North Atlantic region.
We emphasize that the aforementioned perspective differs markedly from previous efforts to reconstruct NAO variability over 25 the past millennia. The general approach of existing studied, as used by Ortega et al. (2015); Trouet et al. (2009) and others, has been to find a reasonable set of records and use linear regression techniques, to infer NAO variability from proxy variability.
In contrast to this, the evolving functional network approach does not assume any stationarity of the relationship between each individual proxy and the NAO. In fact, we explicitly utilize the observed non-stationarity and thus offer a complementary view on the same climatological target variable to common linear regression models. This fact makes it also challenging to 30 test our proposed procedure by making use of pseudo-proxies, as the latter commonly only represent a primary signal (e.g., temperature) in a stationary manner.
The remainder of this paper is structured as follows: In Sec. 2, we describe the ensemble of paleoclimate proxy records used in this study. Section 3 presents the methods used to construct evolving functional paleoclimate networks and derive from them a scalar index variable describing the time-dependent dominant mode of North Atlantic climate variability over the Common 35 Tree Rings Historical Ice Cores Lake Sediments Detailed information on the individual data sets can be found in the Supplementary Material Tab. S1 and S2.
Era. In Sec. 4, we discuss the emerging structures and how well they reflect associated changes in the dominant NAO phase at multi-decadal to multi-centennial time scales. Our results are compared to findings from other studies in Sec. 5, including a discussion on the possibilities and possible shortcomings of our approach. The paper ends with concluding remarks in Sec. 6.

Data
The North Atlantic region comprises a large variety of well-studied high-resolution paleoclimate archives for the Late Holocene. 5 Existing data sets include several ice core records from the Greenland ice shelf and Svalbard, tree ring chronologies from the Scandinavian mountains, the Alps and other mountain ranges, and varved lake sediments, especially in Southern Finland. In addition, there exist also some very long historical temperature records based upon early instrumental records (see references in the Supplementary Material Tab. S1 and S2). Many of the available proxies are strongly correlated to seasonal or annual temperature variability and thus have been used as key input for existing regional (Werner et al., 2017a;Luterbacher et al., 10 2016;PAGES 2k Consortium, 2013) and hemispheric (Ljungqvist et al., 2012;Mann et al., 2008) temperature reconstructions.
While there are particularly many archives covering the last millennium, a considerably lower number spans the full Common Era at high resolution.
In this study, we concentrate on changes in the inter-annual to multi-decadal co-variability of temperature-sensitive proxies throughout the Common Era. Thus, we only include records into our analysis that span at least 300 years and have close to 15 annual resolution. This leaves us with 37 time series, which are described in detail in the Supplementary Material Tab. S1 and S2. They are shown in Fig. 1. Note that only 12 of these records cover the full Common Era, all of them being located in either Greenland, Fennoscandia or the Alps.
We note that the considered selection criteria exclude some existing records, which have been related to past NAO variability in previous works, but are either not temperature sensitive or not of (approximately) annual resolution (for example, the records 5 used in the recent study of Deininger et al. (2016)). Specifically, most of the records studied in this work have been derived from either tree rings, varved lake sediments or ice cores, all of which have been dated rather precisely (for estimates of the dating uncertainty of some of the selected records, see, e.g., Werner et al. (2017b)). Thus, age uncertainty is considered to be negligible in the following. Unfortunately, there is no detailed information on proxy uncertainty that is consistently available for all records. Consequently, this uncertainty will not be further addressed explicitly in this following analysis. 10

Methods
As mentioned in the introduction, functional climate network analysis has recently become an established tool for studies on climate dynamics. Following upon the success of this approach, a few initial studies have transferred the corresponding idea to the analysis of spatial co-variability patterns among paleoclimate archives in a defined region (Rehfeld et al., 2013;McRobie et al., 2015;Oster and Kelley, 2016). Beyond the original framework, we aim here at studying the statistical interdependence 15 structure between subsets of archives from different appropriately defined regions and relate the information inferred from this analysis to a macroscopic index tracing the dominating mode of inter-annual North Atlantic climate variability at multi-decadal time scales. Accordingly, the methodological approach followed in this work comprises the following four steps: 1. Identify an ensemble of paleoclimate proxy records that have been influenced by a common climate variable (in our case, temperature). 20 2. Construct evolving functional networks based upon this ensemble according to the mutual similarity between individual records. Temporal changes in the network's connectivity structure represent changes of co-variability between the studied proxies.
3. Introduce a meaningful grouping of the records to reduce the complexity and increase robustness of the obtained information in the presence of proxy uncertainty and a varying number of records. 4. Establish a statistical relationship between the characteristics of the paleoclimate network and some climate variable or index (in our case, an existing long-term NAO reconstruction).
This general workflow of functional paleoclimate network analysis is illustrated in Fig. 2. While we have already discussed the first step of this workflow in the previous section, we will highlight in the following the methodological realization of the three remaining steps. These variables are then related to an established long-term reconstruction of the NAO index via linear regression.

Functional network construction
In general, an evolving functional network represents the time-dependent statistical co-variability structure among a set of time series. In the case of a perfect climate recording, this information can be used to infer teleconnections between distant regions.
For paleoclimate archives however, one has to recall the existence of additional influencing factors represented in the recorded proxies. Thus, co-variability might not just indicate direct physical linkages, but possibly also secondary (external) effects that each window according to its respective end point. Thus, any statistical measure calculated from a subset of a time series {x i } t W corresponding to the time window of length W ending at time t is associated with the set of time indices t W := {t |0 ≤ t − t ≤ W}. The window size W determines the temporal resolution of the analysis. Thus, we obtain a sequence of networks associated with different time intervals. Within this sequence, changes in the network structure allow tracing changes in the spatio-temporal co-variability among the studied proxies through time. 5 Each individual network is described by the N r × N r adjacency matrix A, defining the connections between nodes, (1)

Similarity assessment
In general, functional networks can be constructed based upon different types of similarity measures, including classical linear approaches like the Pearson correlation, but also other measures suitable for detecting non-linear or event-based relationships, 10 like mutual information or event synchronization (see, e.g., Rehfeld and Kurths, 2014, for a comparison of possible measures).
Specifically, to determine the strength of co-variability between two paleoclimate records, a suitable similarity measure has to be able to cope with unevenly sampled and/or discontinuous time series. Here, we use a Gaussian kernel-based variant of the Pearson correlation coefficient (gXRF) (Rehfeld et al., 2011). Given two time series . (2) Here, the kernel function K(·) is given by with h = max (∆t x , ∆t y )/4, where ∆t x,y denote the mean sampling intervals of the corresponding time series. Specifically, as the same atmospheric driving variable can have qualitatively different effect on different proxies or at different locations, we 20 take the absolute value of this correlation to quantify the strength of similarity.
Since our analysis makes use of different types of paleoclimate proxies, it is advisable to define similarity in a way that takes the different characteristics of the proxies into account. Time series originating from the same type of archive record climate variability in a similar fashion and thus might intrinsically exhibit stronger mutual correlations than such from different types of archives. Specifically, long-range auto-correlations and associated low-frequency variability can lead to spurious correlations, 25 which have to be corrected for (Guez et al., 2014). To account for this problem, we apply a surrogate-based significance test to calculate p-values corresponding to the probability that two records are similar just by chance, given their inherent auto-correlation structures. For this purpose, we use 1000 amplitude-adjusted Fourier transform (AAFT) surrogates (Schreiber and Schmitz, 2000), which leave the auto-correlation structure of each time series intact. Note that among the considered set of archives, all but four proxy records are complete and actually evenly distributed at annual time scale (one having lower 30 sampling resolution and the three others containing gaps). In this regard, using the Gaussian kernel correlation (developed for unevenly sampled data) instead of classical Pearson correlation coefficient accounts for these four records with different properties, while AAFT surrogates can still be generated with standard procedures (using linear interpolation for the very few missing data) for all records. The estimated p-values resulting from the surrogate ensembles provide a generally applicable measure of similarity, and two proxies are considered to be similar, if their p-value (rather than the estimated correlation value 5 itself) is below a defined threshold value α pr .

Network analysis
In the case of paleoclimate time series, there are some particular complications to be addressed in the context of functional network analysis. First, many records do not cover the full time span under study. Thus, the effective number of nodes N r t varies with time (see Supplementary Material Fig. S1). Second, while different archives might be significantly affected by the same 10 climate variable, they still exhibit both local and proxy-specific effects, so that the outcomes of pair-wise similarity assessments can be highly case-specific, even though the shared climatic influence might be the same. Furthermore, paleoclimate networks often already exhibit too many nodes to allow for an intuitive climatic interpretation (in particular, of individual links between pairs of proxies), but are at the same time too small to apply more sophisticated methods from complex network theory.
In order to address these peculiarities and to simplify the interpretation of the resulting network structures, it appears reason-15 able to combine spatially close records into clusters (as will be further detailed below) to yield smaller networks with fewer, but weighted connections. In the present context, a cluster is a subset of records denotes the cardinality, the number of members of a set). Note that we will consider the assignment of any archive to a specific cluster as being fixed over the entire analysis period (see below). Hence, the existence and size of a given cluster size vary only due to the (non-) availability of the given archives during different periods of time. 20 Having obtained a climatologically meaningful grouping of our archives into spatially connected clusters, we define the cross-link density (CLD) between any two clusters C K t W and C L t W as Note that as we are generally considering evolving (i.e. time-dependent) network structures, the CLD for each pair of clusters 25 will commonly vary in time. However, the CLD values are expected to be more robust tracers of the essential network structure than other commonly used network characteristics, since they combine information from various links and are properly normalized by the (time-dependent) number of records. If the paleoclimate archives were perfect recorders of the same climate variable these simplified networks would then provide a coarse representation of the teleconnectivity structure between larger regions. In the case of real-world paleoclimate records, the attribution of any specific physical process to a link between clusters This way, we effectively obtain coarse-grained networks with fewer nodes (associated with each cluster) and weighted links (CLD values). In turn, we disregard any information on the intra-cluster statistical linkages between individual archives, since we expect the latter to mainly reflect the intrinsic spatial correlation length of the influencing climate variable. Combining information on intra-and inter-cluster linkages would result in a paleoclimate "network of networks" approach; a framework that has already been employed in a few studies on recent climate variability (Donges et al., 2011;Wiedermann et al., 2016), 5 but might suffer from the low number of nodes in a paleoclimate setting.

Spatial clustering of proxies
As discussed above, it is advisable to simplify the functional paleoclimate network by grouping several archives into spatially connected and climatologically meaningful clusters and study exclusively the temporal changes in the mutual similarity of proxies at the resulting cluster level. Specifically, the obtained clustering should meet the conditions that clusters (i) com-10 prise spatially close archives and (ii) are large enough to reduce the impact of individual records and thus lead to a robust representation of the large-scale spatial co-variability structure.
Given the distinct individual characteristics of the different proxy time series and local as well as archive-specific effects, it is difficult to perform a cluster analysis directly at the set of time series originating from the archives, since such a procedure would likely result in a highly fragmented cluster structure. Instead, given that all proxies in our ensemble are temperature- 15 sensitive, we define clusters as regions which have shown similar inter-annual temperature variability over the modern (instrumental) period. For this purpose, we make use of the gridded ERA-20C reanalysis summer temperature data spanning the whole 20 th century (Poli et al., 2016). Based upon this data set, we generate a functional climate network reflecting only the strongest absolute linear (Pearson) correlations (as determined by a threshold value α C ) among the time series of seasonal mean (annually averaged boreal summer, JJA) temperatures for each grid point over land in the study region. From this net-20 work representation, we identify subsets of grid points with high intrinsic and low extrinsic connectivity (referred to as network communities) by applying the so-called Louvain algorithm (for details, see Blondel et al. (2008)).
We emphasize that the described spatial clustering procedure introduces an additional parameter α C into the analysis. In general, we observe that small values of α C yield more but smaller clusters, while larger values lead to a lower number of larger clusters (not shown). One of the main differences between the obtained clusters using different values of α C (from 25 a reasonable range of values) is the division of Greenland and the border between Central and Eastern Europe. This is also the main difference in using different variables (e.g. ERA-20C winter mean temperature) for the clustering (Supplementary Material Fig. S2).

Statistical modeling by regression
Beyond simple visual analysis of the evolving network structures, we aim to statistically link the obtained time-dependent CLD values with a climate-related variable (in our case, an existing NAO reconstruction by Ortega et al. (2015), in the following referred to as NAO Ortega ) that reflects a common influence on the co-variability between different regions. The simplest model to establish such a relationship between X t W and some variable Y would be a linear model with a coefficient vector D and a noise process t W . Such a model implies, that the connectivity between different regions is linearly related to the strength and the sign of the NAO. Note that while in general the CLD values should rather be described as a superposition of different climatic influences, we take here the opposite approach, which is potentially useful for obtaining a reconstruction of the (unknown) climate driver based upon our evolving functional paleoclimate network properties. For a 10 detailed discussion on different (regular vs. inverse) versions of this regression problem and their implications in the context of paleoclimate reconstructions, we refer to Christiansen and Ljungqvist (2017); Christiansen (2014) and references therein.
We emphasize that the analysis procedure described above has two free parameters, the threshold values α pr (for generating the paleoclimate network) and α C (for obtaining the spatial clustering). Given that our general aim is to maximize the inferred information about the mean state of the leading mode of North Atlantic climate variability at inter-annual to multi-decadal

25
As a final step, we further investigate the linear model (Eq. 5) for NAO Ortega,50 yr using a Bayesian approach, Markov Chain Monte Carlo (MCMC) regression (Gilks et al., 1995). Unlike OLS regression, this method results not in individual estimates of the different regression coefficients, but in joint distributions for all model parameters. Thereby, we implicitly account for the uncertainty in the description of the target variable. Since some of the considered clusters of paleoclimate archives do not cover the full Common Era, we can furthermore use the parameter distributions of the full set of clusters as priors to find the new 30 distributions of the reduced set of CLD values, thus utilizing the knowledge of the full data for cases of lower data availability.
For performing the MCMC regression, we employ a NUTS sampler (Hoffman and Gelman, 2014) with 10 4 samples with one quarter of these as burn-in.

Results
We have followed the procedure described in Sec. 3 to study the evolving paleoclimate networks derived from the set of 37 paleoclimate records described in Sec. 2. The window size W of our analysis has been selected as 50 years, consistent with the averaging window in NAO Ortega,50 yr . We have considered sliding windows with a mutual offset of 1 year, implying an overlap of 49 years between subsequent windows. The threshold values α pr and α C have been determined by maximizing the explained  During the first millennium CE (of the Common Era), we can distinguish two common, qualitatively different states of the network, one being dominated by connections between FS and the other two clusters (Fig. 4a,c) and another exhibiting strong correlations between archives from SG and CEU (Fig. 4b,d). During the second millennium CE (with considerably more archives available), we more clearly identify periods during which mainly West-East connections between Greenland 15 (G), Svalbard (S) and FS are present (Fig. 4f,i). During other times, North-South connections involving CEU are more strongly expressed (Fig. 4 e,g,h). The latter is commonly the case during time intervals for which NAO Ortega,50 yr indicates a negative mean reconstructed NAO index, while cross-cluster links are more concentrated within the northern North Atlantic sector during positive NAO phases.
To this end, the time intervals presented in Fig. 4  the obtained sign of the NAO phase is identified correctly in 68% and 71% of the considered time windows covered by NAO Ortega,50 yr , respectively. In the following, we will refer to this quantity as the true sign ratio (TSR). Notably, taking the second (more recent) half of NAO Ortega,50 yr as regression period (which corresponds to a period with more records than the first one) results in higher values of both, r 2 and TSR. This finding suggests, that using additional records, which do not have data outside of the regression period, can still lead to a better performance of our model due to a better interpretation of the 25 existing links. Future work along the lines of the present paper might explicitly utilize this observation in a Bayesian analysis framework.
While the observed TSR provided by our model is markedly larger than that of a random guess (TSR ≈ 50%), it is still rather low in comparison with common requirements if using such models for predictive purposes. To better understand the ∼ 30% of time intervals during which the sign of NAO Ortega,50 yr is not correctly represented by our model better, we perform an additional 30 type of cross-validation, this time independently leaving out consecutive 50-year windows as validation periods and taking the rest of the data for model estimation. The results of this analysis are shown in Supplementary Material Fig. S5. We find, that whenever the TSR in a given validation period is clearly lower than the mean value of 0.69, NAO Ortega,50 yr is either consistently close to zero or exhibits a transition between positive and negative NAO phases. Nevertheless, there are some periods during which our model differs markedly from NAO Ortega,50 yr , e.g. in the 17 th century. Hence, we conclude that if our model is used 35 for the purpose of hindcasting the (themselves statistically reconstructed) NAO values according to Ortega et al. (2015), this might result in incorrect identifications of the mean NAO phase at values where NAO Ortega,50 yr is close to zero, since under these conditions, our ensemble of equally likely NAO "trajectories" obtained from MCMC regression includes both, positive and negative estimates for the corresponding time window. In turn, the model performs well in correctly identifying strong and persistent positive and negative NAO phases. However, we observe that the actual timing of transitions between distinct NAO 5 phases can differ between NAO Ortega,50 yr and our model (Supplementary Material Fig. S5). Thus, the relatively low predictive skill of our reconstruction method might in part be due to a possibly lagged representation of NAO shifts in the connectivity between regional clusters of proxies. Further possible reasons and suggestions for improvements are briefly mentioned in Sec. 6.
Despite the fact that only four geographical clusters of paleoclimate archives cover the full Common Era, our model allows Finally, we have additionally tested the robustness of the estimated regression model by varying α C over a reasonable range (similar variations of α pr were found not to alter the obtained results markedly, which is not explicitly shown here).
Supplementary Material Fig. S6 shows the corresponding results in terms of OLS-based regression models obtained with different parameter values. While most parameter sets close to the selected optimal one yield very similar results, there are 25 few exceptions demonstrating the importance of this sensitivity analysis. A particularly remarkable example is found in the second half of the 5 th century, where a transition from a predominantly negative NAO phase to a positive one is observed, the exact timing of which, however, differs significantly among the different regression models. This observation underlines, that our model has some uncertainty -not only in terms of reconstructed values, but also the timing of changes -which cannot be accounted for outside an ensemble-based approach. 30

Climatological interpretation
In our analysis, we have related evolving functional paleoclimate networks to the dominant mode of multi-decadal variability of the atmospheric circulation in the North Atlantic region as reported by Ortega et al. (2015), which has been associated with long-term changes of the NAO. As seen from Figs More detailed statements can be made based upon a systematic inter-comparison between the time-dependent CLDs associated 10 with distinct cluster pairs, as well as the evaluation of the regression to the existing reconstruction NAO Ortega,50 yr in terms of the linear model (Eq. 5). The latter analysis helps relating different NAO phases to certain cross-cluster links in a more objective fashion. negative correlations with that reconstruction throughout the last millennium, there is much more variability in the correlations with NAO Ortega for all the other records. More important than variability in magnitude is the fact, that a linear relationship is absent for most records at most times. In turn, they are co-varying with the NAO at certain times, but are unaffected at others.
Suppose that we are given a reference time series which exhibits a stationary relationship with the "true" NAO (in our case, the aforementioned Greenland ice cores). If the variability of any particular record shows a strong similarity with this reference 25 series during a specific time, we expect that this record carries significant information about the NAO phase. As the actual imprint of the NAO is different for different regions, this information is present in different proxy groups at different times. In our case, the Greenland ice cores act as a filter to indicate which regions are strongly influenced by the NAO and thus which NAO phase is more probable.
The median values of the regression model correlate well with NAO Ortega,50 yr (squared Pearson correlation of r 2 = 0.58). 30 However, this value could simply result from overfitting, as indicated by our cross-validation. While the obtained quantitative values of the reconstructed NAO index of Ortega et al. (2015) are thus not reliably described by our model, the latter performs well in resolving the dominant NAO phase. Drawing upon the knowledge about relationships between certain cross-cluster links and the NAO phase as discussed above, it is generally possible to extend the existing (smoothed) NAO index reconstruc-tion NAO Ortega,50 yr to the entire first millennium. However, since our linear model is not capable of describing the amplitude variability of NAO Ortega,50 yr adequately, this should only be considered as qualitative information about the likely NAO phase.
While there is an influence of the NAO on regional temperatures at multi-decadal time scales, strong low-frequency temperature variations could be associated with both, multiple modes of internal variability as well as external factors like solar activity changes and explosive volcanism (Crowley, 2000), to different degrees that are subject of ongoing studies (see, e.g., Schurer there might be common causes for such apparently contradictory observations. For example, explosive volcanism has been discussed as a major driver of the Late Antique Little Ice Age climate (Büntgen et al., 2016), but is also known to frequently 10 trigger positive NAO-like atmospheric dynamics during the years following strong eruptions (Robock, 2000). Even though this is mostly a short-term effect, a high frequency of strong eruptions, as present during the Late Antique (Sigl et al., 2015), might have had a more persistent influence.
We emphasize that there are certain limitations to the usage of CLDs and our linear model to draw conclusions about the predominant NAO phase. First, most of the CLDs exhibit downward trends and progressively decreasing variance throughout 15 the Common Era (Supplementary Material Fig. S4), which is probably related to the lower number of records as one goes back in time. This might add a considerable bias to any application of our methodological framework extending further back in time than the last millennium. In our case, this effect might favor positive NAO phases, since the regression coefficient for the connection between Southeast Greenland and Fennoscandia is by far the largest among all coefficient. Furthermore, our regression is based on a proxy-based reconstruction, which contains large uncertainties itself and therefore has limited 20 value as a "ground truth". Ortega et al. (2015) reported, that their reconstruction explains only about 40% of the variance of the observed NAO. In turn, the explanatory value of the time series (NAO Ortega,50 yr ) upon which our regression analysis is based, is a key assumption beyond our procedure. Moreover, our cross-validation showed that the linear model can disagree with NAO Ortega,50 yr especially in cases in which that reconstruction has values close to zero, as well as in the timings of some transitions between positive and negative NAO phases. This intrinsic uncertainty of our qualitative reconstruction has been 25 addressed by using MCMC regression to take a probabilistic view on the NAO phase. At time periods where the reconstructed NAO Ortega,50 yr is close to zero, no particular phase is preferred in general (Fig. 6).
Following upon the considerable uncertainties in using our linear model to obtain qualitative estimates of the NAO phase during the entire Common Era, we will next compare our corresponding results to other long-term NAO-related climate reconstructions. Moreover, we will utilize further independent information in terms of documented drought periods for an indepen-30 dent validation of our reconstruction. Finally, we will discuss how long-term NAO variability might have affected European societies during the first millennium CE.

Comparison with other NAO reconstructions
Our model is able to reproduce most features of the NAO Ortega,50 yr reconstruction, including a dominant positive NAO phase during the late MCA, generally stronger variability during the LIA, with a tendency towards a more negative NAO phase, and another strongly positive phase during the 20 th century, which is also in accordance with instrumental records (Vinther et al., 2003). 5 In contrast to other previous findings by Trouet et al. (2009), we do not observe strong West-East correlations during the early MCA, which suggests that this time interval has not been characterized by a strongly positive NAO.
Another recent NAO reconstruction by Olsen et al. (2012) used a more than 5000 years-long lake record from Southern Greenland to trace the predominant NAO phase during the entire Late Holocene. Our mean model correlates only extremely weakly with their reconstruction during the common period (r 2 = 0.04). One probable reason for this disagreement could

Comparison with historical droughts
Winter NAO and the corresponding precipitation anomalies exhibit known linkages to droughts in many parts of Europe, most significantly in the Western Mediterranean region (López-Moreno and Vicente-Serrano, 2008;Cook et al., 2016). Due to their severe impacts on agricultural productivity, droughts are some of the best documented weather extremes across historical times. Thus, existing reports of historical drought periods can be used as an independent source of information to test the historical evidence from Muslim sources for Southern Spain, a region exceptionally vulnerable to NAO-related droughts (Cook et al., 2016), from 711 to 1010 CE. They identified three major drought periods during this time. In addition, Cook et al. (2016) 30 discussed droughts during the last millennium in the Mediterranean region based upon the Old World Drought Atlas (OWDA) (Cook et al., 2015). They reported that the drought index constructed for the Western Mediterranean correlates well with NAO Ortega . Therefore, we consider here only the five strongest drought events as discussed in Fig. 5 of their paper.
It has to be noted that although droughts are strongly related to precipitation deficits potentially associated with the dominant NAO phase, they are complex phenomena with multiple causing factors. Thus, we do not expect the timing of all droughts in the considered region to be fully explained by any particular NAO reconstruction.
In Fig. 6, the major drought periods discussed in the aforementioned publications are marked as gray bars. Among the 20 drought events, there is a clear tendency towards a positive NAO (P(NAO+) > 0.5) (17 cases, with 12 of these showing P(NAO+) 5 > 0.66). Thus, most droughts indeed coincide with positive NAO phases. However, a larger number of reported droughts during a specific time period does not necessary imply a higher frequency of droughts. In case of historical documents, an increase in reported events could also indicate that a society was more vulnerable to the impacts of droughts and, thus, found them more worth reporting.

10
In this study, we have demonstrated that functional networks based on paleoclimate proxy records from multiple, spatially distributed archives offer great potentials for identifying spatial patterns of atmospheric circulation in the European North Atlantic sector together with information on their associated long-term variability.
Specifically, we have obtained a new 2000 year-long qualitative reconstruction of the leading mode of regional inter-annual temperature variability probably associated with multi-decadal NAO variability. By combining visual inspection of changing approach cannot yet be directly applied to the instrumental record as regression target. Since the longest instrumental record of the NAO goes back to 1821 only (Vinther et al., 2003), there are not enough independent data points available for proper calibration and validation when using multi-decadal time windows.
Our qualitative expansion of the NAO Ortega,50 yr reconstruction demonstrates, that the Common Era has been characterized by time periods with different behavior of the NAO. In general, multi-decadal changes of the predominant NAO phase occurred 5 relatively frequent during Roman and early medieval times, while there have been other periods characterized by a persistent positive or negative NAO phase. The first is the case for most of the migration period, the late medieval times and the 20 th century, while the latter is found during the late Roman times and the Little Ice Age. These long-term changes in the NAO phase might have had a considerable impact on European societies, as the NAO phase is associated with the likelihood of regional droughts as well as precipitation and temperature extremes, which may have directly affected agricultural productivity. 10 In this spirit, specific phases might have supported some European societies while negatively affecting others.

Code availability
All calculations in this work have been based upon open source software. AAFT surrogates have been generated using the Python package pyunicorn (Donges et al., 2015). Cluster analysis of reanalysis data has been conducted using the community package. MCMC regression has been performed with the pyMC3 package (Salvatier et al., 2016).  . Regression coefficients between the 15 cross-link densities (CLD) among the spatial clusters of records and NAOOrtega,50 yr. Note that these linkages represent statistical relations and do not necessarily relate to (temperature) teleconnections between different regions, but may also reflect common factors influencing the respective regional climate dynamics as recorded by the proxies in a similar way. Black circles mark the centers of each group of records. Each line indicates the correlation between the CLD for two groups of records and the NAO reconstruction by Ortega et al. (2015). Red ( less than 66% of all considered MCMC ensemble members agree upon the sign of the reconstructed NAO variable. Gray bars correspond to known major drought episodes in the Western Mediterranean as discussed in Sec. 5. For better comparison, the NAO reconstruction by Ortega et al. (2015) is shown as a black line, indicating a general agreement with our probabilistic reconstruction over the common period as expected.