Technical Note: The analogue method for millennial-scale, spatiotemporal climate reconstructions

. Inferences about climate states and climate variability of the Holocene and the deglaciation rely on sparse paleo-observational proxy data. Combining these sparse proxies with output from climate simulations is a means for increasing the understanding of the climate throughout the last ~21 millennia. The analogue method is one approach to do this. The method takes a number of sparse proxy records and then searches within a pool of more complete information (e.g., model simulations) for analogues according to a similarity criterion. The analogue method is non-linear and allows considering 5 the spatial covariance among proxy records. Beyond the last two millennia, we have to rely on proxies that are not only sparse in space but also irregular in time and with considerably uncertain dating. This poses additional challenges for the analogue method, which have seldom been addressed previously. The method has to address the uncertainty of the proxy-inferred variables as well as the uncertain dating. It has to cope with the irregular and non-synchronous sampling of different proxies. 10 Here, we propose a speciﬁc way of dealing with these obstacles. We use uncertainty ellipses for tuples of individual proxy values and dates and, thereby, consider the dating as well as the data uncertainty. Results highlight the potential of the method to reconstruct the climate of the last ~15 millennia. However, in the present case, the reconstructions show little variability of their central estimates but large uncertainty ranges. The reconstruction by analogue provides not only a regional average record but also allows assessing the climate state compliant with the used proxy predictors. These ﬁelds reveal that uncertainty 15 are also large locally. Our results emphasize the ambiguity of reconstructions from spatially sparse and temporally uncertain, irregularly sampled proxies. aims to understand past spatial and temporal climate variability, preferentially using a dynamical understanding of the climate processes. To achieve this, we need spatial and temporal information about past climate states and past climate evolutions. Our understanding of the past, however, relies on spatially and temporally sparse paleo-information. Data assimilation methods and data-science approaches are ways to provide estimates for the gaps in time and space. One simple approach is the analogue method or so called proxy surrogate

reconstructions (Gómez-Navarro et al., 2017;Jensen et al., 2018). The present manuscript discusses one implementation of the analogue method for reconstructing surface temperature over timescales including the Holocene and the last deglaciation.
If we want to use the analogue method beyond approximately the last two millennia, we have to tackle challenges, which usually can be evaded for the Common Era. For example, our proxy records are not only spatially sparse but they also have a coarse temporal resolution on these timescales. Furthermore, the sampling generally is irregular for each individual proxy. 5 Indeed, sample dates differ between predictors on these timescales, and sample ages, i.e. dates, are uncertain. Recently, Jensen et al. (2018) use the approach to reconstruct the climate at the Marine Isotope Stage 3 (MIS3, 24,000 to 59,000 years before present (24-59kyr BP)) addressing such challenges. Including part of a deglacial period, as we do here, complicates applications further as we consider a climate trajectory with strong trends.
The basic idea of the analogue method is simple. An analogy tries to explain an item based on the item's resemblance or 10 equivalence to something else. In the analogue method, one uses a set of sparse predictors and searches for analogues for them in a pool of more complete candidates. In paleoclimatology, the predictors can be local proxy records and the candidate analogues can be fields from climate model simulations. One assesses the similarity of the simulation output and the proxy records at the proxy locations to find valid analogues.
It is important to note that comparable approaches suffer from a trade-off between accuracy and reliability of reconstructions 15 as shown by Annan and Hargreaves (2012) for a particle filter method. This depends on quality and quantity of the available proxy records. This drawback also affects the analogue method as shown by Franke et al. (2010) and Gómez-Navarro et al. (2015), who find that the skill accumulates at the predictor locations of the proxy records. Similarly, Talento et al. (2019) highlight that the analogue method may perform badly in regions with little proxy coverage.
Most paleoclimate applications of the analogue method focussed on the Common Era of the last 2,000 years (e.g., Franke 20 et al., 2010;Trouet et al., 2009;Gómez-Navarro et al., 2015Talento et al., 2019;Neukom et al., 2019). In this context, Graham et al. (2007) call the results of the analog method a "proxy surrogate reconstruction". Gómez-Navarro et al. (2017) provide a comparison of the analogue approach to more complex common data assimilation-techniques. Applications often only consider the single best analogue, which may not necessarily be appropriate especially for predictors under uncertainty.
Paleo-applications of the analogue method generally try to upscale the local proxy information but the analogue method was 25 also applied for downscaling of large-scale information (e.g., Zorita and von Storch, 1999).
Here, we describe another approach to obtain reconstructions by analogue over millennial timescales with spatially and temporally sparse and uncertain proxies. It differs from the approach so far applied to shorter and more recent periods. Our approach tries to explicitly consider not only age uncertainties (compare with Jensen et al., 2018) but also the uncertainties of the proxy values or of the temperature reconstructions inferred from these proxies. We make specific assumptions on the Beyond the mentioned challenges for analogue reconstructions on millennial timescales, the method strongly relies on the pool of available analogue fields. van den Dool (1994) considers how likely it is to observe two atmospheric flows over the northern Hemisphere that resemble each other within the observational uncertainty at the time of that study. The study finds that a pool would have to include a nonillion of potential analogues to achieve this. Obviously, we aim for less accuracy in paleoclimatology due to larger uncertainties. However, there are still only few climate simulations for relevant timescales, 5 and these simulations also only cover parts of the time periods of interest. Furthermore, these simulations stem from different climate models whose reliability on these timescales may not have been shown yet (Weitzel et al., 2018;Kageyama et al., 2018).
The next section first summarizes again the main characteristics of analogue searches for paleo-reconstructions. Afterwards, we present our way of dealing with uncertain tuples of data and date, that is with uncertain paleo-observations with uncertain 10 dating. We also describe how we consider the fact that different proxies are sampled at different times. The section also presents our selection of a simulation pool. We present results for a pseudoproxy setup (compare Smerdon, 2012) over the last~22 millennia and a realistic setup over the last~15 millennia for the European-North-Atlantic sector. Our appendix shows results for alternative proxy setups. Finally, we discuss our assumptions and results. Thereby, we aim to emphasize the opportunities of the method while also highlighting its challenges. 15 2 Analog method, assumptions, and data

General Method
In an analogue search one tries to complement incomplete information from one dataset by data from other more complete datasets. One ranks the more complete data by its similarity to the incomplete information in the first data set. In paleoclimatology this usually means that one uses a set of spatially sparse proxy records and wants to find fields from simulations or 20 reanalyses that are most analogous to the proxy records at their locations. The pool of candidate fields depends on the available simulations and reanalyses.
If, for example, one uses proxies for temperature, such a ranking may simply provide the model output temperature field that has the smallest Euclidean distance to the sparse proxy information at their locations. Alternatively, one can consider not just one but a small number of good fits with small distances (Franke et al., 2010;Gómez-Navarro et al., 2015Talento et al., 25 2019). However, it is also possible to define a range of tolerable deviations from the proxy predictor values and consider all candidates within this range (compare . Matulla et al. (2008) discuss the effect of the choice of distance measures for a downscaling exercise.
To our knowledge, only Jensen et al. (2018) and Neukom et al. (2019) consider the uncertainty of the final reconstruction among earlier paleo-applications of the analogue method, and only Jensen et al. use proxies with prominent age uncertainties in 30 their work on MIS3. They perform multiple reconstructions to obtain reconstruction uncertainties by shifting the dates of their proxies within the age uncertainties. Uncertainty information is particularly relevant for applications like the one of Jensen et al. where one has to deal with predictors that are sparse, irregular, and uncertain in time.

Present application of the analogue method
We use spatially and temporally sparse as well as data and time uncertain proxies for analogue searches on millennial timescales. Next, we detail our simplifying assumptions about what the data represents, its uncertainties, and the dating uncer-5 tainties. We also describe how we choose the dates for which we perform the reconstruction.

Variable of interest
Our interest is in temperature. We consider proxies that have a temperature calibration and search for analogues within fields of simulated surface temperature. To do the comparison, we consider the model variable "surface temperature" over the European-North-Atlantic domain shown in Figure 1. The reconstruction also uses these fields. 10 Theoretically, the variable or variables to be reconstructed can be different from the variable or multiple variables represented by the paleo-observational predictors. Indeed, we here assume that it is possible to reconstruct annual temperatures from proxy records with diverse seasonal attributions.

Data handling and use of model simulation pool
Section 2.3.1 gives details on our selected proxies. In short, we choose 17 proxies at locations in the European-North Atlantic sector (Figure 1b) from the compilation of Marcott et al. (2013). These are from a variety of different proxy systems. We take these as published by either Marcott et al. (2013) or the original publications. Therefore, calibrations and uncertainty estimates have diverse origins. Considering the proxy ages, we use these as published.

5
Optimally one would choose consistent proxy parameters, a consistent recalibration, and a consistent calibration target.
Consistency among parameters and calibration ensures a relation among the proxy predictors, which, one can assume, increases the chance that the proxy records lead to a selection of meaningful analogues. In this case, the proxies can effectively anchor the analogue procedure. We here assume that all chosen proxy-types reliably represent the target of interest and a multi-proxy approach is viable. 10 The analogue method allows searching for analogues at dates when there is information. One can pool the predictor dates into consistent intervals of, for example, 500 years, and search for analogues for these 500-year pools. One can follow the example of Jensen et al. (2018) and interpolate the proxy records to consistent time steps using the age models for the individual records.
We choose a different approach; we identify all the years for which one of the chosen proxy records includes a dated value. We perform analogue searches for these dates according to our considerations on uncertainty, which we describe in the following 15 subsection.
Each data point of a proxy series potentially represents a time-interval of a specific length and the comparison should consider this temporal resolution. That is, if one data point represents a 50-year accumulation and one data point represents a 500-year accumulation, the procedure ideally accounts for these differences. We decide to use typical resolutions instead of individual resolution estimates to simplify the procedure. Considering the accumulation estimates for the proxies from Marcott 20 et al. (2013), we conclude that the average resolution of the proxies is at best centennial. Therefore, we decide to compare the proxy estimates to 101-year averages of the model simulation output. That is, we compare them to 101-year mean values, which we obtain by using a 101-year moving mean on the simulation output time series that is closest to the proxy location. Additionally, in one test case, we do not preprocess the simulation output but use the annually resolved values of the output for the comparison. For this specific test, we also include the simulation data from 25 the QUEST-project (compare Smith and Gregory, 2012). The latter simulation used accelerated forcings and the output data is only available as 10-year snapshots. We test our approach by using pseudoproxies. We calculate the pseudoproxies following the procedure of  more specifically their ensemble approach). This approach takes simulated grid-point data and transforms it in multiple steps into a pseudoproxy record. The steps follow the framework of a proxy system model including a sensor model, an archive 30 model, and an observation model (see Evans et al., 2013).  first add a noise estimate for environmental non-temperature influences at the sensor stage. This stage also includes adding a bias term due to changing insolation. Next, the archive stage primarily represents a smoothing of this record, which is meant to reflect effects of, e.g., bioturbation. The measurement stage adds another noise term. After sampling this record at a specific number of dates, the procedure, finally, 5 https://doi.org /10.5194/cp-2019-170 Preprint. Discussion started: 16 January 2020 c Author(s) 2020. CC BY 4.0 License. also adds an error term reflecting effective dating uncertainties. Because we separate dating and data uncertainty, we modify the script of  to set the effective dating uncertainty error to zero. Figure 1a shows the 28 pseudoproxy locations.
The pseudoproxy generation smooths each record to mimic filtering effects of the environmental archive. The smoothing length is randomly chosen but temporally uniform for each record. The search for analogues again uses 101-year mean estimates from the simulation pool (compare the previous paragraph). 5 Simulations or simulation projects potentially differ in their modern day climate mean (compare, e.g., Zanchettin et al., 2014). Using anomalies circumvents this issue. One can consider simulation output as anomalies to the climatology over the 20th century or over the full simulation period or over the longest period common to all simulations. For example, Jensen et al.
(2018) take anomalies from any proxy or simulation record relative to the temporal mean over the full period of this record.
Their period of interest backs this decision. The proxy records of Jensen et al. (2018) suggest an overall rather stable climate in 10 the North Atlantic during Marine Isotope Stage 3 although a number of Dansgaard-Oeschger (DO) occurred during this period.
We presume that using anomalies allows for a wider range of simulations and analogue candidates for each date.
In the present case, the period of interest includes the last 15kyr for the real proxies and even the last 22kyr for the pseudoproxy test. Thus, it spans part of or even the full deglaciation from the Last Glacial Maximum to the Holocene optimum.
Our selection of simulations can only piecewise cover the period of interest, which complicates the construction of a surface 15 temperature candidate pool. Indeed, the most recent dates differ among the proxy records, and, thus, there is not a simple procedure to provide anomalies relative to a consistent modern climate. Additionally, using anomalies may introduce climatic inconsistencies if we are interested in climate variables other than temperature. For these reasons, we decide that we cannot reasonably use anomalies. Instead, we try to find analogues for the proxies in their original temperature units without subtracting any climatology.

Proxy uncertainty
We are interested in millennial timescales from the last glacial maximum through the deglaciation until the recent past. On these timescales, uncertainty affects our proxy predictors twofold. First, we have to consider the age or dating uncertainty. Second, the measured proxy data and the temperatures inferred from them are affected by various sources of uncertainty (compare, e.g., Dolman and Laepple, 2018;Reschke et al., 2018;Jensen et al., 2018, and their references).

25
To our knowledge, previous applications of the analogue method usually did not consider proxies with considerable age uncertainties except for Jensen et al. (2018). Jensen et al. consider the age uncertainty by shifting the date of each proxy by ± 500 years. Thereby, they obtain 2 14 reconstructions from which they calculate confidence intervals for their reconstruction.
They do not separately consider the uncertainty of the proxy/reconstruction value. For details, see Jensen et al. (2018).
We choose a different approach (Figure 2). We interpret each data point in a proxy series together with its dating as a data 30 point in the two dimensions temperature and time. The uncertainties of the data point allow to construct ellipses of confidence around each data point, and these ellipses will provide the maximal acceptable distance for a simulated data to be considered an analogue (Figure 2b). dates and uncertainty ellipses for these dates using data uncertainty and age/date uncertainty.
We have to assume a certain confidence value for the ellipse construction. The envelopes as shown in Figure 2b are pointwise uncertainties for those data points for which a published record provides ages. The uncertainty envelope for a date and for the full proxy record follows from the superposition of the uncertainties from successive data points (see panels of Figure  . This envelope provides for each date an upper and a lower limit of potential values or, potentially, results in lack of defined values for certain locations or even all locations for certain dates.

5
That is, while a two-dimensional uncertainty ellipse represents uncertainty levels for two-dimensional normal distributed data, our interest is only in the ellipse as a decision criterion to consider the data included in the ellipse and to neglect the part of the distribution outside of the ellipse. Effectively, we treat the data as if they are uniform distributed inside of the ellipse for one two-dimensional data point or inside the envelope of multiple overlapping data points. The ellipses around the data points, therefore, mark the limit of their 2-dimensional area of influence in our search for spatially resolved analogues: those simulated 10 data that fall within the ellipse are equally considered as analogues and those that fall outside the ellipse are not considered at all. For simplicity, we do not take account of the likely correlations between subsequent tuples of data and date. We also do not consider potential non-zero covariances between dating uncertainty and proxy uncertainty.
Because we provide such pointwise estimates and envelopes for the climate of the period of interest, it can happen that the method fails to find any valid analogues for given years. The pointwise estimates are compliant with the initial uncertainty estimate measures the uncertainty of the initial reconstruction relative to shifted ages. That is, the two different applications of the analogue method consider different things in their uncertainty estimates. The reconstruction uncertainty in our approach originates from the selected analogues. In our approach, if uncertainties for multiple data points in a record overlap for a given year, we simply take their maximal ranges.

5
The ellipses of uncertainty allow in theory to produce reconstructions for each year included in the dating uncertainty. That is, if a proxy series has a value dated to the year 500 BP and a dating uncertainty of σ = 50yr, and if we decide to consider dates within ±2σ then we can consider the proxy record for the search for analogues from 600 to 400 BP. However, we decide to only reconstruct values at those dates at which at least one proxy is dated. That is, if only this hypothetical proxy has a dated value between 600 and 400 BP and it only has this one dated value, we perform the reconstruction only for the year 500 BP. 10 Our assumption is that this maximises the link between the reconstruction and the underlying proxies. Thus, if we increase the width of the uncertainty envelope, we usually do not obtain reconstructed values at more dates but only increase the probability to find a valid analogue at a certain date.
The choice of a valid analogue usually relies on a distance metric. This is commonly the Euclidean distance (compare Based on such a distance, one can select the best fit, a small number of good fits, e.g., the ten analogues with the smallest distance, or a composite or interpolation of a small number of good fits. Here, we deviate from this and decide neither on a fixed number of analogues nor on a defined metric. Candidates in our pool are valid analogues if they are within a certain uncertainty interval around the proxy predictors (compare section 2.2.3). 20 That is, as described above, we have an envelope of credible values for certain years and each proxy record. A candidate is a valid analogue for a date if all proxies defined at this date include the candidate values in their credible interval. Credible intervals are 90% for the pseudoproxy approach, and either 99% or 99.99% for the various proxy setups.

25
We concentrate on a European-North-Atlantic domain ( Figure 1). There, we choose 17 locations with proxy-inferred temperature records from the collection of Marcott et al. (2013, see also Tables 1 and B1). Nine of these series use alkenone U K 37 but the set also includes temperatures derived from foraminifera Mg/Ca (2 records), pollen (2), chironomids (2), TEX86 (1), and foraminiferal assemblages (1) (compare Table 1 and Appendix Figure A2 (2001) and Seppä et al. (2005) for the specific Table 1. Information about the considered proxy records: IDs, geographical location, seasonal attribution according to Marcott et al. (2013), proxy type, seasonal attribution used here, and analogue search setups that use the record. All proxy data are from the supplement of Marcott et al. (2013). pollen records, Larocque and Hall (2004) for the specific chironomid records, and Sarnthein et al. (2003a) for the specific record using foraminiferal assemblages.  Marcott et al., 2013) instead of that of Kim et al. (2004a, see also Marcott et al., 2013), which are basically co-located. We use the data of Kim et al. (2004a) in one alternative proxy setup (see Table 1, Appendix A, and Appendix Figure A2).
We consider the seasonal attributions of individual proxy records in our search for analogues. We generally take the attributions, the calibrations, and the uncertainties for the records as published by Marcott et al. (2013) but also check the references 5 provided by them. Seasonal attributions are diverse for the various proxy records. The majority is either for summer season (7) or annual (8) according to Marcott et al. (2013). We compare the proxies to the simulation output season that is close to the seasonal attribution as given by Marcott et al. (2013) or the original publication. For simplicity's sake, we only consider the modern meteorological seasons DJF (December to February), MAM (March to May), JJA (June to August), and SON (September to November) as well as the calendar annual simulation means (compare Table 1). We do ignore possible calendar 10 effects (e.g., Bartlein and Shafer, 2018;Kageyama et al., 2018).
For the sake of simplicity, we decided to assume a uniform uncertainty of σ = 1K for all proxies. This reduces the uncertainty for some records and potentially increases the uncertainty for others. We regard this to be a reasonable simplification for records where we are unable to infer full uncertainties for the temperature reconstructions either from Marcott et al. (2013) or from the original publications. 15 We performed reconstruction exercises for various proxy setups. In the main manuscript we concentrate on the full set of proxies mentioned above (see Figure 1b and first 17 lines of Table 1). Figure 3b visualizes how many of these 17 proxies are available for the dates for which we aim to reconstruct temperature. The panel shows this for two different assumptions on uncertainty (see section 2.2.3).
Appendix A includes information on the additional attempts at reconstruction by analogue (Appendix Figures A1 to A3, 20 compare also Table 1). Appendix A1 provides the results. We shortly discuss the results for these alternative setups in our results section below. Most notably among these alternative tests are setups that use only U K 37 proxies (Appendix Figure  A1c,d, see also Appendix Figure A2). The difference between the two U K 37 setups is that E03 ( Figure A1) uses record GeoB 5901-2 instead of record M39-008 (compare Table 1 and Figure A2). Figure

Pseudoproxies
We test our analogue method using pseudoproxies. We calculate the pseudoproxies following the procedure of Bothe et al.
(2019, more specifically their ensemble approach) but omit their effective dating uncertainty error term. They provide pseudoproxies based on simulated annual mean temperature and for a global selection of grid points. Here, we calculate the pseudoproxies for summer season (June, July, August; JJA) and the chosen European-North-Atlantic domain only. The seasonal 30 choice is due to the large number of proxies attributed to summer in our real proxy selection. The approach provides also randomly chosen pseudo age uncertainties. The pseudoproxy computation uses the TraCE-21ka simulation (He, 2011;Liu et al., 2009

Simulations
Table 2 provides a general overview of the various simulations in our pool of candidates. Supplementary Tables B2 to B5 give additional information. We only consider previously published simulations. These stem from a variety of projects and were performed with a variety of models. The projects are TraCE-21ka "Simulation of Transient Climate Evolution over the last 21,000 years"   LGM, MidHolocene 400 LGM 600 LGM 500 LGM, MidHolocene, past1000 9309 We use simulations for various different time periods to increase the candidate pool. We assume that simulation climatologies can differ over a relatively wide range (e.g., Zanchettin et al., 2014). Simulations from the TraCE-21ka and the QUEST projects are transient over periods covering the last approximately 22kyr and the last glacial cycle respectively. Otherwise, the simulations are transient over the last millennium, or time-slices for the Mid-Holocene and the Last Glacial Maximum.
Additionally we also include pre-industrial control simulations. Such a multi-model and multi-time-period candidate pool ef-5 fectively follows suggestions of Steiger et al. (2014). We note that considering simulations for the last millennium as candidate for the Last Glacial Maximum can introduce climatological inconsistencies if the method identifies these fields as analogues.
The FAMOUS-HadCM3 simulations for QUEST use accelerated forcings (compare Smith and Gregory, 2012) and the output is only available in steps of ten simulation years. Therefore, we use this simulation only for a test case and exclude it for the main discussions. 10 We remap all simulation output to a 0.5 by 0.5 degree grid for the construction of pseudoproxies and for the search for analogues. The motivation is that thereby fewer proxies are close to the same grid point. However, resulting differences are  likely small between grid points. We use the original resolution for the final regional average reconstructions and the evaluation of field data. Local grid point evaluations are done against the remapped files.

Pseudoproxy application
The pseudoproxy application allows highlighting the possibilities of our implementation of the analogue method. It further already provides a glimpse at potential problems.
Our implementation of the analogue method searches for analogues within the full pool of simulation fields but exclud-5 ing the FAMOUS-HadCM3 output. Pseudoproxies are for the virtual climate in the TraCE-21ka simulation. We compare the pseudoproxy predictors to 101-year moving averages of the simulation output. We use 90% uncertainty ellipses in the pseudoproxy application of the analogue search. Valid analogues are those simulation fields that are within these resultant uncertainty envelopes for all pseudoproxy locations available for a date.
Temperatures are reconstructed for the full domain of the European-North-Atlantic sector including the Arctic as shown 10 in Figure 1. Figure 1a shows the proxy locations. The time period of interest ranges from the last glacial maximum until the late 20th century of the Common Era, i.e. it is the full period of the pseudoproxies from the Trace-21ka simulation. Figure   3a highlights that most pseudoproxies are defined at all dates. That is, the chosen sample dates of the pseudoproxies are very close to each other and, thereby, the generated dating uncertainties result in large overlaps. Figure 4 presents the pseudoproxies including their uncertainty envelopes. 15 In this setting, the analogue search tries to identify analogues for 4,947 dates. Our implementation finds between 6 and 14,343 analogues at 4,935 dates; it fails to find analogues for 12 dates during the glacial maximum. Analogues stem only from the Trace-21ka simulation from which we also constructed the pseudoproxies for this test. This is not surprising because the local climatologies of the TraCE-21ka simulation are slightly distinct from the other simulations (not shown). Using anomalies might compensate for this but please see our reasoning why we do not use anomalies in section 2.2.2. 20 Figures 5 and 6 provide information on which type of results we can obtain from analogue reconstructions. They plot results for the reconstruction from pseudoproxies. Panel (a) of Figure 5 is for the area mean reconstructions and it shows the resulting reconstruction median in black, the full range of potential analogues in light red, and a 90% interval in darker red. The blue line in the panel is the 101-year moving average regional temperature from the simulation, i.e. the reconstruction target.
It is encouraging that the target rarely falls outside of the envelope of potential analogues. However, we note that the range 25 of potential analogues is very wide. For example, the analogue search regards up to approximately two thirds of the simulated time period as valid analogues during the TraCE-21ka-equivalent of the Bølling-Allerød (compare Figure 5b). Additionally, the proxies, together with their uncertainties, are a weak constraint during most of the Holocene. The reconstruction envelope gives a constant upper limit for the most recent~9kyr and approximately the oldest 5kyr, and the lower limit is approximately constant between~15kyr and~18kyr BP. That is, the set of valid analogues has a notable overlap for these periods. The lacking Besides the regional average, the results also allow to extract the local representations. Figure 5c shows two examples. We will refer to those as the warmer and colder cases. The panel plots again the target simulation output in blue, the full analogue range in light red, and the analogue median in red. We also add the pseudoproxy in grey.
In the warmer case, the reconstruction median captures well the average characteristics of the pseudoproxy, which already follows the underlying simulation target rather closely. Again, the range of potential analogue cases and its median show little 5 variability. The narrow reconstructed range does not necessarily cover the moving averages of the target data. More importantly, it also fails to cover the full variability of the pseudoproxy. In the colder case, the analogue median fails to capture the average characteristics of the pseudoproxy except for approximately the most recent 5kyr. The pseudoproxy also deviates slightly more clearly from the target data than for the warmer example. Indeed, the analogue median differs less from the target data than from the pseudoproxy. The range of the analogues mostly includes the moving averages of the target data but not the pseudoproxy There is a very wide range of potential analogues. However, the examples highlight how much two climates may differ over the period although they result in a valid analogue for the proxies under uncertainty.
Finally, the approach allows considering the spatial fields for valid analogues. The pseudoproxy application of our implementation of an analogue search shows the viability of such approaches for reconstructing past climates from spatially sparse proxies with temporally sparse, irregular, and uncertain ages. The pseudoproxy 20 tests also highlights that the large uncertainties lead to a wide range of potential analogues. The proxies are only weak constraints on the potential climate.

Application to real proxies
Already the pseudoproxy test highlights the potential and the associated problems in using the analogue method for the type of proxies we are interested in, together with a limited pool of candidate fields. The analogue reconstruction captures the target 25 data but the search often provides a very wide uncertainty range due to the large number of valid analogues. Conversely, the method occasionally may fail to provide valid analogues.
Our focus here is on a multi-archive and multi-proxy reconstruction using 17 proxies (compare section 2.3.1) for the  Figure 7 shows the proxies and their uncertainties for the locations in Figure 1b. The panels highlight that the real proxy values are less equally distributed through time, are generally smoother, and differ more in their lengths compared to the pseudoproxy setup. Figure 3 already showed how the number of available proxies increases from 11 to 17 but then again decreases until only 5 proxies are available for the earliest dates. The appendix compares the full 17-proxy setup to different sets of proxies. Table 1 and Appendix Figures A1 to A3 give details for the different sets.
In the case of the main set of 17 proxies, our implementation tries to find analogues for 1,781 dates. There are 1 to 900 analogues for 141 dates for a 99% envelopes (see Figure 8b). Analogues come from two different simulations. It is obvious that the method often fails. For the 99.99% envelope, these basic results change. The method identifies 1 to 31,304 analogues 5 at 1,288 dates (see Figure 8b). These come from 42 different simulations. There are no valid analogues between~10kyr BP and~14kyr BP. Otherwise, there are extended periods with very many analogues and other periods with few analogues.
For the narrower uncertainty envelope, the method finds valid analogues only for the recent past millennia (Figure 8). Even then, it is only successful for few periods (Figure 8b). In this case, the range of the area average reconstruction ( Figure 8a) and at the local proxy location (Figure 8c) is very narrow. There is very little regional or local variability in the analogues. However, 10 the reconstruction may reflect well the average state of the local proxy series ( Figure 8c). As for the pseudoproxy test, we can expand the analogues, i.e., the 101-year moving means, to reveal the potentially underlying time-variations (Figure 8d and e).
These also reflect the narrow range of potential analogues for the regional average but also for the local series.
For the wider uncertainty envelope, the method identifies valid analogues for more dates (Figure 8) and, generally, there are more valid analogues for these dates (Figure 8b). However, there are more proxies available for some dates (compare Figure 3) 15 and this increases the number of constraints on the analogue candidates for these dates. Thus, there are dates when the range of the regional average reconstruction for a 99.99% uncertainty envelope does not necessarily include the 99% envelope data.
The range of the reconstruction may be wide regionally or locally for the 99.99% envelope, but this does not ensure that it locally includes the proxy values (Figure 8c). There is little temporal variability in the reconstructed data. This is mainly because of the large number of analogues and the relatively low temporal variation in the set of valid analogues (Figure 8b). 20 Further, the reconstruction is rather constant. These highlight that, for the late Holocene date, the found analogues capture the proxies rather well though with exceptions 25 over Scandinavia. However, the date at~14kyr BP strongly disagrees for the only valid proxy at high northern latitudes. The range is very narrow for the late Holocene example from the narrow uncertainty application. Differences become largest over Greenland and along the sea-ice edge. For the deglacial example from the wider uncertainty application, differences become largest east and west of Iceland. Table 1 introduces a number of additional proxy setups. The sets use different sub-selections of proxies from our initial 30 selection. Appendix A provides additional information and presents details of the results in its subsection A1.
The different sets confirm that loosening the uncertainty constraint leads to valid analogues for notably more dates. We also obtain analogues at more dates if we keep the uncertainty envelope at the lower level but do not preprocess the simulation output to 101-year moving means. This inclusion of interannual data increases the number throughout the reconstruction period. Generally, the reconstruction success appears to be better for proxy setups that only include U K 37 records. Such consistent sets of proxies provide a more continuous reconstruction for both local uncertainty levels.
Multi-archive setups with fewer proxies give generally wider ranges of possible analogues. Otherwise all setups tend to be in a comparable range regarding their median and their range considering the last 10 millennia. Differences between all setups are largest in the 14th millennium BP due to a larger range for some reconstructions.

5
Generally, we find that the reconstructions from different setups differ in their ability to reconstruct climate for certain periods. Indeed, different setups may provide notably different climates, particularly for the early part of the time period of interest. All setups provide rather constant reconstruction ranges.

Discussions
Our implementation of an analogue search method for reconstructing surface temperature over multimillennial timescales relies 10 on a number of decisions, which are uncommon compared to other paleo-reconstruction efforts on multimillennial timescales.
Central to our assumptions is that taking account of uncertainty is indispensable in analogue approaches for paleoclimatology and particularly if one uses spatially and temporally sparse as well as data and age uncertain proxies. There is one prime motivation behind our specific handling of uncertainty and our selection of reconstruction dates: The analogue search for a chosen date should use as much information about this date as possible, including the uncertainty of other data points whose age uncertainty include the currently given date of interest.
This leads to the use of uncertainty ellipses. Assumptions here are that, firstly, data and date are inseparable; secondly, that this assumption also holds for the tuple and its two-dimensional uncertainty; and, thirdly, that a reconstruction exercise has to 5 consider both parts of the uncertainty to sufficiently estimate the range of reconstructed values. Our procedure is a simplified approach to incorporating these assumptions. More correctly, one would calculate the multivariate mixture distribution and identify the relevant uncertainty range from it. As a sidenote, the full multi-predictor space for all dates also is a multivariate mixture distribution, which one could employ in more sophisticated data science approaches.
Our handling of uncertainty causes that we cannot easily implement a distance measure like the Euclidean. A more formal 10 definition of similarity should take into account the multivariate and correlated nature of uncertainty: in time and across-proxies.
We are confident that considering both parts of the uncertainty enables better and more reliable estimates. We concede that this procedure may exaggerate the range of potential climates and thereby may reduce the precision of the reconstruction. We postulate that this, however, is only partly due to the assumptions on uncertainty, which may transfer uncertainty to too many records. We think it is also because the simulation pool does not reflect the climate relations among proxy record locations 15 as given by the proxy records. It is beyond the scope of the present study to investigate whether this, in turn, is because of unreliable simulations, lacking overlap between reconstructed and simulated climates, or lacking reliability of the proxy records, that is their errors.
With respect to the lacking precision of the reconstructions, Annan and Hargreaves (2012) already identified a similar issue in their particle filter data assimilation approach. Annan and Hargreaves note that in a setup where one has only few and highly 20 uncertain proxy predictors the reconstruction tends to lack accuracy. We think that for the analogue method one could remedy this by weighing the valid analogues by a distance measure relative to the pattern of proxy predictors or by their agreement with each individual predictor.
Related to our handling of uncertainty is our approach of reconstructing data for those years when at least one proxy predictor is dated. This also may contribute to the wide range of the reconstructions by neglecting information in between these 25 dates. Alternatively, one could pool the proxy dates into constant intervals of, for example, 100 years. Such an approach makes as strong assumptions as our procedure. We note that Jensen et al. (2018) use the published age models to interpolate their proxy records to consistent time steps. They compare their proxies to 10-year averages of the simulation pool. Incorporating, presumably Bayesian, age models maximises the available prior information used from, e.g., the original measurements.
Nevertheless, we decide against interpolation procedures, even based on Bayesian age models, assuming that this may result 30 in overconfidence in the subsequent reconstruction. For example, interpolation could suggest more certainty in reconstructed values where and when we have little to no proxy information (see, e.g. Figure 7i between approximately 9kyr and 11kyr BP).
Additional assumptions relate to characteristics of the considered proxy predictors. This includes our decision to generally compare the proxy predictors to centennial averages of the simulation output. Thereby, we do not allow for the fact that the proxy sensor might record extreme-like events. Further, we compare the proxy predictors and the simulation pool in terms of temperatures instead of using surrogate proxies in proxy units from the simulation pool. Finally, the use of temperature for the surface and an attributed and calibrated season does not account for the sensor specific habitats and seasonal sensitivities or their changes (compare Jonkers and Kučera, 2017;Kretschmer et al., 2018). Our comparison, thus, is based on the assumption that the proxy inferred climate property and the proxy record reasonably well relate to the parameter of interest (annual surface temperature) and that, in turn, comparisons to the equivalent simulated output are valid. Therein we rely on the previously 5 published information about the considered proxy record. Similarly, our expansion of the temporal average reconstructions into 101-year time-series relies on the quality of the proxy data and on appropriate assumptions on the temporal representativeness of the data. The possibility for such a temporal downscaling is a unique feature of analogue search reconstructions from temporal averages and of comparable data assimilation techniques.
Possible improvements of the method would respect more explicitly the irregular resolution of the proxy records and the 10 different resolutions between the records. Similarly, applications benefit if we can discriminate whether a proxy sensor records mean climatic conditions or extreme-like events. Including the proxy specific habitat and growth season also leads to a more appropriate comparison as does employing proxy forward models to make the comparison in proxy units.
Published proxy records do not necessarily provide all the information to assess, e.g., the resolution. The available published information is also generally not sufficient to identify whether a record represents events or mean states. Regarding uncertainty, 15 producers of proxy records do not always report calibration uncertainties. Even if these are known, assumed uncertainties may not capture all potential error sources.
Better understanding of the proxy systems and availability of the full simulation output data would allow for proxy series specific analogue searches. It further would enable the use of locally calibrated process-based forward integrations by proxy system models. The advent of proxy system forward models in principle allows producing proxy parameter representations in 20 the virtual environment of the simulations (Schmidt, 1999;Tolwinski-Ward et al., 2011;Thompson et al., 2011;Evans et al., 2013;Dee et al., 2015Dee et al., , 2016Dee et al., , 2017Dee et al., , 2018Jones and Dee, 2018;Dolman and Laepple, 2018) but there are still gaps in the understanding on how the sensor recording of the biological, physical, chemical, or geological process reacts to the environment.
Additionally, records may lack necessary information. While such applications are quickly developing (see Dee et al., 2016;Jones and Dee, 2018;Dolman and Laepple, 2018;Konecky et al., 2019), data assimilation of this kind of information is still 25 not operational even for the Common Era with its potentially high resolution and potentially high quality proxies (Hakim et al., 2016;Tardif et al., 2019;Emile-Geay et al., 2017).
It is generally advisable to use consistent proxy parameters, a consistent recalibration, and a consistent calibration target.
This should increase the probability of the proxy predictors constraining the pool of potential analogues (compare the results in Appendix A1). Often such consistency is an implicit or explicit assumption (compare, e.g., Reschke et al., 2018). On the 30 other hand, the analogue approach, in theory, should allow using different parameters and calibrations if the comparison is to the same target. Indeed, ideally, it should also compensate even a comparison of different parameters. This, however, depends on how much proxy records indeed constrain the ultimate target property for the reconstruction.
Our reconstruction is only for the approximate domain of the proxy predictors. However, it may be possible that a set of proxy predictors from, for example, Europe also provides information on larger scale climate variables. Indeed, if there is 35 23 https://doi.org/10.5194/cp-2019-170 Preprint. Discussion started: 16 January 2020 c Author(s) 2020. CC BY 4.0 License. evidence that the proxy predictors are relevant constraints on other climate fields beyond, in this example, temperature, the pool of analogues can provide information on other climate variables.
Regarding the resolution, a test of our method suggests that, for a given uncertainty interval, the analogue search is more successful in finding valid analogues if we consider interannual data. While such an interannual analogue search may misinterpret what the proxy data represents, it may be a more truthful comparison considering the potential level of environmental 5 noise in the proxy data relative to the targeted temperature signal.
Beyond these methodological aspects the size and character of the pool of analogue candidates influences the quality of the results. As apparent from this paper, a pool including mid-Holocene, Last Glacial Maximum, and transient deglacial simulations does not ensure finding valid analogues. An insufficient large pool of candidate analogues requires wider assumptions on uncertainty to obtain valid analogues. Thereby, the analogues remain unconstrained. A small pool also allows for non-10 uniqueness of analogues. Additionally climatological inconsistencies become more likely if the range of simulated periods in the model pool is wide.
We do not use anomalies. If there was a large ensemble of simulations over this period, the use of anomalies would be advisable. Similarly, if all proxy records had common modern age data, there might be a valid anomaly building process.
However, we include simulations for time-slices with notable different climatologies, and proxy records begin at various 15 modern dates. One solution could be a sliding climatology for the proxies, which is added again for the final reconstruction.
We note that using anomalies also might result in climatic inconsistencies. Furthermore, if we want to apply proxy forward models based on the calibration between measured property and temperature we do not use anomalies either because calibration relations frequently need temperature on either the Celsius or Kelvin scales.
This section outlined a number of potential improvements of the approach. Some of these would increase the number of 20 necessary computations. While the increase in costs is not prohibitive, we decided against including such procedures here.
However, it appears particularly worthwhile to try to implement a workflow that combines feasible data science methods, some version of simple data assimilation, and a proxy system model framework like PRYSM (Dee et al., 2015(Dee et al., , 2016Jones and Dee, 2018) in future attempts of spatiotemporally resolved reconstructions if the interest is in a dynamical understanding of the climate variability over multimillennial timescales.

Summary and concluding remarks
The analogue method is a computationally cheap data assimilation approach. Here, we discuss a specific application for time uncertain, sparse, and irregularly sampled proxies. We focus on the North Atlantic sector and the time period from about 15kyr BP to the late 20th century.
The approach succeeds in providing a reconstruction in a pseudoproxy setup. It is less successful for realistic proxy setups. 30 Then, reconstructions fail particularly over the late deglaciation and early Holocene.
Reconstructions are generally rather imprecise. The range of potential analogue values is very wide for a given date. Regional average reconstruction medians show little variation over time.
The analogue method is non-linear and considers the spatial covariances between the proxy records. Thereby, resulting fields provide us with spatial field estimates of past climate states that are consistent with the regional inter-relations as presented by the proxy predictors.
Data availability. We provide lists of valid analogues per date and experiment at https://osf.io/pj9eg. This allows identifying valid climate states for dates. We also provide files for area mean analogue ranges and medians.
(i) E08 (j) E09 Figure A1. Map of the reconstruction domain and the proxy predictors: (a) for the pseudoproxy setup; (b) to (j) for the different proxy setups.
For more details see Table 1 and Figure A2.
Appendix A: Information on the additional proxy setups Figure A1 shows the proxy distribution for the various proxy setups within the reconstruction domain. Figure A2 gives more information about which records are included in the different setups and their proxy types. Panel (a) of Figure A1 repeats the pseudoproxy setup from Figure 1, which has a higher density than any of the real proxy setups in panels (b) to (j  Figure A2. Information about the different proxy setups: Matrix of proxy records against proxy setup (E01-E09). For more information see Table 1. White-out means that the relevant proxy is not included in the respective proxy setup. Figure A3 complements Figure 3 for the various setups. Panels show the number of available proxies and the number of dates for which analogues are searched. Figure A4 adds the proxy data and assumed uncertainties for the record GeoB 5901-2 and thereby supplements Figure 7.
For more information see Table 1.
A1 Results for the different proxy sets 5 Table 1 introduces already a number of additional proxy setups. The sets use different sub-selections of proxies from our initial selection. Figures A1 to A3 provide additional information on the various setups. Figure A5 shows the reconstruction results for the proxy setups E01 to E09. All panels plot the reconstructions using the 99% and the 99.99% uncertainty envelopes. Panel (a) adds for our main setup also a reconstruction where we consider interannual data for the simulations and include the FAMOUS-HadCM3 simulations. Please see section 2.3.3 for details on 10 these simulations.  Figure A3. Information about the number of available proxies for the dates to be reconstructed: (a) the pseudoproxy setup, (b) to (j) the various proxy setups according to Figure A2 and  Figure A4. Proxy data and assumed uncertainties for GeoB 5901-2.
simulations and comparison to interannual fields (see black colored results in Figure A5a). Including the QUEST-FAMOUS simulations increases the number of analogues in the 15th millennium BP, whereas including interannual data increases the number throughout (not shown).
It appears that the internal coherence among the considered proxies matters as, generally, the reconstruction success appears to be better for proxy setups that only include U K 37 records, i.e. proxy setups E02 and E03 (see Figure A5). We show these in 5 panels (b) and (c). The setup in panel (c) differs from the one in panel (b) only in so far as we replace the record M39-008 with GeoB 5901-2. Both proxies are co-located.
These consistent sets of proxies provide a more continuous reconstruction for both local uncertainty levels. Nevertheless, we fail to obtain reconstructions at the end of the deglaciation. While results are quite similar over much of the period between both, the second setup allows a wider and potentially colder range for the period before~12kyr BP. 10 Further panels of Figure A5 add different setups. Panel (d) complements the U K 37 proxies by one foraminiferal assemblage record. Panels (e) and (f) also test different setups dominated by U K 37 but including other proxies. Panels (g) to (i) use different small setups of proxies around the European area.
Both setups in panels (e) and (f) fail to provide analogues before the deglaciation for the narrower uncertainty envelope. The setups in panels (g) and (i) are notably warmer in the 14th millennium BP compared to results in panel (h) but also compared 15 to other setups. This holds for both uncertainty envelopes. A common difference is the inclusion of M39-008 while excluding the U K 37 record D13882 (compare Table 1 and Figure A2). The latter record is thought to represent summer temperatures off the west coast of Portugal while the former is meant to represent annual temperatures in the Gulf of Cadiz. We note that panels (e) and (f) also are warmer in the 14th millennium BP for the wider uncertainty envelope. These also include M39-008 and exclude D13882. Please note, the supplement of Marcott et al. (2013) refers to D13882 as D13822. 20 The multi-archive setups with fewer proxies give generally wider ranges of possible analogues. Otherwise, all setups tend to be in a comparable range regarding their median and their range considering the last 10 millennia. Differences between all setups are largest in the 14th millennium BP due to a larger range for some reconstructions. However, the few reconstructions with valid analogues in the 12th and 13th millennium BP give similarly wide ranges then.
Generally, reconstruction ranges are rather constant. We, further, find that the reconstructions from different setups differ in their ability to reconstruct climate for certain periods. Indeed, different setups may provide notably different climates particularly for the early part of the time period of interest. Certain proxies appear to shift the results for the earlier part of our 5 reconstruction between a warmer and a colder deglacial estimate. It is beyond the scope of this paper to disentangle the reasons for this.   Table B2 provides references for the various models from which we include simulations in the candidate pool. The table further gives links to the repositories where interested researchers can obtain the simulation data.
Tables B3 to B5 complement tables 2 and B2. They give the central simulation IDs. This allows finding the simulations more easily in the repositories.