Inferences about climate states and climate variability of the Holocene and the deglaciation rely on sparse paleo-observational proxy data. Combining these proxies with output from climate simulations is a means of increasing the understanding of the climate throughout the last tens of thousands of years. The analogue method is one approach to doing this. The method takes a number of sparse proxy records and then searches within a pool of more complete information (e.g., model simulations) for analogues according to a similarity criterion. The analogue method is non-linear and allows us to consider the spatial covariance among proxy records.

Beyond the last two millennia, we have to rely on proxies that are not only sparse in space but also irregularly sampled in time and subject to considerable dating uncertainty. This poses additional challenges for the analogue method, which have seldom been addressed previously. The method has to address the uncertainty of the proxy-inferred variables as well as the uncertain dating. It has to cope with the irregular and non-synchronous sampling of different proxies.

Here, we describe an implementation of the analogue method including a specific way of addressing these obstacles. We include the uncertainty in our proxy estimates by using “ellipses of tolerance” for tuples of individual proxy values and dates. These ellipses are central to our approach. They describe a region in the plane spanned by proxy dimension and time dimension for which a model analogue is considered to be acceptable. They allow us to consider the dating as well as the data uncertainty. They therefore form the basic criterion for selecting valid analogues.

We discuss the benefits and limitations of this approach. The results highlight the potential of the analogue method to reconstruct the climate from the deglaciation up to the late Holocene. However, in the present case, the reconstructions show little variability of their central estimates but large uncertainty ranges. The reconstruction by analogue provides not only a regional average record but also allows assessing the spatial climate field compliant with the used proxy predictors. These fields reveal that uncertainties are also locally large. Our results emphasize the ambiguity of reconstructions from spatially sparse and temporally uncertain, irregularly sampled proxies.

It is a pervasive idea in environmental and climate sciences that past
states provide us with information about the future

If we want to use the analogue method beyond approximately the last two
millennia, we have to tackle additional challenges, which usually can be
evaded for the Common Era. For example, our proxy records are not only
spatially sparse but they also have a coarse temporal resolution on
these timescales. Furthermore, the sampling generally is irregular for
each individual proxy. Indeed, sample dates differ between proxies on
these timescales, and these dates are also uncertain. Recently,

The basic idea of the analogue method is simple. An analogy tries to explain an item based on the item's resemblance or equivalence to something else. In the analogue method, one uses a set of sparse proxies, i.e., predictors, and searches for analogues for them in a pool of candidates that are spatially more complete. In paleoclimatology, the predictors can be local proxy records and the candidate analogues can be fields from climate model simulations. One assesses the similarity of the simulation output and the proxy records at the proxy locations to find valid analogues. The reconstructed field is then the complete field given by the analogue.

It is important to note that comparable approaches suffer from a
trade-off between accuracy and reliability of reconstructions, as shown
by

Most paleoclimate applications of the analogue method focused on the
Common Era of the last 2000 years

Here, we describe another approach to obtain reconstructions by analogue
over millennial timescales based on spatially and temporally sparse and
uncertain proxies. It differs in some aspects from the approach so far
applied to shorter and more recent periods. Our approach tries to
explicitly consider not only age uncertainties

Beyond the mentioned challenges for analogue reconstructions on
millennial timescales, the method is also constrained by the pool of
available analogue fields.

The next section first summarizes again the main characteristics of
analogue searches for paleo-reconstructions. Afterwards, we present our
way of dealing with uncertain tuples of data and date, that is,
describing ranges of tolerance for which we choose analogues. Simulation
fields are considered analogues if they fall within these tolerance
ranges at all considered proxy locations. We also describe how we
consider the fact that different proxies are sampled at different times.
The section also presents our selection of a simulation pool. We present
results for a multimillennial period for a pseudo-proxy setup

In an analogue search, one tries to complement incomplete information from one dataset by data from other more complete datasets. One ranks the more complete data by their similarity to the available information in the first dataset. In paleoclimatology, this usually means that one uses a set of spatially sparse proxy records and wants to find fields from simulations or reanalyses that are most analogous to the proxy records at their locations. The pool of candidate fields depends on the available simulations and reanalyses.
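The ranking step can be illustrated with a minimal sketch. It assumes a Euclidean distance as the similarity measure and that the candidate fields have already been sampled at the proxy locations; the function name `rank_candidates` and the array layout are hypothetical and chosen only for illustration.

```python
import numpy as np

def rank_candidates(proxy_values, candidates_at_proxies):
    """Rank candidate fields by Euclidean distance to sparse proxy values.

    proxy_values: shape (n_proxies,), NaN marks a proxy without a value.
    candidates_at_proxies: shape (n_candidates, n_proxies), candidate
        values sampled at the proxy locations (assumed layout).
    Returns (order, distances): candidate indices sorted from most to
    least similar, and the distance of every candidate.
    """
    proxy_values = np.asarray(proxy_values, dtype=float)
    candidates = np.asarray(candidates_at_proxies, dtype=float)
    # Compare only at locations where the proxy actually has a value.
    have = ~np.isnan(proxy_values)
    diff = candidates[:, have] - proxy_values[have]
    distances = np.sqrt(np.sum(diff ** 2, axis=1))
    order = np.argsort(distances, kind="stable")
    return order, distances
```

The first entry of `order` corresponds to a single best analogue; taking the first few entries corresponds to using a small set of good analogues.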

If, for example, one uses proxies for temperature, such a ranking may
simply provide the simulated temperature field that has the smallest
Euclidean distance to the sparse proxy information at their locations.
Alternatively, one can consider not just one but a small number of good
analogues with small distances

An important aspect of paleoclimate reconstructions is the uncertainty
of the reconstructed data. To our knowledge, only

We use spatially and temporally sparse proxies, affected by uncertainties in their values and their dating for analogue searches on millennial timescales. Next, we detail our simplifying assumptions about what the data represent, their uncertainties, and the dating uncertainties. We also describe how we choose the dates for which we perform the climate reconstruction.

Our interest is in temperature. Specifically, we concentrate on means of
seasonal or annual temperature at the surface. We consider proxies for
which the literature previously reports a sensitivity to temperature in
the form of a calibration relation. We search for analogues within fields of
simulated surface temperature. To do the comparison, we consider the
model variable “surface temperature” over the European–North Atlantic
domain shown in Fig.

Map of the reconstruction domain and the proxy predictors: for the pseudo-proxy setup (blue), experiment P01, and for the main proxy setup (red), experiment E01. Please note the small offset between the proxy locations and their pseudo-proxy counterparts on the discrete model grid.

Theoretically, the variable or variables to be reconstructed can be different from the variable or multiple variables represented by the paleo-observational predictors. Indeed, we here assume that it is possible to reconstruct annual temperatures from proxy records with diverse seasonal attributions.

Using temperature in a multi-proxy comparison requires a number of
assumptions. First, we assume that the proxy recorders indeed were
temperature sensitive. More importantly, here, we assume that all the
different recorders, aquatic or otherwise, represent temperature at the
surface. This is an assumption of convenience in view of potential
habitat biases of the proxy records

Section

Optimally, one would aim for maximal consistency in the comparison. Consistency among parameters and calibration ensures a relation among the proxy predictors, which, one can assume, increases the chance that the proxy records lead to a selection of physically meaningful analogues. In this case, the proxies can effectively anchor the analogue selection. We here assume that all chosen proxy types reliably represent the target of interest and a multi-proxy approach is viable.

The analogue method allows searching for analogues at dates when there
is information. One can pool the predictor dates into consistent
intervals of, for example, 500 years, and search for analogues for these
500-year pools. One can follow the example of

Information about the considered
proxy records: IDs, geographical location, seasonal attribution according to

Each data point of a proxy series potentially represents a time interval
of a specific length, and the comparison should consider this temporal
resolution. That is, if one data point represents a 50-year accumulation
and another data point represents a 500-year accumulation, the procedure
ideally accounts for these differences. We decide to use typical
resolutions instead of individual resolution estimates to simplify the
procedure and allow a computationally more efficient analogue search for
data- and time-uncertain proxies. Indeed, it is not necessarily the case
that a proxy-record publication includes the information to estimate the
pointwise temporal resolution. Considering information provided by

Therefore, we decide to compare the proxy estimates to 101-year averages of the model simulation output. That is, we compare them to 101-year mean values, which we obtain by using a 101-year moving mean on the simulation output time series that is closest to the proxy location.
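The preprocessing described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes a regular latitude-longitude grid, picks the nearest grid cell separately in latitude and longitude, and uses a simple unweighted moving mean. The function names are hypothetical.

```python
import numpy as np

def nearest_grid_index(lats, lons, proxy_lat, proxy_lon):
    """Return (i, j) of the grid cell closest to the proxy location.

    Assumes a regular grid, so the nearest cell is found independently
    along the latitude and longitude axes.
    """
    lats = np.asarray(lats, dtype=float)
    lons = np.asarray(lons, dtype=float)
    i = int(np.argmin(np.abs(lats - proxy_lat)))
    # Wrap longitude differences into [-180, 180) before comparing.
    dlon = (lons - proxy_lon + 180.0) % 360.0 - 180.0
    j = int(np.argmin(np.abs(dlon)))
    return i, j

def moving_mean(series, window=101):
    """Unweighted moving mean over complete windows only.

    Entry k of the result is the mean of series[k:k + window].
    """
    series = np.asarray(series, dtype=float)
    if window % 2 == 0 or window > series.size:
        raise ValueError("window must be odd and no longer than the series")
    kernel = np.ones(window) / window
    return np.convolve(series, kernel, mode="valid")
```

For an annual series of length n, `moving_mean` returns n - 100 values for the default window, each representing one 101-year mean state of the candidate pool.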

In one test case, we do not preprocess the simulation output but use the
annually resolved values of the output for the comparison. For this
specific test, we also include the simulation data from the
FAMOUS-HadCM3 simulations for the Quantifying and Understanding the Earth System (QUEST) project

We test the whole approach by using pseudo-proxies. We construct the
pseudo-proxies following the ensemble approach of

Simulations potentially differ in their modern-day climate mean

However, in the present case, the period of interest includes mainly the last 15 kyr. Thus, it spans part of the deglaciation from the Last Glacial Maximum to the Holocene optimum. Our selection of simulations can only piecewise cover that period of interest, which complicates the construction of a surface temperature candidate pool. Indeed, the most recent dates differ among the proxy records, and thus there is no simple procedure to provide anomalies relative to a consistent modern climate. Additionally, using anomalies may introduce climatic inconsistencies if we are interested in climate variables other than temperature. For these reasons, we decide that we cannot reasonably use anomalies. Instead, we try to find analogues for the local proxy reconstructions in their absolute temperature units without subtracting any climatology.

We are interested in millennial timescales from the last deglaciation
until the recent past. On these timescales, uncertainty affects our
proxy predictors in two ways. First, we have to consider the age or
dating uncertainty. Second, the measured proxy data and the temperatures
inferred from them are affected by various sources of uncertainty

Previous applications of the analogue method usually did not consider
proxies with considerable age uncertainties except for

Uncertainty of proxies in data and date is commonly expressed as a central
value and a given uncertainty range. These ranges may be given as plus
and minus standard deviations around the central value, e.g.,

That is, we choose a different approach (Fig.

Considerations on uncertainty and constructing tolerance envelopes:

To define these areas of tolerance, we still have to define their shape. Our interest is in finding analogues that agree with the proxy data but also account for these uncertainties. We could take the uncertainty estimates of temperature and time to construct a two-dimensional uniform estimate in the form of a rectangle of tolerance. Analogue candidates would be valid analogues if they fall locally within these rectangles. If they fall outside of the rectangle, they would not be considered valid analogues. Although the uncertainties in temperature and time are commonly taken to be Gaussian, the rectangular approach is the best one if we consider the uncertainties of date and temperature in isolation from each other. Then, our tolerance for the temperature data has the same structure at the border of our temporal tolerance range as it has at the central estimate for the date. However, in our application, we do not see both tolerance ranges in isolation. We assume that our tolerance range is a two-dimensional pairwise construct in time and temperature. Then, our tolerance construct takes the shape of a two-dimensional Gaussian. This implies that our tolerance areas are ellipses. Such ellipses can be computed dependent on an assumed pairwise confidence level or coverage or, in our interpretation, tolerance range. We refer to these as percentage levels.
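A minimal sketch of such an elliptic decision criterion follows. It assumes that the date and temperature errors are uncorrelated and that the stated uncertainties are one standard deviation each, so that the ellipse is axis-aligned; both are simplifying assumptions made here for illustration, and the function names are hypothetical. For a two-dimensional Gaussian, the squared scaled distance follows a chi-square distribution with two degrees of freedom, whose quantile at level p is -2 ln(1 - p).

```python
import math

def chi2_radius_sq(level):
    """Chi-square quantile (2 degrees of freedom) for a level in (0, 1)."""
    if not 0.0 < level < 1.0:
        raise ValueError("level must lie strictly between 0 and 1")
    return -2.0 * math.log(1.0 - level)

def in_tolerance_ellipse(date, value, proxy_date, proxy_value,
                         sigma_date, sigma_value, level=0.90):
    """True if (date, value) lies inside the tolerance ellipse.

    Assumes uncorrelated Gaussian errors in date and value
    (axis-aligned ellipse); sigma_* are one standard deviation.
    """
    d2 = ((date - proxy_date) / sigma_date) ** 2 \
        + ((value - proxy_value) / sigma_value) ** 2
    return d2 <= chi2_radius_sq(level)
```

With a proxy value of 10 dated to year 500, a dating uncertainty of 50 years and a value uncertainty of 1, a candidate value of 12 is accepted at year 500 but a candidate value of 11 is rejected at year 600, because part of the tolerance is already used up by the date offset.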

According to our view of tolerance ranges as tolerance ellipses, we accept fewer analogues for dates far away from the median proxy age estimate. For these dates, analogue candidates need to be numerically very close to the proxy. In contrast, we accept more analogues close to the central age estimate of the proxy and tolerate that they may more strongly differ from the numerical central estimate of the proxy. We acknowledge that it may seem counterintuitive that we reduce the range in data uncertainty at dates far away from our central best estimate of temperature and date. This originates from our assumption that the pair of data and date stems from a two-dimensional distribution that is centered on our best estimate. Thereby, the likelihood of a valid pair of data and date reduces further away from our best estimate according to the assumptions on the distribution.
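The narrowing of the accepted data range away from the central date can be written out explicitly. The following is a sketch under the assumption of uncorrelated Gaussian errors; the symbols t_0 and x_0 (best estimates of date and value), \sigma_t and \sigma_x (their standard deviations), and p (the percentage level) are introduced here for illustration only.

```latex
\left(\frac{t - t_0}{\sigma_t}\right)^2 + \left(\frac{x - x_0}{\sigma_x}\right)^2 \le c_p,
\qquad c_p = -2 \ln(1 - p)
\quad\Longrightarrow\quad
|x - x_0| \le \sigma_x \sqrt{c_p - \left(\frac{t - t_0}{\sigma_t}\right)^2},
\qquad |t - t_0| \le \sigma_t \sqrt{c_p}
```

For p = 0.90, c_p is approximately 4.61. The accepted half-width is then about 2.15 \sigma_x at the central date, about 0.78 \sigma_x at a date offset of 2 \sigma_t, and it vanishes at an offset of about 2.15 \sigma_t.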

As we have estimates of the uncertainties of the data point, we can
construct and visualize the ellipses of tolerance around each data point
under the assumption of two-dimensional Gaussian tolerance areas. We use
the R

A two-dimensional tolerance ellipse represents tolerance levels for
two-dimensional normal distributed data. However, as in the simple case
of a tolerance rectangle, our interest is only in the ellipse as a
binary decision criterion to consider the data included in the ellipse
and to neglect the data outside of the ellipse. That is, we use the
ellipse as an area of tolerance to identify valid analogues from our
analogue candidate simulation field pool. The ellipses provide the
maximal acceptable distance for simulated data to be considered as an
analogue (Fig.

The ellipses are defined from points in the proxy–time space (see Fig.

That is, the superposition of ellipses constructs a tolerance envelope
(Fig.

Because we provide reconstructions only for those years for which one of the chosen proxy records includes a dated value, and because our tolerance estimates are essentially pointwise, the envelope may not be one continuous envelope over the full period of interest. Furthermore, because we use the envelopes as a decision criterion, it can happen that the method fails to find any valid analogues for given years.

Our pointwise estimates are compliant with the initial uncertainty of
the proxies, and our final reconstruction uncertainties are an expression
of this initial confidence in the local data. This is in contrast to

The ellipses of tolerance allow us, in theory, to produce reconstructions for
each year included in the dating uncertainty. That is, if a proxy series
has a value dated to the year 500 BP with a dating uncertainty of

In other applications of the analogue method, the choice of a valid
analogue usually relies on a distance metric. This is commonly the
Euclidean distance

Here, we deviate from this and decide neither on a fixed number of
analogues nor a defined metric. Candidates in our pool are valid
analogues if they are within the tolerance range (compare Sect.

We additionally show one instance of a reconstruction using just one
best analogue. For this test, we choose the analogue with the smallest
Euclidean distance to our proxy values. As we deal with proxy records
that are irregularly spaced in time, we have to find a way to select
dates for which to do a single best analogue reconstruction and get the
proxy values for these dates. To do so, we consider the proxy values
valid at all dates within a given range around their dating. We identify
the range of these values and take the midpoint of that range as the
proxy value for this date. We consider values within a 90 % or

In short, our reconstruction is based on the following workflow. We have a set of sparse proxy predictors and a pool of simulated fields. As our proxies are not only sparse in space and uncertain in their values but also irregular and uncertain in time, we have to decide (a) when to compare them, (b) in which resolution to compare them, and (c) how to consider the uncertainties in time and value. Therefore, we decide to (i) compare the proxies and simulated data for all dates when one proxy is dated, (ii) compare the proxies to 101-year moving means of the simulated data, and (iii) take the proxy data values as valid within an ellipse of tolerance around the dated value in time and temperature space. Then analogue candidate simulation fields are valid analogues if they are within these tolerance ranges around all proxy records included in the search.
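The selection step of this workflow can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: ellipses are axis-aligned (uncorrelated Gaussian errors), the envelope at a location is the union of the ellipses of its data points, a location without any ellipse covering the target date is left out of the search for that date, and the pool already contains the 101-year means at the proxy locations. All names are hypothetical.

```python
import math
import numpy as np

def value_interval(target_date, point, level):
    """Accepted value interval of one proxy data point at target_date.

    point = (date, value, sigma_date, sigma_value). Returns None if
    the ellipse of this point does not cover target_date.
    """
    date, value, sigma_date, sigma_value = point
    c = -2.0 * math.log(1.0 - level)  # chi-square quantile, 2 d.o.f.
    rest = c - ((target_date - date) / sigma_date) ** 2
    if rest < 0.0:
        return None
    half = sigma_value * math.sqrt(rest)
    return value - half, value + half

def valid_analogues(target_date, proxies, pool, level=0.90):
    """Indices of pool members inside the envelope at every usable location.

    proxies: one list of data points per location.
    pool: shape (n_candidates, n_locations), candidate values at the
        proxy locations (assumed layout).
    Returns (indices, n_used), n_used being the number of locations
    that constrain the search at target_date.
    """
    pool = np.asarray(pool, dtype=float)
    valid = np.ones(pool.shape[0], dtype=bool)
    n_used = 0
    for loc, points in enumerate(proxies):
        intervals = [iv for iv in
                     (value_interval(target_date, p, level) for p in points)
                     if iv is not None]
        if not intervals:
            continue  # this location has no information for the date
        n_used += 1
        inside = np.zeros(pool.shape[0], dtype=bool)
        for low, high in intervals:
            inside |= (pool[:, loc] >= low) & (pool[:, loc] <= high)
        valid &= inside
    if n_used == 0:
        return np.array([], dtype=int), 0
    return np.flatnonzero(valid), n_used
```

An empty result with `n_used > 0` corresponds to a date for which the method fails to find any valid analogue; widening `level` enlarges the ellipses and can also bring additional locations into the search for a given date.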

We concentrate on a European–North Atlantic domain (Fig.

Information about the different proxy setups:
matrix of proxy records against proxy setup (P01, indigo, and E01 to
E09, burgundy red). For more information, see Table

We do not include all records from

We consider the seasonal attributions of individual proxy records in our
search for analogues. We generally take the attributions and the
calibrations for the records as published by

Regarding proxy uncertainty, we decided to assume an uncertainty of

Information about the number of available
proxies for the dates to be reconstructed:

We performed reconstruction exercises for various proxy setups. We
concentrate on the full set of proxies mentioned above (see Fig.

Figure

We use pseudo-proxies calculated following

Here, the pseudo-proxy computation uses QUEST FAMOUS simulation data

We modify the pseudo-proxy script of

Pseudo-proxy data and assumed uncertainties for the 17 locations in our pseudo-proxy application.

The 17 pseudo-proxy locations are close to the realistic proxy locations
(compare Fig.

Information about the pool of
simulation data: model name, the project for which the simulations were
performed, the simulated periods from this model output, the number of total
years. All simulation data are remapped to 0.5 by 0.5

Table

We use simulations for various different time periods to increase the
candidate pool. We assume that simulation climatologies can differ over
a relatively wide range

We remap all simulation output to a 0.5 by 0.5

The pseudo-proxy application highlights the possibilities of our implementation of the analogue method. It also provides a first glimpse of potential problems.

We briefly recapitulate our approach. Our analogue method searches for analogues within the full pool of simulation fields but excludes the FAMOUS-HadCM3 output from the QUEST project. Pseudo-proxies are derived from this latter simulation. We compare the pseudo-proxy predictors to 101-year moving averages of the simulation output. We concentrate on 90 % tolerance ellipses in the pseudo-proxy application of the analogue search but also include results for 99.9 % tolerance ellipses. Valid analogues are those simulation fields that are within the resultant tolerance envelopes for all pseudo-proxy locations available for a date.

Temperatures are reconstructed for the full domain of the
European–North Atlantic sector including the Arctic (Fig.

Reconstruction results for the pseudo-proxy
application of the analogue method:

In this setting, the analogue search tries to identify analogues for
1830 dates. Our implementation finds between 1 and 7919 analogues on
531 dates (Fig.

Results change if we consider a wider tolerance envelope. For a 99.9 %
tolerance envelope instead of a 90 % one, we are able to find between 1
and 16 944 valid analogues at 1438 of 1830 dates (Fig.

Figures

Temperature field reconstructions in

The results are encouraging but problems are obvious. We are able to find valid analogues for both tolerance ranges.

Analogues are regularly relatively close to the target for the narrow tolerance range. However, their number is often small and there are periods without any valid analogues. The range seldom includes the target. Further, the reconstruction with a narrow tolerance assumption does not provide valid analogues earlier than approximately 13 500 BP.

On the other hand, the range of potential analogues is only weakly
constrained for the wider tolerance range. For example, the analogue
search may regard more than 17 000 records of the TraCE-21ka simulation
as valid analogues around the year 10 000 BP (compare Fig.

The pseudo-proxies, together with their uncertainties, are a weak constraint during most of the period of interest if we assume a wider tolerance, but they fail to capture the target if we assume a stronger knowledge about their value. In addition, the reconstruction envelopes and medians show rather little variability and often give nearly constant values over long periods. That is, the set of valid analogues has a notable overlap for these periods. The lack of variability among analogues together with the potentially wide range of analogues is reflected in the small variability in the reconstruction median.

Besides the regional average, the results allow us to extract the local
representations. Figure

At both locations, the range is very small for the narrow tolerance range. At the southern location, the reconstruction median is generally below the target, and the range is hardly identifiable and does not include the target. This is comparable to the northern location, where, however, the median is generally above the target. Even for the wide tolerance range, the target is more often outside than within the full analogue range at the southern location, while at the northern location the range includes the target regularly (not shown). Thus, the range of potential analogue cases is still relatively narrow at the southern location but can be already quite wide at the northern location. Also locally, analogue range and median show little variability. In the northern case, the analogue medians fail for both tolerance assumptions to capture the average characteristics of the pseudo-proxy except for approximately the most recent 3 kyr.

The pseudo-reconstruction results suggest that the approach can provide local information in addition to the regional average. Relatively wide tolerance appears to be necessary to capture the local characteristics at the two chosen locations. This is more successful for some periods but success always varies regionally.

Since we search analogues among temporal moving window averages, the
analogue search provides one more result of interest. Any analogue state
represents a temporal average. Since we also know the period that has
been averaged, we can provide the climatic time-varying sequence. This
informs us about the time variations underlying the analogue average
climate state. That is, we obtain climate evolutions that comply with
our proxy constraints. This, for example, allows us to get an impression of
how temperature changed on subcentennial, e.g., interannual, timescales
or to obtain an estimate of the interannual variability. Figure

Although we consider a narrow tolerance range, which results in very narrow ranges around the mean analogue state, the expanded range of potential analogues is still notably wide. The two examples of valid analogues highlight how much two climates may differ over the period, although both are valid analogues considering the proxy uncertainty. Wider tolerance ranges give larger ranges of reconstructions and result in larger differences between the 101-year time series.
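Recovering the time-varying sequence behind an analogue can be sketched as follows. The sketch assumes that entry k of the moving-mean series averages the annual values k to k + 100 of the same simulation, which is an indexing convention chosen here for illustration; the function names are hypothetical.

```python
import numpy as np

def expand_analogue(annual_series, analogue_index, window=101):
    """Return the annual values that were averaged into one analogue.

    Assumes moving-mean entry k is the mean of
    annual_series[k:k + window].
    """
    annual = np.asarray(annual_series, dtype=float)
    if not 0 <= analogue_index <= annual.size - window:
        raise IndexError("analogue window lies outside the annual series")
    return annual[analogue_index:analogue_index + window]

def interannual_std(segment):
    """Sample standard deviation of the annual values in a window."""
    return float(np.std(segment, ddof=1))
```

Applying `expand_analogue` to two different valid analogues for the same date gives two annual sequences whose means both comply with the proxy constraints but whose year-to-year evolution can differ considerably.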

Finally, our reconstruction approach allows considering the spatial
fields of valid analogues. Figure

The pseudo-proxy application of our implementation of an analogue search shows the viability of such approaches for reconstructing past climates from spatially sparse proxies with temporally sparse, irregular, and uncertain ages. The pseudo-proxy tests also show that the results depend on our assumptions on how tolerant we are with respect to our confidence in the proxy input. Overall, the pseudo-proxies are only weak constraints on the potential climate.

The pseudo-proxy test already highlights the potential but also the associated problems in using the analogue method for the type of proxies we are interested in, together with a limited pool of candidate fields. The analogue reconstruction is able to capture the target data but the search may provide either a very wide or a too-narrow uncertainty range relative to the target. Wide ranges occur mostly due to the large number of valid analogues, while narrow ranges signal that there are only a few analogues fitting the proxy data under the assumptions made on the fidelity of the proxies. The method may also fail entirely to provide valid analogues.

Our focus here is on a multi-archive and multi-proxy reconstruction
using 17 proxies (compare Sect.

Proxy data and assumed uncertainties for all proxy-record locations in our analogue search under two different tolerance envelopes.

Figure

Reconstruction results for the analogue method
under two different tolerance assumptions: panel

In the case of the main set of 17 proxies, our implementation tries to
find analogues for 1781 dates. There are between 1 and 900 analogues
for 141 dates for 99 % tolerance envelopes (see Fig.

For the 99.99 % envelope, these basic results change. The method
identifies 1 to 31 304 analogues at 1288 dates (see Fig.

For the narrower tolerance assumption, the method finds valid analogues
only for the recent past millennia (Fig.

For the wider tolerance envelope, the method identifies valid analogues
for more dates (Fig.

The range of the reconstruction may be regionally or locally wide for
the 99.99 % envelope, but this does not ensure that it locally includes
the proxy values (Fig.

Field information for the analogue search: two
examples of 101-year mean annual temperature analogue reconstructions
for the European–North Atlantic sector in

Figure

Table

Experiment E01 is our main setup. It was described in the previous
section. It uses the 17 chosen proxy locations, which we also use for
the pseudo-proxy setup. Setups E02 and E03 are based only on alkenone

Visualizing the reconstructions for the various
proxy setups:

Figure

The panels of Fig.

Generally, the method appears to provide more complete reconstructions
among our proxy setups for those that only include

Further panels of Fig.

Multi-archive setups with fewer proxies give generally wider ranges of possible analogues. Otherwise, all setups tend to be in a comparable range regarding their median and their range considering the last 10 millennia. Differences between all setups are largest in the 14th millennium BP due to a larger range for some reconstructions.

Both multi-proxy setups in panels (e) and (f) fail to provide analogues
before the deglaciation for the narrower tolerance assumption. The
setups in panels (g) and (i) are notably warmer in the 14th millennium
BP compared to results in panel (h) but also compared to other setups.
This holds for both tolerance envelopes. A common difference is the
inclusion of M39-008 while excluding the

Generally, we find that the reconstructions from different setups differ in their ability to reconstruct climate for specific periods. Indeed, different setups may provide notably different climates, particularly for the early part of the time period of interest. Particular proxies appear to shift the results for the earlier part of our reconstruction between a warmer and a colder deglacial estimate. It is beyond the scope of this paper to disentangle the reasons for this. All setups provide rather constant reconstruction ranges.

As noted, Fig.

In the period between 4000 and 8000 years BP, when other approaches give very narrow ranges due to few valid analogues, there are cases when the result from the single best analogue setup differs notably from the other efforts. However, it is still within the range of results from the other experiments for earlier and later periods. Such deviations from the tolerance area approach are reasonable since our construction of the proxy values for the single best analogue search can provide a notably different proxy state compared to the tolerance envelopes constructed for our standard approach. Another potential explanation is that the analogue that minimizes the overall distance may be outside of one or even multiple tolerance ranges. Finally, we already mentioned that changing a tolerance level may change the number of proxy locations included in a search. For example, widening a tolerance level may result in inclusion of more proxy locations for specific dates. The construction of the proxy values for the single best search similarly changes the underlying multi-dimensional proxy vector. Indeed, an inspection of the data indicates that, in our test case, the found analogue does fall outside the tolerance ranges at least at one location.

We also note that the single best analogue approach allows us to obtain estimates when the other approaches fail between 10 000 and 14 000 years BP. Similar to our other reconstruction attempts, the single best analogue reconstruction shows only little variability. Noteworthy are the reconstructed values in the 15th millennium BP where the single best analogue represents a Holocene-level warm climate and not a deglacial climate.

Visualizing alternative reconstructions:

We consider two more modifications of our approach. Figure

Our implementation of an analogue search method for reconstructing surface temperature over multimillennial timescales relies on a number of decisions, which are uncommon compared to other paleo-reconstruction efforts on multimillennial timescales. Central to our assumptions is that taking account of the uncertainty in our underlying data is indispensable in analogue approaches for paleoclimatology and, particularly, if one uses spatially and temporally sparse as well as data- and age-uncertain proxies. There is one prime motivation behind our specific handling of uncertainty in terms of tolerance ranges and our selection of reconstruction dates: the analogue search for a chosen date should use as much information about this date as possible, including the uncertainty of other data points whose age uncertainties include the currently given date of interest.

This leads to the use of tolerance ellipses. Assumptions here are that, firstly, data and date are inseparable; secondly, this assumption also holds for the tuple and its two-dimensional uncertainty; and, thirdly, a reconstruction exercise has to consider both parts of the uncertainty to sufficiently estimate the range of reconstructed values. Admittedly, our procedure is a simplified approach to incorporating these assumptions. More correctly, one would calculate the multivariate joint distribution and use a measure of likelihood to select the analogues. As a side note, the high-dimensional space for all proxies also follows a multivariate distribution, which one could then employ in more sophisticated data-science approaches.

We trust that considering both parts of the uncertainty enables better
and more reliable reconstruction estimates. We concede that this
procedure may exaggerate the range of potential climates and thereby may
reduce the precision of the reconstruction

With respect to the lacking precision of the reconstructions,

Our handling of uncertainty in terms of tolerance makes it difficult to implement a distance measure such as the Euclidean distance. A more formal definition of similarity should take into account the multivariate and correlated nature of the uncertainty, both in time and across proxies.

Our choice of elliptic tolerance regions may seem counterintuitive. Two related objections are conceivable. First, one could argue that time and data are independent and therefore suggest a uniform rectangular selection criterion. We address this already in the description of the method. Here, we concentrate on the second objection. According to it, our uncertainty about the value should not shrink at the border of our temporal uncertainty range but should become wider there, as we are less confident that the data value is even valid there. This also assumes an independence of dating and data and their uncertainties. However, our argument for the ellipse is the following. We regard our time-data point as sampled from a two-dimensional distribution. If we regarded this as a uniform distribution, we would also use a rectangular tolerance area. However, we regard the distribution as a two-dimensional Gaussian, which can be visualized as an ellipse in the two-dimensional plane. Thereby, the probability density for a valid point is reduced further away from the best estimate. If our analogue pool sampled the climate space well, we could weight our time-data points by their likelihood under the two-dimensional Gaussian. Then values that are far off in either or both dimensions would be given less weight. However, as we have only a rather small candidate pool, we resort to a binary criterion of inclusion and exclusion.
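The difference between a rectangular criterion, the binary elliptic criterion, and a likelihood weighting can be sketched in a few lines of Python. This is a minimal illustration and not necessarily the implementation used here: the function names, the axis-aligned ellipse with semi-axes equal to the tolerances, the uncorrelated Gaussian, and all numbers are assumptions made for the example.

```python
import numpy as np


def _scaled_offsets(date, value, proxy_date, proxy_value, date_scale, value_scale):
    """Offsets from the proxy tuple, scaled by the tolerance in each dimension."""
    d_date = (np.asarray(date, dtype=float) - proxy_date) / date_scale
    d_value = (np.asarray(value, dtype=float) - proxy_value) / value_scale
    return d_date, d_value


def rectangle_accepts(date, value, proxy_date, proxy_value, date_tol, value_tol):
    """Binary criterion treating date and data tolerance independently."""
    d_date, d_value = _scaled_offsets(date, value, proxy_date, proxy_value,
                                      date_tol, value_tol)
    return (np.abs(d_date) <= 1.0) & (np.abs(d_value) <= 1.0)


def ellipse_accepts(date, value, proxy_date, proxy_value, date_tol, value_tol):
    """Binary criterion: the (date, value) pair lies inside an axis-aligned
    ellipse centred on the proxy tuple (assumed form of the tolerance region)."""
    d_date, d_value = _scaled_offsets(date, value, proxy_date, proxy_value,
                                      date_tol, value_tol)
    return d_date**2 + d_value**2 <= 1.0


def gaussian_weight(date, value, proxy_date, proxy_value, date_sigma, value_sigma):
    """Unnormalised weight from an uncorrelated two-dimensional Gaussian
    (hypothetical alternative to the binary criterion)."""
    d_date, d_value = _scaled_offsets(date, value, proxy_date, proxy_value,
                                      date_sigma, value_sigma)
    return np.exp(-0.5 * (d_date**2 + d_value**2))


# Invented example: proxy dated to 8000 years BP with a value of 10.0,
# tolerances of 200 years and 2.0 data units.
candidate_values = np.array([10.0, 11.5, 12.5])
for target_date in (8000.0, 8160.0):
    print(target_date,
          rectangle_accepts(target_date, candidate_values, 8000.0, 10.0, 200.0, 2.0),
          ellipse_accepts(target_date, candidate_values, 8000.0, 10.0, 200.0, 2.0))
```

In this invented example, the candidate value 11.5 is accepted by both criteria at the proxy date itself, but at a target date 160 years away it is accepted only by the rectangle, because the admissible data range of the ellipse narrows towards the border of the age tolerance.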

Related to our handling of uncertainty is our approach of reconstructing
data for those years when at least one proxy predictor is dated. This
also may contribute to the wide range of the reconstructions by
neglecting information in between these dates. Alternatively, one could
pool the proxy dates into constant intervals of, for example, 100 years.
The underlying assumptions here are as strong as those in our procedure. We
note that

Additional assumptions relate to characteristics of the considered proxy
predictors. This includes our decision to generally compare the proxy
predictors to centennial averages of the simulation output. Thereby, we
do not allow for the fact that the proxy sensor might record
extreme-like events. Similarly, we also do not consider the differing
resolutions for each date and each location. Further, we compare the
proxy predictors and the simulation pool in terms of temperatures
instead of using surrogate proxies in proxy units from the simulation
pool. Finally, the use of temperature for the surface and for an
attributed and calibrated season does not account for the sensor-specific habitats and seasonal sensitivities or their changes

Possible improvements of the method would respect more explicitly the irregular resolution of the proxy records and the different resolutions between the records. Similarly, applications would benefit if we could discriminate whether a proxy sensor records mean climatic conditions or extreme-like events. Including the proxy-specific habitat and growth season would also lead to a more appropriate comparison, as would employing proxy forward models to make the comparison in proxy units.

Better understanding of the proxy systems and availability of the full
simulation output data would allow for analogue searches that are more
specific for each proxy series. It further would enable the use of
locally calibrated process-based forward integrations by proxy system
models. The advent of proxy system forward models in principle allows
the production of proxy parameter representations in the virtual environment of
the simulations

It is generally advisable to use consistent proxy parameters, a
consistent recalibration, and a consistent calibration target. This
should increase the probability of the proxy predictors constraining the
pool of potential analogues (compare the results in Sect.

Our reconstruction is only for the approximate domain of the proxy predictors. However, it may be possible that a set of proxy predictors from, for example, Europe also provides information on larger-scale climate variables. Further, we deal only with temperature reconstructions. However, climate is more than simply temperature. Indeed, if there is evidence that the proxy predictors are relevant constraints on other climate fields beyond, in this example, temperature, the pool of analogues can provide information on other climate variables.

However, reconstructing other variables for hydrology or climate
dynamics depends on a sufficient number of proxy records that reliably
represent these. That is, there are two conditions on the proxy records:
they have to represent the variable and there has to be enough of them.
In addition, we have to be confident that the simulation pool reliably
represents the climate variable and its spatial distribution.
Considering the number of available reliable proxies for, e.g.,
precipitation and the quality of simulations' representation of it, we
would expect that reconstruction success using the analogue method may
be worse for these other variables than for temperature

Regarding the temporal resolution, a test of our method suggests that, for a given assumed tolerance level, the analogue search is more successful in finding valid analogues if we consider higher-resolution data and less successful if we reduce the resolution of the data. That is, the method performs slightly better in finding valid analogues when we use 51-year averaged simulation data than when we use 101-year averaged data, and it is even more successful in finding valid analogues using interannual data. While such an interannual analogue search may misinterpret what the proxy data represent, it may be a more truthful comparison considering the potential level of environmental noise in the proxy data relative to the targeted temperature signal.
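A candidate pool at different temporal resolutions can be built along the following lines. The sketch assumes a centred running mean over an odd number of years; the averaging actually applied to the simulation output may differ. The function names, the tolerance, and the synthetic white-noise series are purely illustrative and say nothing about the actual simulation pool.

```python
import numpy as np


def running_mean(series, window):
    """Centred running mean over an odd window; incomplete edge windows are dropped."""
    series = np.asarray(series, dtype=float)
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number of time steps")
    if window > series.size:
        raise ValueError("window is longer than the series")
    kernel = np.ones(window) / window
    return np.convolve(series, kernel, mode="valid")


def count_within_tolerance(candidates, proxy_value, tolerance):
    """Number of candidate values that differ from the proxy value by at most the tolerance."""
    candidates = np.asarray(candidates, dtype=float)
    return int(np.sum(np.abs(candidates - proxy_value) <= tolerance))


# Synthetic annual series standing in for simulated temperature at one location.
rng = np.random.default_rng(0)
annual = rng.normal(loc=10.0, scale=1.0, size=2000)

for window in (1, 51, 101):  # interannual, 51-year, and 101-year averages
    pool = running_mean(annual, window)
    n_valid = count_within_tolerance(pool, proxy_value=11.0, tolerance=0.5)
    print(window, pool.size, round(float(pool.std()), 3), n_valid)
```

Averaging reduces the spread of the candidate values, so for a fixed tolerance a proxy value away from the simulated mean has fewer candidates within reach. Whether this is the main reason for the behaviour described above is not tested by this sketch.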

Similarly, we find more valid analogues if we use less stringent criteria in our search for valid analogues. A single best analogue reconstruction also gives a more continuous reconstruction.

However, all approaches have in common that reconstruction medians as well as reconstruction ranges are relatively constant over time. The reconstructions show little variability and lack clear differences in climate between the late and early Holocene.

A likely reason for the small variability in central estimates and the generally rather constant character of our reconstructions could be that the space of valid analogues is too unconstrained and the method labels too many candidates as valid analogues. However, the single best analogue approach also shows this behavior. That is, while the reconstruction is undoubtedly only weakly constrained, even the best analogues differ little between subsequent dates. Part of this may be due to our choice to consider a rather large temporal range of influence of individual dated records. Our ellipses of tolerance may result in a strong influence of an unlikely value at a specific date. This could potentially be solved by considering explicitly the likelihood of a value at a date instead of simply taking a binary criterion. A less complex solution could be obtained by pooling proxy values in temporal windows, weighting them within these windows, and then performing a reconstruction considering specific ranges of tolerance.
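The pooling alternative could look like the following sketch. It assumes fixed windows of 100 years, independent Gaussian data errors, and inverse-variance weights within each window. The function name and all numbers are hypothetical, and the age uncertainty of the individual samples is ignored here.

```python
import numpy as np


def pool_into_windows(ages, values, value_sd, width=100.0):
    """Pool dated proxy values into fixed windows of `width` years.

    Returns a dict mapping each window centre to a tuple
    (weighted mean, standard deviation of the weighted mean),
    using inverse-variance weights (an assumption of this sketch).
    """
    ages = np.asarray(ages, dtype=float)
    values = np.asarray(values, dtype=float)
    weights = 1.0 / np.asarray(value_sd, dtype=float) ** 2

    window_index = np.floor(ages / width).astype(int)
    pooled = {}
    for index in np.unique(window_index):
        selected = window_index == index
        w = weights[selected]
        mean = np.sum(w * values[selected]) / np.sum(w)
        sd = np.sqrt(1.0 / np.sum(w))
        centre = float((index + 0.5) * width)
        pooled[centre] = (float(mean), float(sd))
    return pooled


# Invented example: three samples, two of which fall into the same window.
example = pool_into_windows(ages=[8010.0, 8090.0, 8120.0],
                            values=[10.0, 12.0, 9.0],
                            value_sd=[1.0, 1.0, 2.0])
for centre, (mean, sd) in sorted(example.items()):
    print(centre, mean, sd)
```

The pooled standard deviation could then serve as the range of tolerance for the window, in place of the ellipse around each individual tuple.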

Our aim here is to use the local proxy uncertainty to select analogues.
There is a trade-off between considering the uncertainty of the proxies
and constraining the number of analogues. That is, if we want to
consider the uncertainty in the way we do, then we allow for weakly
constrained analogue ranges. If we allow different levels of proxy
uncertainty, we can choose only the best

Beyond these methodological aspects, the size and character of the pool of analogue candidates influence the quality of the results. Indeed, the lack of sensitivity to differences in climate and the lack of variability in our results may be a sign of an insufficient pool size or an insufficient overlap between simulated climate and the environmental conditions described by the proxy records.

Our results suggest that a pool including the mid-Holocene, Last Glacial Maximum, and transient deglacial simulations does not ensure finding valid analogues for the time period of the deglaciation and the Holocene. An insufficiently large pool of candidate analogues requires more tolerant assumptions on uncertainty to obtain valid analogues. Thereby, the analogues remain unconstrained. A small pool also allows for non-uniqueness of analogues. Additionally, climatological inconsistencies become more likely if the range of simulated periods in the model pool is wide.

We do not use anomalies. If there were a large ensemble of simulations over our period of interest, the use of anomalies would be advisable. Similarly, if all proxy records had common modern age data, there might be a valid procedure for building anomalies. However, we include simulations for time slices with notably different climatologies, and proxy records begin at various modern dates. One solution could be a sliding climatology for the proxies, which is added again for the final reconstruction. We note that, if we want to apply proxy forward models based on the calibration between measured property and temperature, we do not use anomalies either because calibration relations frequently need temperature on either the Celsius or Kelvin scales.

This section outlined a number of potential improvements of the
approach. Some of these would increase the number of necessary
computations. While the increase in costs is not prohibitive, we decided
against including such procedures here. However, it appears particularly
worthwhile to try to implement a workflow that combines feasible data-science methods, some version of simple data assimilation, and a proxy
system model framework like PRYSM

The analogue method is a computationally cheap data assimilation approach. Here, we discuss a specific application for time-uncertain, sparse, and irregularly sampled proxies. We focus on the North Atlantic sector and the time period from approximately 15 kyr BP to the late 20th century.

The approach succeeds in providing reconstructions in a pseudo-proxy setup for some past dates. Already, this setup highlights two potential problems. The method may either fail to find valid analogues or provide a wide range of potential analogues which do not necessarily include a target climate. These problems relate to assumptions on the uncertainty in the proxy input data.

The approach performs comparably for realistic proxy setups. However, then, the analogue search often fails to find valid analogues as none of our candidate fields comply with our criteria for a valid analogue. That is, the method fails to provide a climate reconstruction because of a lack of valid analogues. In the present case, this particularly occurs over the late deglaciation and early Holocene.

Furthermore, our reconstructions by analogue are generally rather imprecise for the proxies used and a limited pool of simulation data. The range of potential analogue values can become very wide for a given date. Regional average reconstruction medians show little variation over time.

The analogue method is non-linear and considers the spatial covariances between the proxy records. While it lacks precision in our setup, it nevertheless provides us with spatial field estimates of past climate states that are consistent with the regional inter-relations as presented by the proxy predictors.

Additional information for the used proxy records: proxy ID, main reference, and reference for the datasets. For additional information, see Table


Additional information about the pool of simulation data: model name, main reference, and link to the provider of the data. For additional information, see Table


Information on individual simulations: model, simulation, and period.


We provide lists of valid analogues per date and
experiment at

The proxy data we use are available from the Supplement of

OB designed and conducted the study and was the main author. Both authors discussed the methods, the results, and their implications.

The authors declare that they have no conflict of interest.

This article is part of the special issue “Paleoclimate data synthesis and analysis of associated uncertainty (BG/CP/ESSD inter-journal SI)”. It is not associated with a conference.

Funding for this research is by the German Federal Ministry of Education
and Research (BMBF) within the Research for Sustainability initiative
(FONA;

This research has been supported by the Bundesministerium für Bildung und Forschung (grant nos. 01LP1509A and 01LP1926B). The article processing charges for this open-access publication were covered by a Research Centre of the Helmholtz Association.

This paper was edited by Lukas Jonkers and reviewed by two anonymous referees.