How precipitation intermittency sets an optimal sampling distance for temperature reconstructions from Antarctic ice cores

Many palaeoclimate proxies share one challenging property: they are not only driven by the climatic variable of interest, e.g. temperature, but they are also influenced by secondary effects which cause, among other things, increased variability, frequently termed noise. Noise in individual proxy records can be reduced by averaging the records, but the effectiveness of this approach depends on the correlation of the noise between the records and therefore on the spatial scales of the noise-generating processes. Here, we review and apply this concept in the context of Antarctic icecore isotope records to determine which core locations are best suited to reconstruct localto regional-scale temperatures. Using data from a past-millennium climate model simulation equipped with stable isotope diagnostics we intriguingly find that even for a local temperature reconstruction the optimal sampling strategy is to combine a local ice core with a more distant core ∼ 500–1000 km away. A similarly large distance between cores is also optimal for reconstructions that average more than two isotope records. We show that these findings result from the interplay of the two spatial scales of the correlation structures associated with the temperature field and with the noise generated by precipitation intermittency. Our study helps to maximize the usability of existing Antarctic ice cores and to optimally plan future drilling campaigns. It also broadens our knowledge of the processes that shape the isotopic record and their typical correlation scales. Finally, many palaeoclimate reconstruction efforts face the similar challenge of spatially correlated noise, and our presented method could directly assist further studies in also determining optimal sampling strategies for these problems.

. Illustration of the decorrelation lengths in the conceptual model. Shown are as a function of distance the correlation between two temperature time series (black), between a temperature and a precipitation-weighted temperature time series (dashed black), between the intermittency noise (violet), between two precipitationweighted temperature time series (green), and between a target temperature time series and the average of two precipitation-weighted temperature time series (orange), one located at the target site and one located a certain distance away from the target site as indicated on the x axis. Parameters are taken from the DML region.
Regarding the relative importance of the different noise sources: Previous studies have shown for the EDML site in East Antarctica that stratigraphic noise amounts to approximately up to 50 % of the total isotope variance at the seasonal time scale (Münch et al., 2016), however, quantitative estimates for other Antarctic regions are still missing. A similarly high relative contribution is expected from precipitation intermittency (Laepple et al., 2018), which probably has a larger impact further inland than compared to coastal regions (Casado et al., 2020). We will add a short discussion of these results to the introduction. I found some of the discussions of study regions and target regions quite confusing and could do with some clearer explanations. I also thought a motivating example with actual ice core data at one of the target sites to demonstrate the reduction in noise would be very helpful, and would be useful in quantifying how much improvement in the temperature reconstruction is obtained with additional cores when all the other confounding factors are included in the isotope signal. They state that including these sources of noise, eg isotope diffusion is outside the scope of this study, but and I would like to see how the results hold up when the real data is included.
We are sorry that some of our explanations regarding study regions and target sites were not clear enough and we will improve the text as illustrated in our answers to the specific comments.
We agree that testing our results on real ice core data would be an ultimate goal. Such a test would include to find appropriate ice core data and suitable instrumental temperature records, which are sparse on Antarctica. We thus think that this is clearly a study on its own and much beyond the scope of this manuscript, also given the manuscript's current length and number of figures.
The authors are clear in the assumptions going in to the conceptual model and the analysis overall. They are also clear on the limitations of trying to use this optimal sampling correlation structure in the real world.
Overall, the manuscript is written clearly, and is of good quality and is of scientific interest, especially to the ice coring an palaeoclimate community. I have listed a few specific comments below that I believe would help improve the readability of the paper and recommend publication once these issues are addressed.
Thank you. We are happy about this positive evaluation and are confident that addressing the specific comments will further improve the manuscript.

Specific Comments:
Lines 13-14: Remove final sentence as I didn't see the application of this technique to anything other than temperature reconstructions from ice cores discussed.
You are correct that in the paper our technique for assessing the optimal sampling strategy is only applied to temperature reconstructions from ice cores. Therefore, this final sentence of the abstract was meant to be an outlook to emphasise that our technique is however general enough to be extended to other palaeoclimate (temperature) proxies that face similar problems, i.e. noise sources working on different spatial scales. In addition, we do pick up this topic in the final part of the Conclusions (LL 353-357).
Unless the editor explicitly disagrees, we would keep this sentence/part in the manuscript as a reference for the broader palaeoclimate community, but we suggest to rephrase the mentioned sentence to more clearly articulate that it is being meant as an outlook for further studies: [...] It also broadens our knowledge on the processes that shape the isotopic record and their typical correlation scales. Finally, many palaeoclimate reconstruction efforts face the similar challenge of spatially correlated noise, and our presented method could directly assist further studies in determining optimal sampling strategies also for these problems.

And in the Conclusions (L355 et seq.):
This likely applies to various marine as well as terrestrial proxy types, and our strategy thus might offer a step forward in the best use of sampling and measurement capacity for quantitative climate reconstructions, which needs to be investigated in further studies. section 2.3.3: Some motivation and justification for the sampling scheme of rings and selection would be helpful. It was also not clear at times if N was referring to the number of grid cells, or rings and also is presumably different to N grid mentioned later on line 132 onwards.
We chose the sampling scheme of rings since it provides a computationally efficient way to estimate a statistically solid average correlation as a function of distance between the averaged locations; see also our more detailed answer on the similar issue raised by the second reviewer.
In general, in the manuscript the symbol N (with or without subscript) always refers to a particular number of model grid cells, but we agree that our use of this terminology may have been unclear at times, since there are different contexts in which we use this symbol. To make it clearer for the reader, we suggest to adjust the terminology as follows: • In section 2.1 (LL82-84), the number N grid = 442 refers to the total number of grid cells which lie, in our specific model simulation, on the Antarctic continent and which are used for all our analyses (since, obviously, you cannot drill ice cores on marine sites). Since this number is not referred to at any later stage in the manuscript, we suggest to drop the term N grid here in L84: [...] extracted from the total number of 442 model grid cells that are available for the Antarctic continent (Münch and Werner, 2020). • In section 2.3.4 (LL129-139), the number N grid refers to the number of grid cells which lie within our defined DML and Vostok study regions and which are used as temperature target sites. To improve clarity, we will, similar to above, refer to the number of grid cells within the two study regions explicitly but without using the symbol N grid , and will rephrase the respective text passages as follows: The correlation values are all highly significant; please see our answer below to your remark on Fig. 3. We agree and will use a sequential color scheme (red hue) for Fig. 2 and for the correlation map figures (Figs. 3, 4, 6, A1, and A2). We will add the map showing the differences in correlation (Fig. 2 below) to the manuscript as part of manuscript Fig. 3. This map nicely illustrates that the correlations tend to remain unaffected mostly in the coastal regions of Antarctica, where precipitation intermittency is expected to be less important (Casado et al., 2020).
However, indicating the significance of the correlation coefficients in the maps does not make sense statistically, since the correlation values are all highly significant (p <<< 0.01) given the long time series (1200 data points). Even if one accounted for autocorrelation of the data, the significance should still remain very high across all grid cells.   We agree that our explanation might be be ambiguous or misleading here. Of course, we first average the N isotope time series, taken from N grid cells, and then compute the correlation of this averaged record with the target site temperature time series. We iterate this approach over all possible combinations (or over a finite number of Monte Carlo combinations) of drawing N grid cells from a given ring combination, and only then average the correlations from these iterations to obtain the mean correlation for this ring combination. We then analyse the next ring combination. This approach is in more detail explained in Sect. 2.3.4. We will revise the text in Sect. 3.3. to better and unambiguously summarise this approach. In addition, we will take care that the methods text in Sect. 2.3.4 is also clearly understandable.
I would also like more comment on how much of the regional difference is lost by averaging to only get a function of radial distance. As the DML and Vostok results differ and you suggest there are regional differences. Surely the correlations are not radially uniform?
We agree that the correlations do not need to be radially uniform. From physical arguments we expect, however, that the first-order spatial correlation patterns are largely invariant against rotation. This is indeed the case for the temperature field, as the correlation maps show for the EDML and Vostok sites ( Fig. 3 below), and similar patterns are observed for other Antarctic regions. However, for δ 18 O, and also partly through the effect of precipitation weighting, indeed rather strong radial asymmetry can occur. Nevertheless, still a contribution from radially symmetric patterns may exist, and our approach of a radial averaging is based on assuming such symmetric contributions. This can be motivated by the fact that in real world applications one may not necessarily know in which direction the correlation pattern is maximal, so that radial symmetry is the most straightforward assumption. And indeed our results suggest that a gain in correlation can be achieved nevertheless, when we use optimal core locations based on an analysis assuming radial symmetry, despite the actual form of the spatial correlation patterns. a.
b. At this point, we think it would be most beneficial for the reader to include the presentation of Fig. 3 in the manuscript (e.g. in a new subsection after 3.1) and to use it as a starting point to motivate the ring sampling in section 3.3 as well as for the discussion (Sect. 4.1). However, this would of course extend the manuscript substantially and it would also increase the number of figures, and we thus would leave it to the editor to decide whether this extension is reasonable. It would be also useful to have the concentric circles marked around the target sites. Can you also comment on why the locations shift when more cores locations are added? That is, is the location in the N=1 case also included in the N=3 case as it looks as though they have moved slightly. These segments don't seem to correspond to the regions mapped out by the black polygons in Fig. 1 so it is not clear to me what the study regions actually are. Does it mean that the cores in the N>1 cases can be from outside of those black polygons too as appears to be the case here?
We use the study regions (the black polygons in Fig. 1) only to define regions from which we select temperature target sites, with respect to which we conduct our analyses and across which the results are then averaged to obtain regional estimates, such as for Figs. 5, 6, and A1. Here, for Fig. 4, we use only a single temperature target site from within the study regions, namely the EDML site (Fig. 4a-c) and the Vostok site ( Fig. 4d-f). The δ 18 O (pw) model grid cells that we average and correlate with the target temperature time series can, however, indeed be selected from a wider region than the study region, namely from the 2000-km circles around the target sites (as explained in Sect. 2.3.2).
We will add the line of the 2000-km circles around the target site to the plots and explain in the figure caption that the δ 18 O (pw) grid cells ("ice cores") can be chosen from within these circles.
Yes, you correctly observe that the optimal location (model grid cell) in the N=1 case for EDML is no longer included in the N=3 or N=5 case, while this is the case for Vostok. However, this is simply by chance: while for N=1 it is computationally easy and fast to find the best correlating grid cell within the 2000-km circles and so panels (a) and (d) display the "true" optimal location, for N=3 and N=5 we need to randomly select and average grid cells, using 10 5 iterations. The displayed locations is the best configuration of these iterations, but does not necessarily need to be the "true" optimal configuration. However, in terms of correlation value with the target site temperature, the value from the best iteration should be very close to the correlation value for the true optimal configuration due to the large number of performed iterations. Fig 5. The black dashed line indicating the exponential fit is not included in the figure legend, only the caption. The dots in the plot don't appear to be at the expected 0, 250, 500km marks, but are a bit offset, is this deliberate? Again, I question the averaging step around the whole 250km rings, but it would be nice to see the spread in the correlation as a function of distance too. I am confused at what is meant by "all respective target sites in the DML and Vostok region", how are there more than one target sites for each region, are these different that the two crosses shown in Fig. 4?
We will add the explanation for the dashed line to the figure legend and include the correlation scatter (standard deviation of the correlation results across the different target sites within the study regions); see the revised plot in Fig. 4 below. Yes, the dots in the plots are deliberately placed in the middle of the ring bin borders, i.e. at distances of 125, 375, 625, ... km, since each correlation value is an average value across a ring bin (from 0-250, 250-500, 500-750, ... km) by averaging the individual correlations obtained between the target site in the centre and all grid cells that lie within each bin.
Regarding the target sites your comment shows that there is a clear need for us to better explain this concept in the revised manuscript. We use "target site" as the term to denote a model grid cell from which we use the temperature time series (T 2m ) to correlate all other grid cells and variables with, and which defines the centre of the ring bins. For a single target site we can study the average correlation with distance similar as shown in Fig. 5. But Fig. 5 involves a second averaging step: To improve statistics, and to obtain regional estimates, we use all model grid cells within our defined DML and Vostok regions as target sites, one after the other, to get the correlation-distance dependencies for each target site and for each variable (T 2m , δ 18 O, δ 18 O (pw) ). Then we average all these 26 (DML region) and 30 (Vostok region) curves that we obtain for each variable, respectively, to produce the curves shown in Fig. 5. The same approach is used for the results shown in Figs. 6 and A1.
We will expand the explanations in sections 2.3.1 and 2.3.4 to clarify the concept of "target sites" and of "the averaging across target sites" to the reader. We fully agree that the difference between the DML and Vostok regions is an interesting result. When writing up the manuscript, we were concerned that we could overload the results section with too many coloured contour plots, which is why we moved the Vostok plots to the appendix. However, we are happy to combine both regions into a single Figure 6 (similar to Fig. B1) and remove Appendix A. Distance from target site (km) The way this figure is arranged, is the top row, marked rank 5, that which has the max correlation, or is the rank 1 row the highest correlation row? I find it very curious that in the Vostok region the optimal arrangement comes with no local sites in many of the cases The maximum correlation correponds to the combination/row that is marked as rank 1; we will add a clarifying remark to the figure caption.
Yes, it is indeed a curious result that in the Vostok region the optimal arrangement comes mostly without local sites, but we see no evidence for not trusting this result in view of the agreement between N = 3 and N = 5 and given the statistics from the large number of sampled locations in our ring sampling scheme.
line 220: Suggests that regional differences play a part here and I would like a comment on what those differences are (elevation, distance from coast etc?) We can only speculate what possible reasons could there be for the difference in temperature-isotope correlation between the DML and Vostok regions (see Fig. 5 in the manuscript). One factor might indeed be the larger elevation and distance from the coast, i.e. a stronger continentality at Vostok, and related to this, differences in the distillation paths of the transported vapour.
Line 232: Does the averaging have an effect on the significance of the correlations?
As pointed out above, there is statistically no sense in studying the significance of the correlation coefficients given that each time series has such a large number of data points. Fig. 8 (b) I assume the colours on the histogram are the same as in (a), but please add the legend anyway. Can you comment on the low correlation outliers for the EDML case.
Yes, the colours are the same in (b) as in panel (a); nevertheless, we will add a second legend to panel (b). The low correlation outliers in the EDML case stem from the grid cell combinations which include one anomalous grid cell located at ~ 72.4 °S, 22.5 °E; see Fig. 3 and our reply to a respective comment by reviewer #2. We will have a deeper look into this issue.
Section 4.1 I found most of the discussion very clear and thorough, but re-iterate that a good schematic diagram to illustrate the length scales in the conceptual model would be very useful.
Thank you; please see our reply to your General Comments and Fig. 1 there.