Comment on cp-2021-151. Anonymous Referee #3. Referee comment on "Plio-Pleistocene Perth Basin water temperatures and Leeuwin Current dynamics (Indian Ocean) derived from oxygen and clumped isotope paleothermometry"

Over the past week, I have read with interest the manuscript by De Vleeschouwer et al., submitted for review to Climate of the Past. In their manuscript, the authors present new stable oxygen isotope and clumped isotope data from Pliocene to Pleistocene planktic foraminifera microfossils from the IODP core drilled at Site U1459 off the western coast of Australia. This site is interesting because it sits within a framework of other IODP records in the region and, by comparison with them, potentially allows the strength of the Leeuwin Current to be constrained over this important time period, which covers the transition from the comparatively warm Pliocene to the Pleistocene icehouse climate.

The addition of clumped isotope analyses to the (partly previously established) stable oxygen isotope records is useful, as it potentially allows the effects of temperature and seawater isotopic composition on the stable oxygen isotope records to be disentangled. This is especially important in this setting, since one of the main aims the authors put forward is to reconstruct the isotopic gradient along the Australian western margin as a proxy for the strength of the Leeuwin Current.
Overall, I think the authors present a valuable dataset embedded in the context of ongoing research in this area. The subject of the study definitely fits within the scope of Climate of the Past. However, I do think that some revisions are required to the manuscript before it can be accepted for publication, as I believe some of the conclusions the authors put forward are not (yet) fully supported by the data as it now stands. Below, I highlight some major concerns I have with the discussion of the data, followed by some more minor line-by-line changes I suggest the authors implement to improve the readability of their text.

TEX86 vs. clumped temperatures
Firstly, I have some concerns about the authors' conclusion that TEX86 likely reflects sea surface temperatures while the clumped isotope analyses are indicative of the temperatures in the lower mixed layer, as put forward in the abstract (lines 24-26). In their discussion of the discrepancy between TEX86 and clumped temperatures, the authors put forward the hypothesis that TEX86 may be seasonally biased (lines 343-350). In fact, previous studies have suggested that a summer bias on TEX86 is likely and explains the consistent difference between TEX86 and stable isotope paleotemperature estimates found in other studies (Jia et al., 2017; O'Brien et al., 2017). While they disregard their other working hypothesis about the southward displacement of sinking particles, it seems that the authors cannot exclude the possibility of seasonal bias in their TEX86 data (see line 387), which could easily explain the ~5°C difference between TEX86 and clumped results (line 283) if TEX86 represents summer SST and the clumped isotope data record MAT (see Fig. 1). In the absence of clear evidence about the living habitat of T. sacculifer (lines 371-385), and with the only other line of evidence being that the TEX86 temperatures "seem reasonable compared to the present-day mean annual temperatures" (lines 341-342), I think the conclusion that clumped isotope temperatures represent the lower mixed layer and TEX86 represents the mean annual SST is not sufficiently supported.

Isotopic gradients
While I understand that the authors only measured clumped isotope temperatures at one site, it is a shame that their discussion of the isotopic gradient along the Leeuwin Current does not benefit from the addition of clumped isotope analyses. My very first question on reading this discussion after the discussion of the clumped isotope results is how much of this isotopic gradient reflects a temperature gradient and how much reflects a difference in seawater isotopic composition. Would a strengthening or weakening of the Leeuwin Current affect both these variables similarly? Would it somehow be possible to infer from the changes in temperature and seawater oxygen isotope composition over time, which the authors can derive from their clumped isotope record, whether the changes observed in δ18O over time are mostly driven by temperature or water composition? And by extension, could this evidence be used to say something about which factor predominantly forced the changes in isotopic gradient? Finally, if the authors' hypothesis that the foraminifera calcify in the lower mixed layer is correct (see previous comment), how does this impact the discussion of the isotopic gradient? Can the authors somehow exclude that changes in the calcification depth or the depth of the mixed layer between the two sites being compared affect the difference in δ18O without the need for a change in the strength of the Leeuwin Current? I feel that there is some untapped opportunity for discussion on this topic which would integrate the clumped isotope analyses more firmly into the main discussion of the manuscript.

Clumped isotope statistics
I had some concerns about the way the statistics and uncertainty of the clumped isotope analyses were presented in the manuscript: First of all, the caption of Table 2 (line 309) and the methods description (line 277) list different reproducibility errors for the clumped isotope measurements. I assume that the standard deviations cited in line 277 are one order of magnitude too high (e.g. 0.0314‰ instead of 0.314‰, as in line 309).
Secondly, I noticed that the authors used the reproducibility of their standards for calculating the standard errors in Table 2 (see lines 307-309) instead of the within-sample reproducibility. This method is likely to underestimate the uncertainty on the Δ47 values in the samples, as the homogenized ETH-4 standard on which the standard deviation is based will likely reproduce better than the samples consisting of foraminifera pooled from up to four adjacent samples (line 189; up to 60 cm core depth when using the median sampling resolution from line 177). The authors should at least report the reproducibility of clumped isotope analyses within their samples.
Thirdly, in the clumped isotope community it is common practice to report uncertainties at the 95% confidence level (e.g. Fernandez et al., 2017). Instead, the authors report uncertainties at ±1 standard error in Tables 2 and 3. The captions of Figures 4 and 5 do not state what the error bars on the clumped datapoints represent, but from comparison with the tables I infer that these are also 1 SE. This reporting makes the uncertainty look smaller than in other studies using the 95% confidence level, and in my opinion the reporting of ±1 SE ("68% CL") is less intuitive. I realize that calculating the 95% CL, or even the within-sample standard deviation, of samples with 2 or 3 replicates (PB03, PB05, PB06 and PB08) is challenging due to the lack of statistics. This problem illustrates the risk of analyzing small numbers of replicates per sample and will make it challenging to assess the confidence on these clumped isotope datapoints, or to compare the results amongst themselves (e.g. via a Student's t-test) or with other data. I do not know how this issue can be resolved without adding additional replicates, and I do sympathize with the authors given how much work it is to gather enough foraminifera for these measurements. At the very least, I would therefore urge the authors to add information about their within-sample reproducibility (standard deviations) for all samples and calculate 95% confidence levels for those samples for which this is feasible (sample size > 3), in addition to making the clumped isotope results available in an open-access repository (currently, only the regular stable isotope data are archived).
Finally, while not (yet) a standard in the clumped isotope community, it would be good practice if the uncertainty on the clumped isotope calibration(s) used in the study were to be propagated on the clumped isotope result. This uncertainty is not contained within the measurement uncertainty and is usually relatively small (<5 ppm). However, given the differences between the sample sizes and temperature ranges between the calibrations cited in Table 3, the differences in uncertainties of these calibrations could be discussed.
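The within-sample statistics requested in the third point above could be computed along the following lines. This is only a minimal sketch and not the authors' workflow: the replicate Δ47 values are invented for illustration, the function name `replicate_stats` is hypothetical, and the two-sided 95% Student's t critical values are hard-coded from standard tables for up to 10 degrees of freedom.

```python
import math
import statistics

# Two-sided 95% critical values of Student's t, keyed by degrees of freedom
# (standard table values; only df = 1..10 are included in this sketch).
T_CRIT_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
             6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228}


def replicate_stats(values):
    """Return mean, within-sample SD, SE and 95% CL half-width of replicates."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two replicates are needed for an SD")
    df = n - 1
    if df not in T_CRIT_95:
        raise ValueError("extend T_CRIT_95 for more than 11 replicates")
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)      # sample SD (n - 1 in the denominator)
    se = sd / math.sqrt(n)             # standard error of the mean
    cl95 = T_CRIT_95[df] * se          # half-width of the 95% confidence interval
    return {"n": n, "mean": mean, "sd": sd, "se": se, "cl95": cl95}


# Hypothetical replicate D47 values (permil) for one pooled sample.
example_replicates = [0.68, 0.70, 0.72, 0.70]
result = replicate_stats(example_replicates)
print(f"n={result['n']} mean={result['mean']:.4f} "
      f"SD={result['sd']:.4f} SE={result['se']:.4f} 95%CL={result['cl95']:.4f}")
```

Note that with only two replicates the t multiplier is 12.706, so the 95% CL is almost 13 times the SE, which is the small-N problem described above.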

Recalculated clumped isotope calibration
It is a really nice addition that the authors compare the results of applying different clumped isotope calibrations to their data (Table 3) Table 1 (as in Meinicke et al., 2021). Providing the calibration dataset is especially important, as the uncertainties on the calibration (see previous comment) cannot be propagated from the errors on the slope and intercept of the calibration formula alone: information about the covariation of slope and intercept is missing from those values.
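The point about slope-intercept covariation can be illustrated with a short sketch. It assumes a simple linear calibration fitted by ordinary least squares, with Δ47 as a function of 10^6/T²; the calibration points are invented, and the function names `fit_line` and `line_se` are hypothetical. Published calibrations may use other regression methods, so this shows the principle only.

```python
import math


def fit_line(x, y):
    """Ordinary least squares fit y = slope * x + intercept.

    Returns the parameters together with their variances and covariance,
    which can only be obtained from the calibration data themselves.
    """
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    # Residual variance with n - 2 degrees of freedom.
    s2 = sum((yi - (slope * xi + intercept)) ** 2
             for xi, yi in zip(x, y)) / (n - 2)
    return {
        "slope": slope, "intercept": intercept,
        "n": n, "x_mean": x_mean, "s2": s2,
        "var_slope": s2 / sxx,
        "var_intercept": s2 * (1.0 / n + x_mean ** 2 / sxx),
        "cov": -x_mean * s2 / sxx,   # slope-intercept covariance
    }


def line_se(fit, x0, use_cov=True):
    """Standard error of the fitted line at x0, with or without covariance."""
    var = x0 ** 2 * fit["var_slope"] + fit["var_intercept"]
    if use_cov:
        var += 2.0 * x0 * fit["cov"]
    return math.sqrt(var)


# Invented calibration points: x = 1e6 / T^2 (T in K), y = D47 (permil).
x_cal = [11.0, 11.5, 12.0, 12.5, 13.0]
y_cal = [0.660, 0.682, 0.699, 0.722, 0.740]
fit = fit_line(x_cal, y_cal)

x0 = 12.0
print("SE with covariance:   ", line_se(fit, x0, use_cov=True))
print("SE without covariance:", line_se(fit, x0, use_cov=False))
```

Because slope and intercept are strongly anti-correlated when the calibration range lies far from x = 0, dropping the covariance term inflates the propagated uncertainty considerably in this example.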

Minor comments
Line 24-26: I am not sure this conclusion about the explanation of the difference between TEX86 and clumped results is currently supported by the data.
Line 39: "habitable" seems a bit overstated. Would the continent be wholly uninhabitable without the boundary currents?
Line 136-137: Please refer to the repository where the re-calculated calibration dataset of Peral et al. 2018 can be accessed.
Line 195: Rephrase "carbonate, power reacts" to "carbonate powder reacts".
Line 215-216: Why was an acid fractionation factor used? The new values of the ETH standards in Bernasconi et al. (2021) should not require the use of a factor if the reaction of the carbonate took place at 70°C. Please double-check that the acid fractionation factor has not been wrongly applied, as this would offset the temperature reconstructions, which can have big implications for the discussion in the manuscript!
Line 250-251: I agree that this method of tuning the record runs the risk of circular reasoning, but I think the fact that the authors limit their tuning to only 2 astronomical tie points renders this risk fairly limited.
Line 275-279: "The reported uncertainties scale to the number of repeated measurements" I think this statement is redundant given the fact that the SE are calculated from the reproducibility of the standards, which is the same for all samples except PB03, hence yielding lower SE for samples with higher N. What would be more interesting here is to report the reproducibility of the replicates within the samples (see major comment).
Line 292: I find it confusing that the clumped isotope data are presented with a ±1 SE error while for TEX86 the full 95% confidence level is reported. This makes comparison between the proxies difficult (see major comment).
Line 300-301: The fact that the δ18O-based reconstructions are more similar to clumped than to TEX86 is not really surprising, since these isotope proxies are measured on the same material (foraminifer carbonate).
Table 2: I suggest the authors also provide their uncertainties (1 standard deviation) on the three ETH standards used in the ETF-calibration of the clumped result to give the reader an idea of the reproducibility of the clumped isotope measurements on different standards.
Line 356-359: I agree that the TEX86 data look to be of high quality and likely reflect SST. However, I think the authors did not sufficiently disprove the hypothesis that the TEX86 temperatures may be seasonally biased (see major comment). A seasonal bias in SST can be substantial and I would not label such a potential bias as a "minor warm-bias" (line 357).
Line 451: "This interpretation is endorsed by…" I would rephrase this. A causal relationship between two parameters (as stated in the previous sentence) cannot be (dis)proven by the similarity of the power spectra of these parameters. Spectral analysis is a powerful and useful statistical tool, but it cannot be used to infer causal relationships.
Line 457: "remarkable co-variation between…" A statement like this should be backed up with statistical proof of this co-variation, for example by an R² and p-value and/or by means of cross-spectral analysis.
Lines 524-525: Please include the potential for seasonal bias in the TEX86 reconstructions more prominently here in the conclusion (see major comment). Perhaps the authors could estimate the size of the warm bias if TEX86 were to record summer temperatures and compare that with the difference between the proxies to show whether or not this bias is small enough to be neglected.
Line 535: "Current" rephrase to "Currently".
Figure C1: From the caption it is not clear if these pictures are from Gallagher et al. or from this study. Please clarify.
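As an illustration of the statistical backing requested in the comment on line 457, the sketch below computes R² and a permutation-based p-value for two series. The series are invented and the function names are hypothetical. The permutation test treats samples as exchangeable, which autocorrelated paleoclimate records generally are not, so a p-value obtained this way would likely be too optimistic; the records would also first need to be placed on a common age scale.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    syy = sum((yi - y_mean) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Two-sided p-value: share of shuffles with |r| at least as large as observed."""
    rng = random.Random(seed)          # fixed seed for reproducibility
    r_obs = abs(pearson_r(x, y))
    y_shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_shuffled)
        if abs(pearson_r(x, y_shuffled)) >= r_obs - 1e-12:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Invented, already co-registered series (e.g. two proxy records).
series_a = [1.0, 1.4, 1.1, 1.8, 2.0, 1.7, 2.4, 2.6]
series_b = [0.2, 0.5, 0.3, 0.7, 0.9, 0.6, 1.1, 1.2]

r = pearson_r(series_a, series_b)
p = permutation_p_value(series_a, series_b)
print(f"R^2 = {r * r:.3f}, permutation p = {p:.4f}")
```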