Quantitative climate reconstruction from sedimentary ancient DNA: framework, validation and application

Herzschuh, Ulrike; Böhmer, Thomas; Jia, Weihan; Lisovski, Simeon

doi:10.5194/cp-22-1159-2026

Articles | Volume 22, issue 6

https://doi.org/10.5194/cp-22-1159-2026

© Author(s) 2026. This work is distributed under
the Creative Commons Attribution 4.0 License.

Research article | Highlight paper | 10 Jun 2026

Final revised paper (published on 10 Jun 2026)
Preprint (discussion started on 24 Jun 2025)

Interactive discussion

Status: closed

RC1:
'Comment on egusphere-2025-2678', Anonymous Referee #1, 23 Jul 2025
Pollen preserved in lake sediments has been used to generate quantitative reconstructions of past climate for the last 50 years, but two inherent problems severely limit the attainable precision. First, species with ecologically distinct niches can be difficult or impossible to separate palynologically; second, some wind-dispersed species produce vast amounts of widely dispersed pollen that can blow into lakes far beyond the ecological limits of the species. This manuscript tests the potential of ancient DNA, which has high taxonomic resolution and limited dispersal, as an alternative to pollen for reconstructions. The manuscript tests different reconstruction methods, including traditional transfer functions that use a calibration set and an alternative method that uses presence-only data.

The methods used are:

- CREST based on GBIF occurrence data
- CREST based on GBIF occurrence data with MaxEnt preprocessing
- WAPLS using a calibration set
- MAT using a calibration set

The first two methods are applied only to the aDNA data, whereas the latter two are applied to both the aDNA and the pollen data. This may be the first time a head-to-head comparison of CREST and transfer functions has been done. It might be worth extending the analysis to run CREST with the pollen data as well, for a more complete comparison.

The justification for using CREST with MaxEnt preprocessing is "that several taxa do not equally cover the temperature range". This is a rather vague justification, and I'm not exactly sure what is meant by it. The GBIF-MaxEnt-CREST pipeline is novel and rather involved. I recommend starting this section with a short paragraph that outlines the process so that the details are easier to follow.

Pre-processing with MaxEnt seems to improve the performance of CREST, but I do not understand exactly how MaxEnt helps CREST perform better. I'm dubious of the claim that it "enhance[s] the point density in the occurrence data", as it is not possible for the method to create data. A plot comparing the niches estimated by both methods might help explain what is happening.

I can easily imagine that CREST is too constrained in the niche shapes it can fit, and it might be profitable to allow more flexible shapes than normal or log-normal PDFs. In that case, rather than pre-processing the data with MaxEnt, the first step of CREST could be replaced by MaxEnt (or another flexible model). Of course, the penalty for using more flexible models is that they are prone to over-fitting.
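The trade-off between a fixed niche shape and a flexible one can be illustrated with a minimal sketch. It uses a Gaussian kernel density estimate (KDE) as a stand-in for "another flexible model"; it is not MaxEnt and not CREST's own fitting step, and the occurrence temperatures are hypothetical. For a skewed sample, the fitted normal shifts the optimum towards the warm tail, whereas the KDE tracks the densest part of the sample; a very narrow bandwidth, however, produces a spurious peak at every distinct record, which is the over-fitting risk noted above.

```python
import math
import statistics


def gaussian_pdf(x, mu, sd):
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def make_kde(sample, bandwidth):
    """Return a Gaussian kernel density estimate as a function of x."""
    def density(x):
        return sum(gaussian_pdf(x, xi, bandwidth) for xi in sample) / len(sample)
    return density


def silverman_bandwidth(sample):
    # Silverman's rule of thumb: 1.06 * sd * n^(-1/5)
    return 1.06 * statistics.stdev(sample) * len(sample) ** -0.2


def grid(lo=0.0, hi=30.0, step=0.01):
    return [lo + i * step for i in range(int(round((hi - lo) / step)) + 1)]


def count_modes(f, xs):
    """Count interior local maxima of f evaluated on the grid xs."""
    ys = [f(x) for x in xs]
    return sum(1 for i in range(1, len(ys) - 1) if ys[i - 1] < ys[i] >= ys[i + 1])


# Hypothetical July temperatures (degC) at the occurrence points of one taxon:
# most records fall between 9 and 11 degC, with a long warm tail.
occurrences = [8, 9, 9, 10, 10, 10, 11, 11, 12, 14, 17, 20]
xs = grid()

normal_optimum = statistics.mean(occurrences)   # mode of the fitted normal PDF
kde = make_kde(occurrences, silverman_bandwidth(occurrences))
kde_optimum = max(xs, key=kde)                  # mode of the flexible estimate

overfit = make_kde(occurrences, 0.3)            # much narrower bandwidth

print(f"normal optimum: {normal_optimum:.2f} degC")
print(f"KDE optimum:    {kde_optimum:.2f} degC")
print("modes (rule-of-thumb vs narrow bandwidth):",
      count_modes(kde, xs), count_modes(overfit, xs))
```

The bandwidth plays the role of the flexibility penalty: the rule-of-thumb value gives a single, asymmetric niche, while the narrow one reproduces individual records.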

There is an issue with the cross-validation of the transfer function models. The ms reports that the uncertainties on the reconstructions are calculated using bootstrapping, but it is unclear which cross-validation scheme is used to estimate the models' performance.

One widely used cross-validation scheme is leave-one-out cross-validation. Somewhat confusingly, this ms, following Chevalier (2022), uses the term leave-one-out to refer to a type of sensitivity analysis in which taxa are left out sequentially. It would be better to call this step a sensitivity analysis.
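The distinction drawn here can be sketched as follows. A plain weighted-averaging (WA) model with inverse deshrinking stands in for WA-PLS, and the calibration data are hypothetical. `loo_rmsep` is leave-one-out cross-validation in the usual sense (each site is predicted by a model trained without it), yielding an RMSEP; `taxon_sensitivity` is the leave-one-taxon-out sensitivity analysis that the manuscript, following Chevalier (2022), calls "leave-one-out".

```python
import math


def wa_raw(row, optima):
    """Abundance-weighted mean of taxon optima (taxa without an optimum are skipped)."""
    pairs = [(y, u) for y, u in zip(row, optima) if u is not None and y > 0]
    return sum(y * u for y, u in pairs) / sum(y for y, _ in pairs)


def wa_fit(Y, x):
    """Estimate WA optima and an inverse-deshrinking line x = a + b * raw."""
    optima = []
    for k in range(len(Y[0])):
        w = [row[k] for row in Y]
        optima.append(sum(wi * xi for wi, xi in zip(w, x)) / sum(w) if sum(w) > 0 else None)
    raw = [wa_raw(row, optima) for row in Y]
    mr, mx = sum(raw) / len(raw), sum(x) / len(x)
    b = sum((r - mr) * (xi - mx) for r, xi in zip(raw, x)) / sum((r - mr) ** 2 for r in raw)
    return optima, mx - b * mr, b


def wa_predict(row, model):
    optima, a, b = model
    return a + b * wa_raw(row, optima)


def loo_rmsep(Y, x):
    """Leave-one-SITE-out cross-validation: each site is predicted by a model
    trained on all other sites; returns the RMSEP and the predictions."""
    preds = []
    for i in range(len(Y)):
        model = wa_fit(Y[:i] + Y[i + 1:], x[:i] + x[i + 1:])
        preds.append(wa_predict(Y[i], model))
    rmsep = math.sqrt(sum((p - o) ** 2 for p, o in zip(preds, x)) / len(x))
    return rmsep, preds


def taxon_sensitivity(Y, x, sample):
    """Leave-one-TAXON-out sensitivity analysis: drop taxon k everywhere,
    refit, and reconstruct the fossil sample again."""
    results = []
    for k in range(len(sample)):
        Y_k = [row[:k] + row[k + 1:] for row in Y]
        model = wa_fit(Y_k, x)
        results.append(wa_predict(sample[:k] + sample[k + 1:], model))
    return results


# Hypothetical calibration set: 6 lakes x 4 taxa (read counts) and July T (degC).
Y = [[60, 30, 10, 0], [50, 35, 10, 5], [30, 40, 20, 10],
     [15, 35, 35, 15], [5, 25, 40, 30], [0, 15, 45, 40]]
x = [8.0, 9.5, 11.0, 12.5, 14.0, 15.5]
fossil = [20, 40, 30, 10]

rmsep, preds = loo_rmsep(Y, x)
print(f"LOO RMSEP: {rmsep:.2f} degC")
print("reconstruction without each taxon:",
      [round(t, 2) for t in taxon_sensitivity(Y, x, fossil)])
```

Only the first function estimates predictive performance; the second says how much a single reconstruction depends on individual taxa.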

The ms emphasises the median bias of the reconstructed temperatures as a metric of transfer function performance. I have not seen this metric used before. Mean bias is sometimes reported, but not prominently as it can be zero even if the transfer function has no skill. Median bias will have the same unfortunate property. Maximum bias is more useful.
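The point about mean and median bias is easy to demonstrate with hypothetical numbers. The sketch compares a no-skill model that always predicts the calibration-set mean with a model that has small errors. Maximum bias is computed in the commonly used way: the observed gradient is split into equal-width segments and the largest absolute mean residual across segments is reported. Five segments are used only because the toy data set is small (ten is a common choice).

```python
import statistics


def bias_metrics(obs, pred, n_segments=5):
    """Mean, median and maximum bias of residuals (predicted - observed)."""
    resid = [p - o for o, p in zip(obs, pred)]
    lo, hi = min(obs), max(obs)
    width = (hi - lo) / n_segments
    segment_means = []
    for s in range(n_segments):
        left, right = lo + s * width, lo + (s + 1) * width
        last = s == n_segments - 1
        members = [r for o, r in zip(obs, resid) if left <= o < right or (last and o == hi)]
        if members:
            segment_means.append(statistics.mean(members))
    return {
        "mean_bias": statistics.mean(resid),
        "median_bias": statistics.median(resid),
        # signed segment mean with the largest magnitude
        "max_bias": max(segment_means, key=abs),
    }


observed = [6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0]  # hypothetical, degC

# No-skill model: every sample gets the calibration-set mean.
no_skill = [statistics.mean(observed)] * len(observed)
# Skilful model: small errors scattered evenly along the gradient.
offsets = [0.2, -0.1, 0.1, -0.2, 0.0, 0.1, -0.1, 0.2, -0.2, 0.0]
skilful = [o + d for o, d in zip(observed, offsets)]

for name, pred in [("no skill", no_skill), ("skilful", skilful)]:
    print(name, {k: round(v, 2) for k, v in bias_metrics(observed, pred).items()})
```

Both models have zero mean and median bias; only the maximum bias separates the no-skill model (residuals of about 4 degC at the ends of the gradient) from the skilful one.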

The pollen-MAT error for the Billyakh core top is very low. Is this lake part of the calibration set? If so, such a low error is not surprising, and it might be worth removing the lake from the calibration set before reconstructing the core top, to get an unbiased estimate. The same applies to the WAPLS model, although the effect will be much weaker and could probably be ignored (unless the assemblage is distinctive); either way, the two transfer function methods should be treated the same way.

If Billyakh is one of the calibration set sites, it could be marked on the panels on the left of figure 2.

How was it decided to use seven analogues in the MAT models? (seven should be written in words, as should any other small integers).
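Both points (excluding the target lake, and choosing the number of analogues) can be handled with the same leave-one-site-out loop. The sketch below is a hypothetical, unweighted MAT on proportions using the squared-chord distance; the data are invented, and picking k by minimum cross-validated RMSEP is one common option, not necessarily how seven analogues were chosen in the manuscript.

```python
import math


def to_props(row):
    total = sum(row)
    return [v / total for v in row]


def sq_chord(p, q):
    """Squared-chord distance between two proportion vectors."""
    return sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))


def mat_predict(sample, Y, x, k, exclude=None):
    """Unweighted mean climate of the k closest modern analogues.
    `exclude` is the index of a calibration site to leave out, e.g. the lake
    whose own core top is being reconstructed."""
    s = to_props(sample)
    dists = sorted((sq_chord(s, to_props(row)), xi)
                   for i, (row, xi) in enumerate(zip(Y, x)) if i != exclude)
    return sum(xi for _, xi in dists[:k]) / k


def choose_k(Y, x, k_values):
    """Leave-one-site-out RMSEP for each candidate k; returns the best k and all scores."""
    scores = {}
    for k in k_values:
        sq_err = [(mat_predict(Y[i], Y, x, k, exclude=i) - x[i]) ** 2 for i in range(len(Y))]
        scores[k] = math.sqrt(sum(sq_err) / len(sq_err))
    return min(scores, key=scores.get), scores


# Hypothetical calibration set: 8 lakes x 4 taxa and July temperature (degC).
Y = [[70, 20, 10, 0], [60, 25, 10, 5], [45, 30, 15, 10], [35, 30, 20, 15],
     [25, 30, 25, 20], [15, 25, 35, 25], [10, 20, 35, 35], [5, 15, 35, 45]]
x = [7.0, 8.0, 9.5, 10.5, 11.5, 12.5, 13.5, 15.0]

site = 3
with_self = mat_predict(Y[site], Y, x, k=1)                    # lake is its own analogue
without_self = mat_predict(Y[site], Y, x, k=1, exclude=site)   # honest estimate
best_k, scores = choose_k(Y, x, range(1, 6))

print(f"core top with itself in the set: {with_self}, without: {without_self}")
print("LOO RMSEP by k:", {k: round(v, 2) for k, v in scores.items()}, "best k:", best_k)
```

With the lake left in, a one-analogue MAT reproduces the observed value exactly, which is why an unusually low core-top error calls for this check.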

I don't know if it needs to be stated in the ms, but it may be worth reminding readers that there is a risk of circular reasoning when interpreting assemblage changes as climate-driven when that assemblage has been used to reconstruct the climate.
Citation: https://doi.org/10.5194/egusphere-2025-2678-RC1
- AC1: 'Reply on RC1', Thomas Böhmer, 17 Sep 2025
  
  Dear reviewer, thank you very much for your comments! We address all of your comments in the attached Supplement.
  
  Citation: https://doi.org/10.5194/egusphere-2025-2678-AC1
RC2:
'Comment on egusphere-2025-2678', Charline Giguet-Covex, 18 Aug 2025
Review of "Quantitative climate reconstruction from sedimentary ancient DNA: framework, validation and application"
This manuscript aims to reconstruct past summer temperatures in Siberia over the last 32,000 years. Four approaches are developed based on modern datasets (observations and fossil records) and applied to a sediment archive (a core from Lake Billyakh). Each method entails specific benefits and drawbacks, which makes the approaches highly complementary and strengthens the robustness and quality of the study. In particular, the approaches combining “raw” GBIF data or species distribution models derived from these data (a method mostly used in ecology) with taxon-specific probability density functions to link taxa to climatic conditions are very original in palaeoenvironmental contexts.
It is also noteworthy that using plant sedaDNA communities in combination with several statistical approaches significantly improved predictive accuracy (see median bias and root-mean-squared error of prediction), compared to other proxies (e.g. pollen, chironomids). Another strength of the study is the availability of a large modern surface-sample dataset (203 sites), which was used for training (WA-PLS, MAT) and for independent validation of the methods (GBIF-SPD or SDM-SPD).
My main concern relates to the relative contributions of different taxa estimated by sedaDNA, which most likely do not reflect their actual contributions in the landscape, and to their impact on the reconstructions that use the SPD approach. Since this information is used to weight the SPDs, I think the potential impact should be discussed. This could be addressed in section 4.3. At present, the only mention is in the outlook section: “Furthermore, integrating absolute quantification methods (e.g. DNA-based biomass estimates) could reduce PCR-related biases, enhancing proxy accuracy (Ushio et al., 2018).”
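To make the concern concrete, the sketch below assumes that taxon PDFs are Gaussian and are combined as a weighted geometric mean, with weights taken from relative abundances. This is a simplified stand-in, not necessarily how the SPD weighting in the manuscript is implemented, and the taxa, niches and weights are hypothetical. Shifting weight from a cold-adapted to a warm-adapted taxon, as an amplification bias might, moves the reconstructed optimum by several degrees even though the niches themselves are unchanged.

```python
import math


def combined_optimum(taxa, weights, grid):
    """Mode of the weighted geometric mean of Gaussian taxon PDFs, found on a grid.
    taxa: list of (optimum, tolerance) in degC; weights: relative abundances."""
    total = sum(weights)
    w = [wi / total for wi in weights]

    def log_density(t):
        # log of prod_k N(t; mu_k, sd_k) ** w_k, up to an additive constant
        return sum(wk * (-0.5 * ((t - mu) / sd) ** 2 - math.log(sd))
                   for wk, (mu, sd) in zip(w, taxa))

    return max(grid, key=log_density)


def analytic_optimum(taxa, weights):
    """Closed form for Gaussians: precision-weighted mean of the optima."""
    prec = [wk / sd ** 2 for wk, (_, sd) in zip(weights, taxa)]
    return sum(p * mu for p, (mu, _) in zip(prec, taxa)) / sum(prec)


taxa = [(8.0, 2.0), (12.0, 3.0), (16.0, 2.0)]  # cold, intermediate, warm (hypothetical)
reads = [0.6, 0.3, 0.1]    # proportions in the sedaDNA record (hypothetical)
cover = [0.3, 0.3, 0.4]    # "true" proportions in the catchment (hypothetical)
temps = [i * 0.01 for i in range(2501)]  # 0-25 degC

for label, w in [("read-weighted", reads), ("cover-weighted", cover)]:
    print(label, round(combined_optimum(taxa, w, temps), 2),
          round(analytic_optimum(taxa, w), 2))
```

The closed form is included as a check on the grid search; under these assumptions the two weightings differ by almost 3 degC.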
In addition, in my view, the manuscript would benefit from a section addressing other climate proxies beyond vegetation assemblage reconstructions (e.g. BrGDGTs, chironomids, δ18O from diatoms or ostracods).

Specific comments
L27: I would add “in appropriate contexts, i.e. where vegetation composition is primarily driven by climatic conditions.”

L50: Perhaps mention topography as well?

L285–286: The sentence is awkward: “However, some taxa occur with high abundance that occur today under warm conditions including Crepinidae_01 and Asteraceae_03.” → Suggested: “However, some highly abundant taxa also occur today under warm conditions (e.g. Crepinidae_01 and Asteraceae_03).”

L294: For “outlier taxa”, please provide a definition: e.g. taxa with warm ecological preferences found during glacial periods, or the opposite.

L339: The word “quantitatively” seems unnecessary, as temperature is inherently quantitative.

L349–350: Specify that you are referring to chironomid head capsules, not chironomid DNA.

L359–361: Runoff refers only to the flow of water over the land surface (rain, snowmelt). Strictly speaking, this does not necessarily include erosion, which is the key process in Giguet-Covex et al. (2019). Suggested formulation:

“SedaDNA mainly originates from the lake’s immediate catchment, since erosion of the surrounding slopes is the dominant transport process.”

L375: Alongside Courtin et al. (2021), you may also cite Giguet-Covex et al. (2019).

L380: I would be more precise here: "...of direct and indirect climate drivers..."

L391: Figure 6 is not placed correctly in the sentence.

L396–401: The explanation could be clarified. Current text: “The plant metabarcoding assemblages from Lake Billyakh used for reconstruction are substantially more diverse than the pollen data (73 vs. 41 taxa), likely contributing to a more stable reconstruction. In contrast, the pollen-based reconstruction displays greater variability (Fig. 4), possibly due to a lower signal-to-noise ratio stemming from fewer taxa contributing to the reconstruction. This interpretation aligns with findings by Heiri and Lotter (2010), who demonstrated that lower taxonomic resolution in chironomid-based temperature reconstructions decreases sensitivity in detecting subtle climate variations.”

➡ Suggestion: Make the link between diversity and stability explicit. For example:

“A higher number of taxa increases the likelihood of detecting species with narrow ecological niches, which may strengthen the climate signal and improve the signal-to-noise ratio. This mechanism, rather than diversity per se, may explain the stability of the sedaDNA-based reconstruction.” Also, clarify the distinction between diversity and taxonomic resolution to avoid confusion (you first speak about diversity and at the end about taxonomic resolution).
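The mechanism suggested here (narrow niches rather than diversity per se) can be checked with a back-of-the-envelope calculation. Under an idealised assumption, that each taxon contributes an independent Gaussian niche PDF and the PDFs are multiplied, precisions (1/sd²) add, so the combined uncertainty depends far more on niche width than on taxon count. The taxon counts 41 and 73 are taken from the quoted text; the niche widths are hypothetical.

```python
import math


def combined_sd(niche_sds):
    """SD of the normalised product of independent Gaussian niche PDFs:
    precisions add, so sd = (sum 1/sd_k^2) ** -0.5."""
    return 1.0 / math.sqrt(sum(1.0 / s ** 2 for s in niche_sds))


broad = [4.0] * 41                # 41 taxa with broad niches (sd = 4 degC)
more_broad = broad + [4.0] * 32   # 73 taxa, all broad
few_narrow = broad + [1.0] * 5    # 41 broad + 5 narrow-niche taxa (sd = 1 degC)

for label, sds in [("41 broad", broad), ("73 broad", more_broad),
                   ("41 broad + 5 narrow", few_narrow)]:
    print(f"{label:>20}: combined sd = {combined_sd(sds):.2f} degC")
```

Five narrow-niche taxa shrink the combined spread more than 32 additional broad-niche taxa, which is the distinction between diversity and ecological informativeness that the comment asks the authors to make explicit.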

L407: I assume you mean Figure 4.

L423: Please provide the RMSEP value for the plantDNA_PDF/SDM approach.

L425–427: Clarify whether you are referring to all approaches or only the PDF-based ones. The results for WA-PLS and MAT in Fig. 2 seem less clear.

Figure improvements
Fig. 1: Add a blue border to the text boxes associated with sedaDNA (field sampling, laboratory analyses, bioinformatics). In the bioinformatics part, remove animals (since only plant data were used here).

Fig. 2: The ΔT for Billyakh is not shown — could you add its color code? Also, include the number of modern lake sediment sites used.

Fig. 4: Highlight (e.g. in bold) taxa detected in both periods.
Citation: https://doi.org/10.5194/egusphere-2025-2678-RC2
- RC3:
  'Reply on RC2', Charline Giguet-Covex, 18 Aug 2025
  
  Just one more question that I forgot to include in the review: for WA-PLS you perform two transformations (Hellinger and then square root), whereas for MAT you only apply the Hellinger transformation. Can you explain why?
  
  Citation: https://doi.org/10.5194/egusphere-2025-2678-RC3
  - AC3: 'Reply on RC3', Thomas Böhmer, 17 Sep 2025
    
    Actually, the two approaches are similar in this respect: we used the Hellinger and square-root transformations in both. In MAT, the additional square-root transformation is already included in the squared-chord distance metric (sq.chord in the rioja R package), which we used to implement the reconstruction.
    
    Citation: https://doi.org/10.5194/egusphere-2025-2678-AC3
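The equivalence described in the reply can be verified numerically, assuming rioja's `sq.chord` follows the standard squared-chord definition, the sum over taxa of (√p − √q)². Applying it to Hellinger-transformed data gives the same distance as taking a further square root of the Hellinger values (the values WA-PLS receives after both steps) and then using squared Euclidean distance; in both cases the effective transformation is the fourth root of the proportions. The two assemblages are hypothetical.

```python
import math


def hellinger(row):
    """Hellinger transformation: square root of relative abundances."""
    total = sum(row)
    return [math.sqrt(v / total) for v in row]


def sq_chord(p, q):
    """Squared-chord distance: sum of (sqrt(p) - sqrt(q))^2."""
    return sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))


def sq_euclid(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


a = [12, 30, 0, 58]   # hypothetical read counts, sample A
b = [40, 25, 5, 30]   # hypothetical read counts, sample B
ha, hb = hellinger(a), hellinger(b)

# MAT route: Hellinger transform, then squared-chord distance
d_mat = sq_chord(ha, hb)
# Explicit route: Hellinger transform, extra square root, squared Euclidean distance
d_explicit = sq_euclid([math.sqrt(v) for v in ha], [math.sqrt(v) for v in hb])
# Both equal the squared Euclidean distance between fourth roots of the proportions
d_fourth_root = sq_euclid([(v / sum(a)) ** 0.25 for v in a],
                          [(v / sum(b)) ** 0.25 for v in b])

print(d_mat, d_explicit, d_fourth_root)
```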
- AC2: 'Reply on RC2', Thomas Böhmer, 17 Sep 2025
  
  Dear Charline, thank you very much for your comments! We address all of your comments in the attached Supplement.
  
  Citation: https://doi.org/10.5194/egusphere-2025-2678-AC2

Peer review completion

AR – Author's response | RR – Referee report | ED – Editor decision | EF – Editorial file upload

ED: Publish subject to minor revisions (review by editor) (30 Sep 2025) by Odile Peyron

AR by Thomas Böhmer on behalf of the Authors (10 Oct 2025): Author's response, Author's tracked changes, Manuscript

ED: Publish as is (14 Oct 2025) by Odile Peyron

AR by Thomas Böhmer on behalf of the Authors (22 Dec 2025): Author's response, Manuscript

Editorial statement

The manuscript by Herzschuh et al. presents a novel approach for deriving quantitative summer temperature estimates from lake sediments. This method has the potential to substantially improve terrestrial temperature reconstructions and thereby advance our understanding of past continental climate change.