Preface: Advances in paleoclimate data synthesis and analysis of associated uncertainty: towards data–model integration to understand the climate

Introduction
Paleoclimate data can provide insights into climate dynamics across a range of timescales and for climate states that are inaccessible from the instrumental record. Thus, the study of the paleoclimate record can improve our understanding of how the climate system operates. Whilst interest in past climate is valid in its own right, research on past climate change has gained urgency because of the ongoing anthropogenic intervention in the climate system, which has led to changes that are unparalleled over at least a few thousand years, are possibly irreversible for centuries and already negatively affect societies and ecosystems across the world (IPCC, 2021). The paleoclimate record is relevant in this context because it can serve as a baseline of natural variability, can provide information on how the climate system responds to changes in the forcing and may serve as an analogue of future climate (e.g. Tierney et al., 2020).
Climate models are in a way a summary of our current knowledge of the climate system. They are essential tools to assess mechanisms, test hypotheses and project future climate trajectories. To apply climate models outside the current climate state, their skill under conditions different from today has to be demonstrated. In this respect, comparing paleoclimate simulations with paleoclimate data is the only means of evaluating and constraining climate models under boundary conditions different from those during the instrumental period. Paleoclimate data-model integration can therefore deliver insights into the climate system that data or models on their own cannot provide (Fig. 1).
Comparing paleoclimate data with simulations appears simple at first sight. However, from a paleoclimate data perspective this process is not straightforward because paleoclimate observations are associated with considerable uncertainty since they are by nature indirect and contain chronological error. In addition, paleoclimate information is irregularly distributed in time and space, and meta-analysis of existing paleoclimate data is non-trivial because of fragmented and non-standardised archiving. Thus, in order to allow meaningful data-model integration, paleoclimate data need to be standardised and synthesised, and the associated uncertainty needs to be constrained. This short article serves as an introduction to the interjournal special issue "Paleoclimate data synthesis and analysis of associated uncertainty" in the journals Climate of the Past (https://cp.copernicus.org/articles/special_issue11_936.html, last access: 13 December 2021) and Earth System Science Data (https://essd.copernicus.org/articles/special_issue11_936.html, last access: 13 December 2021). The purpose of this special issue is to provide a platform to present new paleoclimate synthesis products, to review the current state of proxy uncertainty analysis and to present new developments in data-model integration approaches such as proxy forward modelling and data assimilation. The special issue was conceived as part of the paleoclimate synthesis working group of the German climate modelling project PALMOD (https://www.palmod.de, last access: 13 December 2021) but with submission open to the entire paleoclimate science community.
Figure 1. Schematic workflow of paleodata-model integration to improve the understanding of the climate system in order to constrain projections and predictions of future climate. The four aspects of the data-model integration process discussed here are highlighted in red font. The process relies on thorough quantification and understanding of the uncertainty associated with both data and models. For this reason uncertainty is at the base of the workflow (however, model-related uncertainty is not explicitly covered here). Further steps in this process that are covered in this special issue include data synthesis, and data assimilation and proxy modelling as means to bring data and models together.
This special issue contains 23 papers covering various steps in the data-model integration process. In this introduction we place the individual contributions in the context of a paleodata-model integration concept (Fig. 1). In this concept, paleoclimate data and climate model simulations provide information, which is combined in the integration efforts such that an improved understanding of the climate as well as the nature of the paleoclimate data and the models emerges. The contributions in this issue are grouped into four important aspects of the integration process, which we briefly describe below: paleoclimate data synthesis, paleoclimate data uncertainty, proxy modelling and data assimilation.

Paleoclimate data synthesis
The starting point of the data-model integration process is formed by individual paleoclimate time series. Over the past decades thousands of paleoclimate time series based on different proxy sensors and derived from different geological archives have been generated. Fortunately, a large part of the data in these time series is publicly available. However, the data are archived in a non-standardised way, and important metadata are often hidden in the original publications. Meaningful use of paleoclimate time series thus requires laborious retrieval and some form of harmonisation of data and metadata. This process can be facilitated through the use of dedicated software, such as PaleoDataView (Langner and Mulitza, 2019), which is specifically designed for standardisation and visualisation of time series from marine sediment archives.
This special issue contains detailed descriptions of several synthesis approaches and products. Two are archive specific: the VARDA synthesis (Ramisch et al., 2020) contains detailed chronological information about annually resolved time series from laminated (varved) lakes, and the PalMod 130k marine paleoclimate data synthesis contains multi-proxy time series from marine sediments across the world oceans (Jonkers et al., 2020). Other syntheses are focussed on specific variables, such as water isotope time series over the Common Era (Konecky et al., 2020), pollen records across Siberia from the past 40 000 years (Cao et al., 2020) and greenhouse gas concentrations for the past 156 000 years (Köhler et al., 2017). These contributions show how the value of the individual time series is increased through harmonisation of ontologies, metadata and chronological information.
Further contributions make use of such synthesis efforts to provide temporally or spatially continuous coverage that can be compared more easily with simulations or used as boundary conditions for climate model experiments. Thus, the pollen database (Cao et al., 2020) has been used to derive maps of land cover (Cao et al., 2019), and Paul et al. (2021) transformed previously synthesised data (MARGO Project Members, 2009) to provide a global map of sea surface temperature including uncertainty for the last glacial maximum. The gridding method used to construct the global sea surface temperature map also holds promise for extrapolating paleoclimate observations, which generally have a coarse spatial distribution.
Two contributions show how individual time series can be used to derive continuous regional paleoclimate time series. Building on previous work (Lisiecki and Stern, 2016), Peterson and Lisiecki (2018) provide regional stacks of benthic foraminifera stable carbon isotopes, which reflect the carbon isotope composition of dissolved inorganic carbon in oceanic bottom waters. In turn, this stable carbon isotope composition provides important constraints on the changes in the carbon cycle and ocean circulation. Fuhrmann et al. (2020) present continental aridity time series spanning 60 000 years based on multiple proxies from key regions.
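As an illustration of what deriving a regional time series from individual records can involve, the sketch below interpolates irregularly sampled records onto a common age grid and averages them. It is a deliberately simplified example and not the procedure of the cited studies, which also deal with age-model alignment and uncertainty; the function name `stack_records` and the example values are hypothetical.

```python
import numpy as np

def stack_records(records, grid):
    """Average several (ages, values) records on a common age grid.

    records: list of (ages, values) pairs with strictly increasing ages.
    grid: 1-D array of target ages.
    Returns the mean value and the number of contributing records
    at each grid point.
    """
    interpolated = []
    for ages, values in records:
        # Linear interpolation; grid points outside a record's age range
        # become NaN so that the record does not contribute there.
        interpolated.append(
            np.interp(grid, ages, values, left=np.nan, right=np.nan)
        )
    interpolated = np.array(interpolated)
    count = np.sum(~np.isnan(interpolated), axis=0)
    total = np.nansum(interpolated, axis=0)
    # Grid points covered by no record are returned as NaN.
    mean = np.where(count > 0, total / np.maximum(count, 1), np.nan)
    return mean, count

# Two hypothetical records with different sampling and age coverage.
rec_a = (np.array([0.0, 10.0, 20.0]), np.array([1.0, 2.0, 3.0]))
rec_b = (np.array([5.0, 15.0, 25.0]), np.array([3.0, 4.0, 5.0]))
grid = np.array([0.0, 10.0, 20.0, 30.0])
mean, count = stack_records([rec_a, rec_b], grid)
```

Returning the number of contributing records alongside the mean makes it visible where the stack rests on a single record or on none at all.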
All data underlying the syntheses as well as the derived data products are publicly available (see individual papers for access). Each collection has the potential to reveal spatial and/or temporal patterns that remain invisible in individual time series. By including rich metadata and chronological information, they facilitate the consideration of uncertainty in paleoclimate data-model integration.

Paleoclimate data uncertainty
For meaningful interpretation of paleoclimate data, it is crucial to be able to separate signal from noise. This first requires quantifying the uncertainty associated with the indirect recording of climate by proxies.
Two contributions to this special issue aim to quantify the signal-to-noise ratio of proxy time series directly by assessing the correlation among nearby paleoclimate time series that should have experienced the same climate (Reschke et al., 2019; Münch and Laepple, 2018). Both studies indicate a timescale-dependent signal-to-noise ratio in paleoclimate time series. This timescale dependence of proxy uncertainty is explored further in two companion papers (Kunz et al., 2020; Dolman et al., 2021). In part one, Kunz et al. (2020) develop a theoretical framework for a spectral approach to estimate timescale-dependent uncertainty, and this concept is applied in part two (Dolman et al., 2021). The studies above focussed on geochemical proxy sensors, whereas Jonkers and Kučera (2019) present an approach to detect noise in reconstructions based on the taxonomic composition of microfossil assemblages. A final contribution to the set of papers exploring paleoclimate data uncertainty provides a first attempt to quantify the error due to uncertainty in temporal representativeness (Amrhein, 2020). The issue of the accuracy of paleoclimate observations in representing climate at specific times and during time intervals is particularly relevant for paleoclimate data-model integration of transient simulations.
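The idea of inferring a signal-to-noise ratio from the correlation between nearby records can be illustrated with a textbook special case. Assume two records share the same signal and carry independent noise of equal variance; the expected correlation is then r = var(signal) / (var(signal) + var(noise)), so the signal-to-noise variance ratio is r / (1 - r). The sketch below uses synthetic data under exactly these assumptions. It ignores the timescale dependence emphasised by the studies above and is not their method; all names and numbers are illustrative.

```python
import numpy as np

def snr_from_correlation(r):
    """Signal-to-noise variance ratio implied by the correlation r between
    two records with a shared signal and independent, equal-variance noise."""
    return r / (1.0 - r)

def estimate_snr(x, y):
    """Estimate the signal-to-noise ratio from two equally long records."""
    r = np.corrcoef(x, y)[0, 1]
    return snr_from_correlation(r)

# Synthetic test case: shared signal with variance 1, noise with variance 2,
# so the true signal-to-noise variance ratio is 0.5.
rng = np.random.default_rng(0)
n = 20000
signal = rng.normal(0.0, 1.0, n)
rec1 = signal + rng.normal(0.0, np.sqrt(2.0), n)
rec2 = signal + rng.normal(0.0, np.sqrt(2.0), n)

# The estimate should be close to, but not exactly, the true ratio of 0.5.
print(estimate_snr(rec1, rec2))
```

With real records the estimate is a single number averaged over all timescales, which is precisely the limitation that motivates spectral approaches.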
Together these papers provide important new constraints on the uncertainty of paleoclimate data. They present new ways to detect and quantify various aspects of the uncertainty and thus present new tools for the paleoclimate (data) community. The spectral approach comes together with open-source code (Dolman et al., 2021) to facilitate broad use.

Proxy modelling
Because the transfer of climate signals to the geological archive and the extraction of the proxy signal involve many steps, paleoclimate data reflect climate with some degree of distortion or bias. This makes direct comparison between paleoclimate data and model output complex. However, many aspects of the incorporation of climate signals within proxies are systematic. Proxy forward models can be used to encode this recording process and can hence serve as an important bridge between paleoclimate simulations and paleoclimate observations. Two examples of proxy forward models for sedimentary archives are presented in this special issue (Dolman and Laepple, 2018; Bothe et al., 2019). Both make code available for future use of these forward models in paleoclimate data-model integration.
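To make the notion of a proxy forward model concrete, the following toy example maps a climate series to a hypothetical proxy series in three steps: smoothing, discrete sampling and measurement noise. It is a minimal sketch and not one of the published forward models. The exponential smoothing kernel, the parameter names (`mix_length`, `sample_every`, `noise_sd`) and the chosen values are assumptions made purely for illustration.

```python
import numpy as np

def toy_proxy_forward_model(climate, mix_length, sample_every, noise_sd, rng):
    """Turn a climate series into a hypothetical proxy series (toy example).

    mix_length must be positive; it sets the e-folding width of the
    smoothing kernel in time steps.
    """
    climate = np.asarray(climate, dtype=float)
    # 1. Smoothing: one-sided exponential kernel, normalised to sum to 1
    #    (an assumed stand-in for mixing within the archive).
    lags = np.arange(int(5 * mix_length) + 1)
    kernel = np.exp(-lags / mix_length)
    kernel /= kernel.sum()
    # Pad with the first value so that the output has the same length as
    # the input and a constant input stays constant.
    padded = np.concatenate([np.full(len(kernel) - 1, climate[0]), climate])
    smoothed = np.convolve(padded, kernel, mode="valid")
    # 2. Sampling: keep every `sample_every`-th time step.
    sampled = smoothed[::sample_every]
    # 3. Measurement noise: independent Gaussian error on each sample.
    return sampled + rng.normal(0.0, noise_sd, sampled.size)

rng = np.random.default_rng(1)
climate = rng.normal(0.0, 1.0, 5000)  # synthetic "simulated climate"
pseudo_proxy = toy_proxy_forward_model(
    climate, mix_length=5.0, sample_every=10, noise_sd=0.2, rng=rng
)
```

The resulting pseudo-proxy can be compared with observations in proxy space, so that the recording process is accounted for on the model side of the comparison.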
Whereas proxy forward modelling uses climatic input to produce hypothetical proxy records, another approach is illustrated in two articles in the special issue that present results on explicitly including the simulation of a paleoclimate proxy property in models. Cauquoin et al. (2019) present simulated estimates of the relation between climate and water isotopes, which can improve our understanding of interactions within the climate system and also of proxy systems based on water isotopes. In addition, Breil et al. (2021) show how well isotope-enabled simulations can represent the paleo-observations of isotopes for a specific region and how not only global earth system models but also regional climate models can be used for the study of past climate.

Paleodata assimilation
Offline data assimilation is a method to fill the gaps in (spatio-temporal) coverage of paleoclimate data using physical constraints from climate models. The approach relies on a thorough understanding of the climate recording process by proxies in order to take uncertainties into account. When successfully applied, data assimilation is an effective way to derive meaningful climate state estimates from irregularly spaced observations. Different methods are illustrated in three papers in this special issue. Two focus on mid-Holocene climate in Europe. The first describes a computationally efficient approach using optimal interpolation (Fallah et al., 2018) and the second employs a Bayesian framework to integrate paleoclimate and model data (Weitzel et al., 2019). Finally, in a technical note, Bothe and Zorita (2021) apply and test the analogue method to infer climate over the past 21 000 years by using paleoclimate data time series as constraints for a pool of simulation data.
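The way in which assimilation spreads information from sparse observations to unobserved locations can be shown with the generic optimal interpolation update in its standard textbook form: x_a = x_b + K (y - H x_b), with gain K = B H^T (H B H^T + R)^-1. The sketch below applies it to a two-point toy state. The symbols (background x_b, background covariance B, observation operator H, observations y, observation error covariance R) follow common data assimilation notation rather than the source, the numbers are invented, and the example is not the implementation used in the cited papers.

```python
import numpy as np

def oi_update(x_b, B, H, y, R):
    """Generic optimal interpolation update.

    x_b: background state (n,), e.g. from a climate model
    B:   background error covariance (n, n)
    H:   linear observation operator (m, n)
    y:   observations (m,), e.g. proxy-based estimates
    R:   observation error covariance (m, m)
    Returns the analysis state and the analysis error covariance.
    """
    innovation = y - H @ x_b             # misfit between data and background
    S = H @ B @ H.T + R                  # innovation covariance
    K = B @ H.T @ np.linalg.inv(S)       # gain matrix
    x_a = x_b + K @ innovation
    A = (np.eye(len(x_b)) - K @ H) @ B
    return x_a, A

# Toy state with two locations; only the first one is observed.
x_b = np.array([0.0, 0.0])
B = np.array([[1.0, 0.5],
              [0.5, 1.0]])               # the two locations covary
H = np.array([[1.0, 0.0]])
y = np.array([2.0])
R = np.array([[1.0]])

x_a, A = oi_update(x_b, B, H, y, R)
# The observed location moves halfway towards the observation (1.0);
# the unobserved one is updated through the covariance (0.5).
```

The unobserved location is updated only because B links it to the observed one, which is why the quality of the model-derived covariance matters as much as the proxy uncertainty encoded in R.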

Outlook
The contributions to this special issue clearly highlight the value of paleoclimate data and of paleoclimate data science. The different synthesis products will undoubtedly prove valuable to the community, especially because they include rich metadata that are crucial for the interpretation of the data and for data-model integration. Nevertheless, the value of the paleoclimate data and the ease of keeping the syntheses up to date could be increased if paleodata archiving followed community guidelines on standardisation more closely (Khider et al., 2019; Morrill et al., 2021). Whilst the data products in this issue present major steps forward in standardisation, the various products also show that work still needs to be done to harmonise the different syntheses in order to further increase their value (Bothe et al., 2021).
The articles also indicate further aspects that could be considered in paleoclimate data-model integration. Since the focus of this special issue was on the paleoclimate data side, most of the uncertainty assessment focussed on the uncertainties associated with indirect paleoclimate observations. However, as shown by Breil et al. (2021), a meaningful data-model integration framework should also consider climate model uncertainty (Fig. 1). Furthermore, whilst data assimilation presents a promising example of data-model integration, validation of the resulting reconstructions and state estimates deserves additional attention, because when models and paleodata are truly integrated, other information is needed to validate the result.
This special issue provides important approaches to extend the value of paleoclimate data in providing new knowledge about climate proxies, climate models and ultimately the climate system. Collectively the contributions highlight the importance of developing a formal framework for paleoclimate data-model integration that considers uncertainty in the data and in the models. Such a framework should turn disagreement into diagnosis. After all, for a thorough understanding of the climate system, there is as much value in understanding data-model agreement as disagreement.
https://doi.org/10.5194/cp-17-2577-2021 Clim. Past, 17, 2577-2581, 2021. L. Jonkers et al.: Preface: Advances in paleoclimate data synthesis and analysis of associated uncertainty