Optimizing sampling strategies in high-resolution paleoclimate records

The aim of paleoclimate studies resolving climate variability from noisy proxy records can in essence be reduced to a statistical problem. The challenge is to extract meaningful information about climate variability from these records by reducing measurement uncertainty through combining measurements for proxies while retaining the temporal resolution needed to assess the timing and duration of variations in climate parameters. In this study, we explore the limits of this compromise by testing different methods for combining proxy data (smoothing, binning, and sample size optimization) on a particularly challenging paleoclimate problem: resolving seasonal variability in stable isotope records. We test and evaluate the effects of changes in the seasonal temperature and the hydrological cycle as well as changes in the accretion rate of the archive and parameters such as sampling resolution and age model uncertainty in the reliability of seasonality reconstructions based on clumped and oxygen isotope analyses in 33 real and virtual datasets. Our results show that strategic combinations of clumped isotope analyses can significantly improve the accuracy of seasonality reconstructions compared to conventional stable oxygen isotope analyses, especially in settings in which the isotopic composition of the water is poorly constrained. Smoothing data using a moving average often leads to an apparent dampening of the seasonal cycle, significantly reducing the accuracy of reconstructions. A statistical sample size optimization protocol yields more precise results than smoothing. However, the most accurate results are obtained through monthly binning of proxy data, especially in cases in which growth rate or water composition cycles obscure the seasonal temperature cycle. Our analysis of a wide range of natural situations reveals that the effect of temperature seasonality on oxygen isotope records almost invariably exceeds that of changes in water composition. Thus, in most cases, oxygen isotope records allow reliable identification of growth seasonality as a basis for age modeling in the absence of independent chronological markers in the record. These specific findings allow us to formulate general recommendations for sampling and combining data in paleoclimate research and have implications beyond the reconstruction of seasonality. We briefly discuss the implications of our results for solving common problems in paleoclimatology and stratigraphy.

Abstract. The aim of paleoclimate studies resolving climate variability from noisy proxy records can in essence be reduced to a statistical problem. The challenge is to extract meaningful information about climate variability from these records by reducing measurement uncertainty through combining measurements for proxies while retaining the temporal resolution needed to assess the timing and duration of variations in climate parameters. In this study, we explore the limits of this compromise by testing different methods for combining proxy data (smoothing, binning, and sample size optimization) on a particularly challenging paleoclimate problem: resolving seasonal variability in stable isotope records. We test and evaluate the effects of changes in the seasonal temperature and the hydrological cycle as well as changes in the accretion rate of the archive and parameters such as sampling resolution and age model uncertainty in the reliability of seasonality reconstructions based on clumped and oxygen isotope analyses in 33 real and virtual datasets. Our results show that strategic combinations of clumped isotope analyses can significantly improve the accuracy of seasonality reconstructions compared to conventional stable oxygen isotope analyses, especially in settings in which the isotopic composition of the water is poorly constrained. Smoothing data using a moving average often leads to an apparent dampening of the seasonal cycle, significantly reducing the accuracy of reconstructions. A statistical sample size optimization protocol yields more precise results than smoothing. However, the most accurate results are obtained through monthly binning of proxy data, especially in cases in which growth rate or water composition cycles obscure the seasonal temperature cycle. Our analysis of a wide range of natural situations reveals that the effect of temper-ature seasonality on oxygen isotope records almost invariably exceeds that of changes in water composition. Thus, in most cases, oxygen isotope records allow reliable identification of growth seasonality as a basis for age modeling in the absence of independent chronological markers in the record. These specific findings allow us to formulate general recommendations for sampling and combining data in paleoclimate research and have implications beyond the reconstruction of seasonality. We briefly discuss the implications of our results for solving common problems in paleoclimatology and stratigraphy.

Introduction
Improving the resolution of climate reconstructions is a key objective in paleoclimate studies because it allows climate variability to be studied on different timescales and sheds light on the continuum of climate variability (Huybers and Curry, 2006). However, the temporal resolution of climate records is limited by the accretion rate (growth or sedimentation rate) of the archive and the spatial resolution of sampling for climate reconstructions, which is a function of the sample size required for a given climate proxy. This tradeoff between sample size and sampling resolution is especially prevalent when using state-of-the-art climate proxies which require large sample sizes, such as the carbonate clumped isotope paleothermometer ( 47 ; see applications in Rodríguez-Sanz et al., 2017;Briard et al., 2020;Caldarescu et al., 2021) and stable isotope ratios in specific compounds or of rare isotopes (e.g., phosphate-oxygen isotopes in tooth apatite, triple oxygen isotopes in speleothems, or carbon iso-topes of CO 2 in ice cores; Jones et al., 1999;Schmitt et al., 2012;Sha et al., 2020). The challenge of sampling resolution persists on a wide range of timescales: from attempts to resolve geologically short-lived (thousand-year scale) climate events from deep-sea cores with low sedimentation rates (e.g., Stap et al., 2010;Rodríguez-Sanz et al., 2017) to efforts to characterize tidal or daily variability in accretionary carbonate archives (e.g., Warter and Müller, 2017;de Winter et al., 2020a). What constitutes "high resolution" is therefore largely dependent on the specifics of the climate archive.
Sample size limitations are especially important in paleoseasonality reconstructions. Reliable archives for seasonality (e.g., corals, mollusks, and speleothem records) are in high demand in the paleoclimate community because the seasonal cycle is one of the most important cycles in Earth's climate, and seasonality reconstructions complement more common long-term (thousands to millions of years) records of past climate variability (e.g., Morgan and van Ommen, 1997;Tudhope, 2001;Steuber et al., 2005;Steffensen et al., 2008;Denton et al., 2005;Huyghe et al., 2015;Vansteenberge et al., 2020). A more detailed understanding of climate dynamics at the human timescale is increasingly relevant for improving climate projections (IPCC, 2013). Unfortunately, the growth and mineralization rates of archives that capture high-resolution variability (only exceeding 10 mm yr −1 in rare exceptions; e.g., Johnson et al., 2019) limit the number and size of samples that can be obtained at high temporal resolutions (e.g., Mosley-Thompson et al., 1993;Passey and Cerling, 2002;Treble et al., 2003;Goodwin et al., 2003). In addition, accurate positioning of samples within the seasonal cycle is challenging. In the absence of fine-scale growth markings (e.g., daily laminae in mollusk shells; Schöne et al., 2005;de Winter et al., 2020a), this dating problem relies on modeling or interpolation of the growth of the archive, which introduces uncertainty in the age of samples (e.g., Goodwin et al., 2009;Judd et al., 2018). These problems are exacerbated by the fact that accurate methods for climate reconstructions may require comparatively large sample sizes or rely on uncertain assumptions. A case in point is the popular carbonate stable oxygen isotope temperature proxy (δ 18 O c ), which relies on assumptions of the water composition (δ 18 O w ) that become progressively more uncertain further back in geological history (e.g., Veizer and Prokoph, 2015). In contrast, the clumped isotope proxy ( 47 ) does not rely on this assumption but requires larger amounts of sample (e.g., Müller et al., 2017) A promising technique for circumventing sample size limitations is to analyze larger numbers of small aliquots from the same sample or from similar parts of the climate archive. These smaller aliquots typically have poor precision, but averaging multiple aliquots into one estimate while propagating the measurement uncertainty leads to a more reliable estimate of the climate variable (Dattalo, 2008;Meckler et al., 2014;Müller et al., 2017;. This approach yields improved sampling flexibility since aliquots can be combined in various ways after measurement. It also allows outlier detection at the level of individual aliquots, thereby spreading the risk of instrumental failure and providing improved control on changes in measurement conditions that may bias results. Previous studies have applied several different methods for combining data from paleoclimate records to reduce analytical noise or higher-order variability and extract variability with a specific frequency (e.g., a specific orbital cycle or seasonality; Lisiecki and Raymo, 2005;Cramer et al., 2009). These data reduction approaches can in general be categorized into smoothing techniques, in which a sliding window or range of neighboring data points is used to smooth highresolution records (see, e.g., Cramer et al., 2009), or binning techniques, in which the record is divided into equal bins in the sampling direction (e.g., time, depth, or length in the growth direction; Lisiecki and Raymo, 2005;Rodríguez-Sanz et al., 2017). In addition, a third approach is proposed here based on optimization of sample size for dynamic binning of data along the climate cycle using a moving window in the domain of the climate variable (as opposed to the sampling domain) combined with a T -test routine (see Sect. 2.1). All three approaches have advantages and caveats.
In this study, we explore the (dis)advantages of these three data reduction approaches by testing their reliability in resolving seasonal variability in sea surface temperature (SST) and water stable oxygen isotope composition (δ 18 O w ), both highly sought-after variables in paleoclimate research. We compare reconstructions of SST and δ 18 O w in real and virtual datasets from accretionary carbonate archives (e.g., shells, corals, and speleothems) using the clumped isotope thermometer ( 47 ) combined with stable oxygen isotope ratios of the carbonate (δ 18 O c ).

Reconstruction approaches
Throughout the remainder of this work, the three approaches for combining data for reconstructions are defined as follows (see also Fig. 1).
Smoothing refers to the reconstruction of SST and δ 18 O w based on moving averages of 47 and δ 18 O c records (Fig. 1b). For every dataset, the full possible range of moving window sizes (from one sample to the full length of the record) for SST and δ 18 O w reconstructions was explored. The window size that resulted in the most significant difference between maximum and minimum 47 values (based on a Student's T test) was applied to reconstruct SST and δ 18 O w from 47 and δ 18 O c records. SST and δ 18 O w were calculated for all case studies using a combination of empirical temperature relationships by Kim and O'Neil (1997; δ 18 O c -δ 18 O w -temperature relationship) and Bernasconi et al. (2018;47 -temperature relationship). To obtain δ 18 O w values, the δ 18 O c -δ 18 O w -temperature relationship (Kim and  O'Neil, 1997) was solved for δ 18 O w using the temperature reconstruction obtained from 47 measurements. Here and in other approaches, a typical analytical uncertainty in measurements of 47 (1 standard deviation of 0.04 ‰) and δ 18 O c (1 standard deviation of 0.05 ‰) was used to include uncertainty due to measurement precision. These analytical uncertainties were chosen based on typical uncertainties reported for these measurements in the literature (e.g., Schöne et al., 2005;Huyghe et al., 2015;Vansteenberge et al., 2016) and long-term precision uncertainties obtained by measuring in-house standards using the MAT253+ with Kiel IV setup in the clumped isotope laboratory at Utrecht University (e.g., Kocken et al., 2019). The measurement uncertainty was propagated through all calculations using a Monte Carlo simulation (N = 1000) in which 47 and δ 18 O c records were randomly sampled from a normal distribution with the virtual 47 and δ 18 O c values as means and analytical uncertainties as standard deviations. Resulting SST and δ 18 O w values were grouped into monthly time bins using the age model of the archive.
Binning refers to reconstructions of SST and δ 18 O w based on binning of 47 and δ 18 O c records into monthly time bins (Fig. 1c). The 47 and δ 18 O c data from each case study were grouped into monthly time bins and converted to SST and δ 18 O w using the Kim and O'Neil (1997) and Bernasconi et al. (2018) formulae. Here too, Monte Carlo simulation (N = 1000) was applied to propagate measurement uncertainties onto monthly SST and δ 18 O w reconstructions. Note that the prerequisite for this method is that the data are aligned using a (floating) age model accurate enough to allow samples to be placed in the right bin. The age of virtual samples in this study is known, so this prerequisite poses no problems in this case. However, in the fossil record this alignment might be less certain in the absence of accurate chronologies within the archive (e.g., through daily growth increments in mollusk shells; Schöne, 2008;Huyghe et al., 2019;see Sect. 4.1.3).
Optimization refers to reconstructions of SST and δ 18 O w based on sample size optimization in 47 records (Fig. 1d). In this approach aliquots of each dataset are ordered from warm (low δ 18 O c ) to cold (high δ 18 O c data) samples, regardless of their position relative to the seasonal cycle. From this ordered dataset, increasingly large samples of multiple aliquots (from two aliquots to half the length of the record) are taken from both the warm ("summer") and the cold ("winter") side of the distribution. Summer and winter samples were kept equal (symmetrical grouping) to reduce the number of possible sample size combinations and allow for more efficient computation. However, asymmetrical grouping with differing sample sizes on the summer and winter ends of the δ 18 O c spectrum is possible (see Sect. 4.1.3 and 4.2.2). Sample sizes with a significant difference in the 47 value between summer and winter groups (p ≤ 0.05 based on a Student's T test) were selected as optimal sample sizes. The moving window T test in the proxy domain ensures that an optimal compromise is reached between high precision and resolving differences between seasonal extremes. For each successful sample size, SST and δ 18 O w values were calculated from 47 and δ 18 O c data according to the Kim and O'Neil (1997) and Bernasconi et al. (2018) formulae. The relationship between SST and δ 18 O w obtained from these re-constructions was used to convert all 47 and δ 18 O c data to SST and δ 18 O w , which are then grouped into monthly SST and δ 18 O w reconstructions along the archive's age model. Measurement uncertainties were propagated through the entire approach by Monte Carlo simulation (N = 1000).
For comparison, we also include reconstructions based solely on δ 18 O c measurements with an (often inaccurate) assumption of a constant δ 18 O w (equal to the modern ocean value of 0 ‰ VSMOW), which form the most common method for carbonate-based temperature reconstructions in paleoclimate research (see, e.g., Schöne et al., 2005;Westerhold et al., 2020;Fig. 1a;hereafter: δ 18 O). For these reconstructions, δ 18 O c records were grouped into monthly time bins with analytical uncertainties propagated using the Monte Carlo approach (N = 1000) and were directly converted to SST using the Kim and O'Neil (1997) temperature relationship.
For each reconstruction, SST and δ 18 O w results were aggregated into monthly averages, medians, standard deviations, and standard errors.
Step-by-step documentation of calculations made for the three 47 -based reconstruction approaches and the δ 18 O c reconstructions is given in Supplement Data S7 and in the complementary R package (de Winter, 2021a).

Benchmarks for accuracy and precision
Accuracy and precision of reconstructions were evaluated against official USGS definitions of climate parameters (O'Donnell and Ignizio, 2012 linear and/or seasonal changes in growth rate, including growth stops (cases 2-6, 14-18); seasonal and/or multi-annual changes in δ 18 O w (cases 7-11, 13-18); multi-annual trends in SST superimposed on the seasonality (cases 12, 15, and 17); variations in the seasonal SST amplitude (cases 19-21); change in the total length of the time series (cases 22-24); variation in uncertainty on the age of each virtual data point (cases 25-29).
Comparison of the virtual time series (cases 1-29; Fig. 2) with the natural variability (cases 30-33; Fig. 3) shows that the virtual cases are not realistic approximations of natural variability in SST and δ 18 O w . Natural SST and δ 18 O w variability is not limited to the seasonal or multi-annual scale but contains a fair amount of higher-order (daily to weekly scale) variability. To simulate this natural variability, we extracted the seasonal component of SST and δ 18 O w variability from our highest-resolution record of measured natural SST and SSS data (case 30: data from Texel, the Netherlands; see Sect. 2.3.2 and Fig. 3). The standard deviation of the residual variability of these data after subtraction of the seasonal cycle was used to add random high-frequency noise to the SST and δ 18 O w variability in virtual cases. Note that while sub-annual  Texel, the Netherlands). Since direct, in situ measurements of δ 18 O w variability at a high temporal resolution were not available, δ 18 O w was estimated from more widely available SSS data using a mass balance (Eqs. 1 and 2; following, e.g., Ullmann et al., 2010).
Here, we assume salinity (SSS sample ) results from a mixture of a fraction (f ) of isotopically light and lowsalinity (δ 18 O w,freshwater = −8 ‰; SSS freshwater = 0) fresh water and a fraction (1 − f ) of ocean water (δ 18 O w,ocean = 0 ‰; SSS ocean = 35), with negative amounts of freshwater contribution (f <0) representing net evaporation (SSS sample > SSS ocean ). The value for δ 18 O w,freshwater was based on the δ 18 O w of rain in the Netherlands (−8 ‰; Mook, 1970;Bowen, 2020). Applying this mass balance to the SSS record of the Wadden Sea tidal flats (case 30) results in δ 18 O w values and an SSS-δ 18 O w relationship in agreement with measurements in this region (Harwood et al., 2008). SST and δ 18 O w time series for all cases are given in Supplement Data S4, and natural cases are plotted in Fig. 3. For all virtual proxy datasets (cases 1-33), records of SST and δ 18 O w were converted to the sampling domain (along the length of the record) by defining a virtual growth rate in the sampling direction. Adding this growth rate as a variable allowed us to test the sensitivity of approaches to changes in the extension rate of the archive, including hiatuses (growth rate 0). This is important because fluctuations in linear extension rate and periods in which no mineralization occurs (hiatuses or growth cessations) are common in all climate archives (e.g., Treble et al., 2003;Ivany, 2012). After conversion to the sampling domain, virtual aliquots were subsampled at equal distance from the SST and δ 18 O w series of all cases using six sampling intervals: 0.1, 0.2, 0.45, 0.75, 1.55, and 3.25 mm. The four largest sampling intervals were chosen such that the standard growth rate (10 mm yr −1 ) was not an integer multiple of the sampling interval (e.g., 0.45 mm instead of 0.5 mm and 3.25 mm instead of 3 mm). This decision prevents sampling of the same parts of the seasonal cycle (e.g., same months) every year, which biases both the mean value and the precision of monthly SST and δ 18 O w reconstructions. This bias towards certain parts of the seasonal cycle is much stronger at low sample sizes (large sampling intervals) and is illustrated in the Supplement Fig. S2  For all cases, graphs on top show environmental data, with SST plotted in red, δ 18 O w in blue, and growth rate (abbreviated as GR) in orange (as in Fig. 2). The graph below shows virtual δ 18 O c (blue) and 47 (red) records created from these data series using a sampling interval of 0.45 mm and including analytical noise (see Sect.

Real example
Measured (δ 18 O c ) and simulated ( 47 ) data from the Pacific oyster from the Danish List Basin yielded estimates of SST and δ 18 O w seasonality using all reconstruction approaches ( Fig. 6). While a model of shell δ 18 O c based on SST and SSS data closely approximates the measured δ 18 O c record ( Fig. 6c), basing SST reconstructions solely on δ 18 O c data without any a priori knowledge of δ 18 O w variability (assuming constant δ 18 O w equal to the global marine value) leads to high inaccuracy in mean annual SST (Fig. 6d). Note that, in the absence of significant δ 18 O w seasonality (as in this case study), seasonal temperature range reconstructions from δ 18 O c measurements can be very accurate. However, assuming constant δ 18 O w year-round may introduce considerable bias (see Figs. 7 and 8). The in-phase relationship between SST and SSS ( Fig. 6b) slightly dampens the seasonal δ 18 O c cycle, causing underestimation of temperature seasonality, while a negative mean annual δ 18 O w value in the List Basin biases SST reconstructions towards higher temperatures. In terms of SST reconstructions, the smoothing, binning, and optimization approaches based on 47 and δ 18 O c data yield more accurate reconstructions, albeit with reduced seasonality and a bias towards the summer season. The latter is a result of severely reduced growth rates in the winter season, which was therefore undersampled (see Fig. 6a and c).
Approaches including 47 data also yield far more accurate δ 18 O w estimates than the δ 18 O approach. However, the accuracy of δ 18 O w seasonality and mean annual δ 18 O w estimates is low in these approaches too, largely because of the limited sampling resolution, especially in winter. The optimization approach suffers from the strong in-phase relationship between SST and SSS, which obscures the difference between the δ 18 O w effect and the temperature effect on shell carbonate. Yet, disentangling SST from δ 18 O w seasonality is central to the success of the approach (see Sect. 4.2.3). Figure 6d does not show the precision of SST and δ 18 O w estimates, which is much lower for the smoothing approach than for the binning and optimization approaches due to the limited data in the winter seasons (see Supplement Data S6). These results show that several properties of carbonate archives, such as growth rate variability, phase relationships between SST and δ 18 O w seasonality, and sampling resolution, can impact the reliability of paleoseasonality reconstructions. The virtual and real data cases in this study were tailored to test the effects of these archive properties more thoroughly.

Case-specific results
A case-by-case breakdown of the precision (Fig. 7) and accuracy ( Fig. 8) of reconstructions using the four approaches shows that the reliability of reconstructions varies significantly between approaches and is highly case-specific. In general, precision is highest in δ 18 O reconstructions, followed by optimization and binning, with smoothing generally yielding the worst precision. Average standard deviations of the underperforming methods (binning and smoothing) are up to 2-3 times larger than those of δ 18 O (e.g., 3.9 and 3.5 • C vs. 1.3 • C for δ 18 O MAT reconstructions; see also the Supplement). It is worth noting that precision of δ 18 Obased estimates is mainly driven by measurement precision (which is better for δ 18 O c than for 47 measurements; see Sect. 4.1.1). 47 -based reconstructions lose precision due to the higher error in 47 measurements and the method used for combining measurements for seasonality reconstructions. On a case-by-case basis, the hierarchy of approaches can vary, especially if strong variability in growth rate is introduced, such as in case 14, in which the size of hiatuses in the record increases progressively, and in case 18, in which half of the year is missing due to growth hiatuses (see Table 1, Supplement Data S1 and S4). Of the 47 -based methods (smoothing, binning, and optimization), optimization is rarely outcompeted in terms of precision in both SST and δ 18 O w reconstructions. The comparison based on precision alone is misleading, as the most precise approach (δ 18 O) runs the risk of being highly inaccurate (offsets exceeding 4 • C in some MAT reconstructions; see Fig. 8a), especially in cases based on natural SST and SSS measurements (cases 30-33). The smoothing approach also often yields highly inaccurate results, especially in cases with substantial variability in δ 18 O w (e.g., 4), step 4 shows the application of the different reconstruction methods to these virtual data (see Fig. 2 for details), and step 5 illustrates how the reconstructions are compared with the original ("true") SST and δ 18 O w data to calculate the accuracy and precision of the reconstruction approaches. Note that step 1 is different for cases 1-29 (based on fully artificial SST and δ 18 O w records; Sect. 2.3.1) than for cases 30-33 (SST and δ 18 O w records based on real SST and SSS data; see Sect. 2.3.2). cases 9-11; Fig. 8). Optimization and binning outcompete the other methods in most circumstances in terms of accuracy. Binning outperforms optimization in reconstructions of δ 18 O w seasonality, making it the overall most accurate approach. Interestingly, optimization is less accurate, specifically in cases with sharp changes in growth rate in summer (e.g., cases 11, 14, 16, and 17), while binning performs better in these cases. Reconstructions of mean annual SST and δ 18 O w in case 18 are especially inaccurate regardless of which method is applied. This extreme case with growth only during half of the year combined with seasonal fluctuations in both SST and δ 18 O w presents a worst-case scenario for seasonality reconstructions, leading to strong biases in mean annual temperature reconstructions. In situations like case 18, the optimization approach is most accurate in MAT and SST seasonality reconstructions, but δ 18 O w is more accurately reconstructed using the binning approach. Finally, it is worth noting that in natural situations (Fig. 3), variability in SST almost invariably has a larger influence on δ 18 O c and 47 records than δ 18 O w such that fluctuations in δ 18 O c records closely follow the SST seasonality, even in cases with relatively large δ 18 O w variability (e.g., case 30). Chronolo-gies based on these δ 18 O c fluctuations are therefore generally accurate.

Effect of sampling resolution
As expected, increasing the temporal sampling resolution (i.e., number of samples per year) almost invariably increases the precision and accuracy (Fig. 9) of reconstructions using all methods. An exception to this rule is the precision of δ 18 O reconstructions, which decreases with increasing sampling resolution (see Fig. 9c-d). Precision standard deviations of all 47 -based approaches eventually converge with the initially much higher precision of δ 18 O reconstructions when sampling resolution increases. However, the sampling resolution required for 47 -based reconstructions to rival or outcompete the δ 18 O reconstructions differs, with optimization requiring lower sampling resolutions than the other methods (e.g., 20-40 samples per year compared to 40-80 samples per year for smoothing and binning; Fig. 9a-d). Accuracy also improves with sampling resolution (Fig. 9eh). When grouping all cases together, it becomes clear that δ 18 O reconstructions can only approach the accuracy of 47 - based approaches for reconstructions of MAT. Seasonality in both SST and δ 18 O w is most accurately reconstructed using binning, and the smoothing approach once again performs worst.

Resolving SST seasonality
Comparison of cases 19, 20, and 21 (SST seasonality of 9.7, 5.7, and 2.1 • C, respectively) with control case 1 (SST seasonality of 19.3 • C) shows how changes in the seasonal SST range affect the precision of measurements ( Fig. 10; see also Table 1 and Supplement Data S1). The data reconfirm that δ 18 O reconstructions are the most precise, which is a deceptive statistic given the risk of highly inaccurate results with this approach (see Fig. 8). Taking into consideration only analytical uncertainty, all approaches except for smoothing can confidently resolve at least the highest SST seasonality within a significance level of 2 standard deviations (∼ 95 %) using a moderate sampling resolution (mean of all resolutions shown in Fig. 10). Increasing sampling resolution improves the precision of 47 -based reconstructions (see Fig. 9d), so high sampling resolutions (0.1 or 0.2 mm) allow smaller seasonal differences to be resolved. When random sub-annual variability is added to the SST and δ 18 O w records (see Sect. 2.3.1), the minimum seasonal SST extent that can be resolved decreases for all approaches (Fig. 10b and c). Nevertheless, δ 18 O and optimization reconstructions remain able to resolve a relatively small SST seasonality of 2-4 • C.

Effect of record length
The effect of variation in the length of the record was investigated by comparing cases 22, 23, and 24 (record lengths of 6 years, 3 years, and 1 year, respectively) with the control case (record length of 12 years; see Fig. 11 and Table 1). The precision of MAT and SST seasonality reconstructions slightly increases in larger datasets (longer records) for optimization and binning, but not for smoothing and δ 18 O reconstructions. Differences between reconstruction approaches remain relatively constant regardless of the length of the record, with the precision hierarchy generally remaining intact (δ 18 O > optimization > binning > smoothing). However, in very short records (1-2 years) smoothing generally gains an advantage over other 47 -based methods due to its lack of sensitivity to changes in the record length, and binning reconstructions are not precise enough to resolve SST seasonality within 2 standard deviations (∼ 95 % confidence level). Variation in precision is largely driven by the very low precision of reconstructions in records with low sampling resolutions (sampling intervals of 1.55 mm or 3.25 mm; see also Fig. 9a-d). As a result, most of the reduction in precision in shorter records can be mitigated by denser sampling.

Effect of age model uncertainty
Uncertainty in the age model has a significant effect on both the precision and the accuracy (Fig. 12) of reconstructions using all approaches. The δ 18 O reconstructions are most strongly affected by uncertainties in the age model and suffer from a large decrease in precision with increasing age model uncertainty (Fig. 12c-d). The high precision of the δ 18 O approach in comparison with the 47 approaches quickly disappears when age model uncertainty increases beyond 20-30 d. The accuracy of δ 18 O c -based SST seasonality reconstructions initially improves with age model uncertainty (Fig. 12h). However, this observation is likely caused by the fact that age model uncertainty was compared based on conditions in case 9, which features a phase offset between SST and δ 18 O w seasonality, causing the δ 18 O method to be highly inaccurate even without age model uncertainty. The precision of smoothing and optimization approaches also decreases with increasing age model uncertainty (Fig. 12a-d), and the optimization approach loses its precision advantage over the binning and smoothing approaches when age model uncertainty increases beyond 30 d. The monthly binning approach is most resilient against increasing age model uncertainty. Seasonality reconstructions through both the binning and optimization approach quickly lose accuracy when age model uncertainty increases, but the accuracy of the smoothing approach remains the worst of all 47 -based approaches regardless of age model uncertainty, except in the case of δ 18 O w seasonality at exceptionally high (>60 d) age uncertainty ( Fig. 12e-h).      timization; see also Supplement Data S9). This is a consequence of the assumption that δ 18 O w remains constant yearround and that one knows its true value. Both these assumptions are problematic in the absence of independent evidence of the value of δ 18 O w , especially in deep time settings (see, e.g., Veizer and Prokoph, 2015;Henkes et al., 2018). The risk of this assumption is made clear when comparing cases in which δ 18 O w is indeed constant year-round at the assumed value (0 ‰; e.g., cases 1-6 and 19-24) with cases in which shifts in δ 18 O w occur, especially when these shifts are out of phase with respect to the SST seasonality (e.g., cases 9-11, 18, and 25-33; Fig. 8c-d). Cases mimicking or based on natural SST and SSS variability (cases 14-18 and 30-33) as well as the modern oyster data (Fig. 6) yield stronger inaccuracies in MAT and seasonality reconstructions, showing that even in many modern natural circumstances the assumption of constant δ 18 O w is problematic.
It is important to consider that the value of mean annual δ 18 O w remained very close to the assumed value of 0 ‰ (within 0.15 ‰) in all cases except for natural data cases 30 (−1.55 ‰), 32 (1.01 ‰; see Supplement Data S5), and the real oyster data (−1.42 ‰; Fig. 5). The SST values of these cases reconstructed using δ 18 O c data show large offsets from their actual values (+6.7 • C, −4.7 • C, and +10.3 • C for case 30, case 32, and the real oyster data, respectively; see Figs. 6 and 8 and Supplement Data S5). These offsets are equivalent to the temperature offset one might expect from inaccurately estimating δ 18 O w (∼ −4.6 • C/‰; Kim and O'Neil, 1997) and are only rivaled by the offset in MAT reconstructions of case 18 (+5.0 • C), which has growth hiatuses obscuring the coldest half of the seasonal cycle. The fact that such differences in δ 18 O w exist even in modern environments should not come as a surprise, given the available data on worldwide variability of δ 18 O w (at least −3 ‰ to +2 ‰; e.g., LeGrande and Schmidt, 2006) and SSS (30 to 40; ESA, 2020) in modern ocean basins. However, it should warrant caution in using δ 18 O c data for SST reconstructions even in modern settings. Implications for deep time reconstructions are even greater, given the uncertainty and variability in global average (let alone local) δ 18 O w values (Jaffrés et al., 2007;Veizer and Prokoph, 2015). The complications of using δ 18 O c as a proxy for marine temperatures in deep time are discussed in detail in O'Brien et al. (2017) and Tagliavento et al. (2019). Complications arising from variability in δ 18 O w are more serious in climate records from euryhaline carbonate producers (e.g., oysters) than from stenohaline organisms (e.g., corals), as they are mainly driven by salinity fluctuations. For example, seasonal salinity variability in the North Sea at offshore sites away from freshwater sources can be as low as 0.25 (Harwood et al., 2008) compared to 3-4 in the coastal Texel site simulated in case 30. Given this variability, studies using the δ 18 O c proxy for SST reconstructions are recommended to either reconstruct δ 18 O w through additional measurements (e.g., including clumped isotope analysis) or constrain δ 18 O w variability through isotope-enabled modeling (e.g., Williams et al., 2009) The analytical uncertainty of individual δ 18 O c aliquots (typically 1 SD of 0.05 ‰; e.g., de Winter et al., 2018) represents only ∼ 1.1 % of the variability in δ 18 O c over the seasonal cycle (∼ 4.3 ‰ for the default 20 • C seasonality in case 1, following Kim and O'Neil, 1997). This is much smaller than the analytical uncertainty of 47 (typically 1 SD of 0.02-0.04 ‰; e.g., de Winter et al., 2021), which equates to 25 %-50 % of the seasonal variability in 47 (∼ 0.08 ‰ for 20 • C seasonality, fol-lowing Bernasconi et al., 2018;see Supplement Data S7). This roughly 20-fold difference in relative precision causes δ 18 O c -based SST reconstructions to be much more precise (see Figs. 7,[9][10][11][12] than those based on 47 and forces the necessity of grouping 47 data in reconstructions. However, as discussed above, the high precision of δ 18 O reconstructions is a misleading statistic if they are highly inaccurate. Our results show that paleoseasonality reconstructions based on δ 18 O c can only be relied upon if there is strong independent evidence of the value of δ 18 O w and if significant sub-annual variability in δ 18 O w (>0.3 ‰, equivalent to 2-3 • C SST variability; see Figs. 9-10; Kim and O'Neil, 1997) can be excluded with confidence. Examples of such cases include fully marine environments unaffected by influxes of (isotopically light) fresh water or evaporation (increasing δ 18 O w ; Rohling, 2013). Carbonate records from environments with more stable δ 18 O w conditions include, for example, the A. islandica bivalves from considerable depth (30-50 m) in the open-marine northern Atlantic (e.g., Schöne et al., 2005, on which case 33 is based). However, even here variability in δ 18 O sw due to, for example, the shifting influence of different bottom-water masses cannot be fully excluded. Previous reconstruction studies show that δ 18 O w in smaller basins is heavily influenced by the processes affecting δ 18 O w on smaller scales, such as local evaporation and freshwater influx from nearby rivers (e.g., Surge et al., 2001;Petersen et al., 2016). Consequently, accurate quantitative reconstructions of seasonal range in shallow marine environments with extreme seasonality may not be feasible using the δ 18 O approach because these environments are invariably characterized by significant fluctuations in δ 18 O w and growth rate.
While variability in δ 18 O w compromises accurate δ 18 Obased seasonality reconstructions, the compilation in Fig. 3 shows that its influence on the δ 18 O records is too small to affect the shape of the record to such a degree that seasonality is fully obscured. While natural situations with δ 18 O w fluctuations large enough to totally counterbalance the effect of temperature seasonality on δ 18 O records are imaginable, these cases are likely rare. This means that chronologies based on δ 18 O seasonality, which are a useful tool to anchor seasonal variability in the absence of independent growth markers (e.g., Judd et al., 2018;de Winter, 2021b), are reliable in most natural cases.

Seasonality reconstructions using moving averages (smoothing)
Of the three methods for combining 47 data, the smoothing approach clearly performs worst for all four reconstructed parameters (MAT, SST seasonality, mean annual δ 18 O w and δ 18 O w seasonality) in terms of both accuracy and precision (Fig. 13). While applying a moving average may be a good strategy for lowering the uncertainty of 47 -based temperature reconstructions in a long time series (e.g., Rodríguez- Sanz et al., 2017), the method underperforms in cases in which the baseline and amplitude of a periodic component (e.g., MAT and SST seasonality) are extracted from a record. This is likely due to the smoothing effect of the moving average, which reduces the seasonal cycle and causes highly inaccurate seasonality reconstructions (offsets amounting to >6 • C; Fig. 13). This bias is especially detrimental in cases in which the seasonal cycle is obscured by seasonal growth halts (e.g., case 18), multi-annual trends in growth (e.g., cases 4, 14, and 17), and multi-annual trends in SST (e.g., cases 15 and 17; see Figs. 7 and 8). The poor performance of the smoothing approach can be slightly mitigated by increasing sampling resolution (Fig. 9), but even at high sampling resolutions (every 0.1 or 0.2 mm) the method still fails to reliably resolve seasonal SST ranges below 5 • C, even in idealized cases (cases 19-21; Fig. 10). Increasing the number of samples by analyzing longer records does not improve the result because smoothing of the seasonal cycle by a moving average window introduces the same dampening bias if the temporal sampling resolution (number of samples per year) remains equal (Fig. 11). More critically, employing the smoothing method may give the illusion that seasonality is more reduced and severely bias reconstructions. This bias highlights the importance of using the official meteorological definition of seasonality as the difference between the averages of the warmest and coldest month in paleoseasonality work (O'Donnell and Ignizio, 2012). This definition is much more robust than the "annual range" often cited based on maxima and minima in δ 18 O c records. This annual range strongly depends on sampling resolution, which is typically <12 samples per year (Goodwin et al., 2003), equivalent to the third lowest sampling interval (0.75 mm) simulated in this study. Therefore, we strongly recommend that future studies adhere to the monthly definition of seasonality to foster comparison between studies. While interannual variability is lost by combining data from multiple years into monthly averages, this approach increases the precision, accuracy, and comparability of paleoseasonality results. Interannual variability can still be discussed from raw data plotted in the time or sampling domain.

Monthly binning, sample size optimization, and age model uncertainty
Overall, the most reliable paleoseasonality reconstructions can be obtained from either binning or optimization (Fig. 13).
In general, optimization is slightly more precise, while binning yields more accurate estimates of the seasonal range in SST and δ 18 O w (Fig. 13b and d). The more flexible combination of aliquots in the optimization routine yields improved precision (especially for mean annual averages) in cases in which parts of the record are undersampled or affected by hiatuses and simultaneous fluctuations in both SST and δ 18 O w (e.g., cases 3-6, 14-18, 30-33). The downside of this flexibility is that in the case of larger sample sizes, the seasonal variability may be dampened, like in the smoothing approach (see Sect. 4.1.2). This apparent dampening effect may be reduced by allowing the sample size of summer and winter samples to vary independently in the optimization routine at the cost of higher computational intensity due to the larger number of sample size combinations (see Sects. 2.1 and 4.2.2). The rigid grouping of data into monthly bins in binning prevents this dampening and therefore yields slightly more accurate estimates of seasonal ranges in SST and δ 18 O w . A caveat of binning is that it requires a very reliable age model of the record, at least on a monthly scale. If the age model has a large uncertainty, there is a risk that samples are grouped in the wrong month, which compromises the accuracy of binning reconstructions, especially for reconstructions of seasonal range (Fig. 12h). This problem is exacerbated by potential phase shifts between seasonality in paleoclimate variables (SST and δ 18 O w ) and calendar dates, which may occur in the presence of a reliable age model. Previous authors attempted to circumvent the dating problem by analyzing high-resolution δ 18 O c transects and subsequently sampling the seasonal extremes for clumped isotope analyses (Keating-Bitonti et al., 2011;Briard et al., 2020). While this approach does not require sub-annual age models, it has several disadvantages compared with binning and optimization approaches: firstly, it requires separate sampling for δ 18 O c and 47 , which may not be possible in high-resolution carbonate archives due to sample size limitations. Analyzing small aliquots for combined δ 18 O c and 47 analyses consumes less material. Secondly, individual summer and winter temperature reconstructions require large (>1.5 mg; e.g., Fernandez et al., 2017) 47 samples from seasonal extremes, which causes more time averaging than the approaches combining small aliquots. Finally, the position of seasonal extremes estimated from the δ 18 O c record may not reflect the true seasonal extent if seasonal SST and δ 18 O w cycles are not in phase (as in case 9), causing seasonal 47 -based SST reconstructions to underestimate the temperature seasonality. In such cases, δ 18 O c and 47 analyses of small aliquots allow the seasonality in SST and δ 18 O w to be disentangled, yielding more accurate seasonality reconstructions.
Techniques for establishing independent age models for climate archives range from counting of growth layers or increments (Schöne, 2008;Huyghe et al., 2019), to modeling and extracting of rhythmic variability in climate proxies through statistical approaches (e.g., De Ridder et al., 2007;Goodwin et al., 2009;Judd et al., 2018;de Winter, 2021b), and interpolation of uncertainty on absolute dates (e.g., Scholz and Hoffman, 2011;Meyers, 2019;Sinnesael et al., 2019). While propagating uncertainty in the data on which age models are based onto the age model is relatively straightforward, errors in underlying a priori assumptions, such as linear growth rate between dated intervals, (quasi-)sinusoidal forcing of climate cycles, and the uncertainty in human-generated data like layer counting, are very difficult to quantify (e.g., Comboul et al., 2014) and may not be normally distributed. Results of cases 25-29 show that uncertainties in the age domain can significantly compromise reconstructions (Fig. 12). Within the scope of this study, only the effect of symmetrical, normally distributed uncertainties in an artificial case with phase-decoupled SST and δ 18 O w seasonality (case 9) was tested. The effects of other types of uncertainties on the reconstructions remain unknown, highlighting an unknown uncertainty in paleoseasonality and other high-resolution paleoclimate studies that may introduce bias or lead to overly optimistic uncertainties in reconstructions. Future research could quantify this unknown uncertainty by propagating estimates of various types of uncertainty on depth values of samples and on the conversion from sampling to time domain in age models.

Conditions influencing success of reconstructions
The reliability (accuracy and precision) of SST and δ 18 O w reconstructions depends on case-specific conditions. The range of cases tested in this study allowed us to evaluate the effect of variability in SST, growth rate, δ 18 O w , sampling resolution, and record length relative to the control case (case 1; see Supplement Data S1). A summary of the effects of these changes is given in Table 2.

SST variability
Variability in water temperature most directly affects the proxies under study. By default (case 1), SST varies sinusoidally around a MAT of 20 • C with an amplitude of 10 • C (see Sect. 2.3.1, Fig. 2, and Supplement Data S1). In cases in which multi-annual variability in SST is simulated (e.g., cases 15 and 17), the accuracy of SST reconstructions using δ 18 O and optimization is reduced, while the binning approach is less strongly affected. Examples of such multi-annual cyclicity include the El Niño-Southern Oscil- Table 2. Qualitative summary of the effects of changes in variables relative to the ideal case on reconstructions using the four approaches. The "cases" column lists cases in which the changes in the respective variable relative to the control case (case 1) were represented (see Tables 1 and S1). 0: negligible effect, +: weak increase in uncertainty, ++: moderate increase in uncertainty, + + +: strong increase in uncertainty. Precision and accuracy of all tests are given in S9.

Variable
Cases  Philander, 1983) and North Atlantic Oscillation (NOA; Hurrell, 1995). The effect is especially large in case 17, which simulates a tropical environment with reduced SST seasonality and a strong multi-annual cyclicity. This type of environment is analogous to the environment of tropical shallow-water corals, which are often used as archives for ENSO variability (e.g., Charles et al., 1997;Fairbanks et al., 1997), and is similar to tropical cases from the Australian Great Barrier Reef (case 31) and the Red Sea (case 32; see Fig. 3). We therefore recommend using the binning approach on carbonate records in which multi-annual cyclicity is prevalent and if a reliable age model can be established for these records (as in, e.g., Sato, 1999;Scourse et al., 2006;Miyaji et al., 2010).

Growth rate variability and hiatuses
Figures 7 and 8 show that variations in the growth rate of records, including the occurrence of hiatuses, have a strong effect on reconstructions, especially using the smoothing approach. In general, hiatuses and slower growth reduce the precision of monthly SST and δ 18 O w reconstructions by reducing mean temporal sampling resolution (samples per year; see Fig. 9) and because parts of the record are undersampled. The effect on accuracy depends strongly on the timing of changes in growth rate or the occurrence of hiatuses. Cases 2-6 simulate specific growth rate effects and can be used to test these differences. The smoothing method is especially sensitive to changes in growth rate that take place in specific seasons, such as hiatuses in winter (case 2) or summer (case 3) and growth peaks in summer (case 5) or spring (case 6). The other reconstruction approaches are less affected by this bias because they generally do not mix samples from different seasons. The δ 18 O method is especially well suited to deal with changes in growth rate because it does not require combining different aliquots for accurate SST reconstructions. The binning and optimization approaches are slightly less reliable in cases in which the growth rate decreases linearly or seasonally along the entire record (cases 4-6; Fig. 2). Because these two methods consider all samples in the records at once, they are more sensitive to changes in temporal sampling resolution along the record. It is worth noting that optimization is especially sensitive to sharp changes in growth rate in summer (e.g., cases 11, 14, 16, and 17) because those conditions force the optimization routine to use larger sample sizes or include samples outside the warmest month for summer temperature estimates. A potential solution to this problem could be to allow sample sizes of summer and winter groups to vary independently in the optimization routine (see Sect. 2.1). This would allow sample size in the undersampled season (in this case, summer) to become larger than that at the other end of the δ 18 O c spectrum, reducing uncertainty in the more densely sampled season and therefore improving the entire seasonality reconstruction. A worst-case scenario is represented by case 18, in which the cold half of the year is not recorded. Such cases result in strong biases in reconstructions of mean annual and seasonal ranges in SST and δ 18 O w , regardless of which method is used. In such extreme cases the record simply contains insufficient information to reconstruct variability in growth rate, SST, and δ 18 O w , and it seems that no statistical method would enable this missing information to be recovered. The solution for these reconstructions would be to establish reliable age models, independent of δ 18 O or 47 data, which show that a large part of the seasonal cycle is missing. All methods used in this study rely on a conversion of SST and δ 18 O w reconstructions to the time domain to define monthly time bins. This conversion breaks down in fossil examples when the seasonal cycle cannot be extracted from the archive, which happens when half of the seasonal cycle or more is obscured by growth hiatuses, as exemplified in case 18.
While hiatuses encompassing half of the seasonal cycle are uncommon, changes in growth rate are common in accretionary carbonate archives because conditions for (biotic or abiotic) carbonate mineralization often vary over time. This variability is either driven by biological constraints, such as senescence (e.g., Schöne, 2008;Hendriks et al., 2012), the reproductive cycle (Gaspar et al., 1999), and stress (Surge et al., 2001;Compton et al., 2007), or by variations in the environment that promote or inhibit carbonate production, such as seasonal variations in temperature (Crossland, 1984;Bahr et al., 2017) or precipitation (Dayem et al., 2010;Van Rampelbergh et al., 2014). In general, such conditions occur more frequently in mid-to high-latitude environments than in low latitudes and in more coastal environments rather than in open-marine settings because these environments contain stronger variations in the factors that influence growth rates (e.g., temperature, precipitation, and freshwater influx; Surge et al., 2001;Ullmann et al., 2010). This difference was simulated in the cases representing natural variability (cases 14-18 and 30-33). Accuracy in the coastal high-latitude settings (cases 16, 18, and 29) is indeed more strongly affected by changes in growth rate. Because in such highly variable environments growth rate variability often co-occurs with variability in δ 18 O w , using δ 18 O c -based reconstructions is not advised unless δ 18 O w variability can be constrained or neglected (which is rare in these environments).
Additional complications include the lack of a constraint on growth rate variability because of uncertainties in the record's age model (see Sect. 4.1.3) and the effect of growth rate variability on the sampling resolution. The effect of growth rate on time averaging within samples was not specifically tested in this study but introduces uncertainty in practice when archives with a variable growth rate are sampled at a constant sampling resolution in the depth domain. In this case, parts of the archive with a lower growth rate yield more time-averaged samples, potentially dampening one extreme of the seasonal cycle (e.g., Goodwin et al., 2003). In highly dynamic environments it is challenging to isolate all variables that introduce bias, and irregular variability in growth rate and δ 18 O w will invariably introduce uncertainty in SST reconstructions, even when applying the best 47 -based approaches (e.g., binning and optimization). In such examples, the results of natural variability cases (14-18 and 30-33) and of the real oyster data (Fig. 6) serve as benchmarks for the degree of uncertainty that may remain unexplained in these records.

Variability in δ 18 O w
As discussed in Sect. 4.1.1, these variations in δ 18 O w have a large effect on the accuracy of δ 18 O c -based reconstructions, and their occurrence constitutes the main advantage of applying the 47 thermometer (Eiler, 2011). However, results of cases 7-11 in Fig. 8 and Table 2 show that δ 18 O w variations can also bias 47 -based reconstructions, especially those of seasonal ranges and those using the smoothing approach. Smoothing reconstructions are biased by these δ 18 O w shifts in much the same way as they are affected by shifts in growth rate (see Sect. 4.2.2). The optimization approach is sensitive to seasonal changes in δ 18 O w in antiphase with SST seasonality and increases in δ 18 O w in summer (e.g., due to excess evaporation as in case 11), especially when used for reconstructions of δ 18 O w seasonality. This effect arises because the optimization approach orders data based on δ 18 O c and 47 seasonality to isolate the δ 18 O w -SST relationship. Both antiphase δ 18 O w seasonality and summer evaporation dampen the seasonal δ 18 O c cycle and therefore influence the reconstruction of the δ 18 O w -SST relationship. A good example of this is seen in the real oyster data (Fig. 6), in which δ 18 O w and SST vary in phase and δ 18 O w dampens the SST seasonality. The binning approach is more robust against δ 18 O w variability that dampens the seasonal cycle, and it is therefore a better choice for absolute SST reconstructions in environments where summer evaporation or other δ 18 O w variability in phase with SST seasonality is expected to occur if the age model is reliable enough to allow monthly binning of raw data (see Sect. 4.1.3). Indeed, reconstructions from the lagoonal environment (case 16) and Red Sea case (case 32, which is characterized by strong summer evaporation; e.g., Titschack et al., 2010) show that binning is the most reliable choice in these environments.

Variability in sampling resolution and record length
Other factors influencing the effectiveness of reconstructions are the sampling resolution and the length of the record. Many of the cases discussed in this study represent idealized cases with comparatively high sampling resolutions over comparatively long (12 years) paleoseasonality records, which yield large sample sizes. By comparison, the typical age of mollusks, which are often used as paleoseason-ality archives, is 2-5 years (Ivany, 2012). Records with the highest sampling resolutions (0.1 and 0.2 mm) contain up to 1200 samples. Generating such records is not impossible, but it is highly unlikely to be applied in paleoclimate studies given the limitation of resources (e.g., instrument time) and the desire to analyze multiple records from different specimens, species, localities, or ages to gain a better understanding of the variability in paleoseasonality (e.g., Goodwin et al., 2003;Schöne et al., 2006;Petersen et al., 2016). In some cases large datasets are meticulously collected from single carbonate records (e.g., Schöne et al., 2005;Vansteenberge et al., 2016;de Winter et al., 2020a;Shao et al., 2020). However, in such studies, the aim is often to investigate variability at a higher (e.g., daily; de Winter et al., 2020a) resolution or longer timescales (e.g., decadal to millennial; Schöne et al., 2005;Vansteenberge et al., 2016;Shao et al., 2020) in addition to the seasonal cycle rather than to improve the reliability of reconstructing one type of variability (e.g., seasonality) alone. Figure 9 shows that increasing temporal sampling resolution (samples per year) improves both the accuracy and precision of all 47 -based reconstructions. This occurs because 47 samples have large analytical uncertainty (see Sect. 4.1.2), and grouping data therefore improves reconstructions. The decrease in precision of δ 18 O c -based reconstructions ( Fig. 9c-d) is explained by the fact that the analytical uncertainty of δ 18 O c measurements is much smaller than the variability introduced by natural sub-annual variability in SST and δ 18 O w unrelated to the seasonal cycle (see Supplement Data S4). Therefore, higher sampling resolutions allow δ 18 O c records to better capture this sub-seasonal variability, which introduces more noise to the seasonal cycle (reducing precision) but causes monthly mean SST and δ 18 O w to be more accurately reconstructed. Towards higher sampling resolutions, the gap in precision between δ 18 O c -and 47 -based reconstructions closes, eventually (in an ideal case) diminishing the advantage of high analytical precision in δ 18 O c measurements ( Fig. 9c-d).
An optimum sample resolution can be defined for each method, above which improving sampling resolution does not significantly improve the reliability of the reconstruction (as in . Figure 9 shows that this optimum varies depending on which variable (MAT, SST seasonality, mean annual δ 18 O w , or δ 18 O w seasonality) is reconstructed. Therefore, Fig. 9 will allow future researchers to determine the sampling resolution that is tailored to their purpose. In general, the improvement after a sample size of 20-30 samples per year is negligible for the binning and optimization methods if the total number of samples (depending on both sampling resolution and record length) is sufficient for monthly temperature reconstructions. Our data show that 200-250 paired δ 18 O c and 47 measurements are in general sufficient for a standard deviation of 2-3 • C in monthly SST reconstructions using the binning or optimization approach, preferably when spread over multiple growth years to elim- Figure 14. Schematic guide to choosing the right approach for reconstructing annual mean or seasonality in SST and δ 18 O w from accretionary carbonate archives. Recommendations are based on the results of testing all four approaches on the entire range of cases. Researchers can follow the six steps (questions Q1-6) to decide on the right approach for reconstructing the target variable. Guidelines are based on maximizing both accuracy and precision (see details in Supplement Data S9). Note that the smoothing approach is never the best choice. The choice between the two remaining 47 -based approaches (binning and optimization) relies heavily on the situation and may be driven by a preference for more accurate or more precise results.
inate the effect of short-term weather events or years with exceptional seasonality (Fig. 10; Supplement Data S5).
Record length only has a minimal influence on the optimization method, but for very short records (≤ 2 years) binning becomes very imprecise, especially at low sampling resolutions (Fig. 11). The reason is that the sample size within monthly time bins becomes too small in these cases, while the more flexible sample size window of the optimization routine circumvents this problem. The choice between these two approaches should therefore be based on a trade-off between the length of the record (in time) and the number of samples that can be retrieved from it. As a result, shorterlived, fast-growing climate archives, such as large or fastgrowing (e.g., juvenile) mollusk shells, are best sampled using a high-temporal-resolution (>30 samples per year) sampling strategy with the optimization approach. Longer-lived archives with a lower mineralization rate, such as annually laminated speleothems, corals, and gerontic mollusks, are best sampled using long time series at monthly resolution using the binning approach.
A simplified decision tree that could guide sampling strategies for future paleoseasonality studies is shown in Fig. 14. Note that choices and trade-offs for these reconstructions may differ depending on the archive and environment in which it formed (see discussion above).

Implications for clumped isotope sample size
The optimization technique for grouping 47 aliquots for accurate SST and δ 18 O w reconstructions allows us to assess the limitations of the clumped isotope thermometer for temperature reconstructions from high-resolution carbonate archives. The optimal sample size given by the approach is different for different cases and depends on the temporal sampling resolution and the characteristics of the record (see Supplement Data S4). As expected, in cases more like the ideal case (case 1), optimal sample sizes are low (∼ 14-24), while sample sizes increase in more complicated cases based on simulated natural environments (cases 14-18) or cases based on actual SST and SSS data (cases 30-33). More confined SST seasonality (cases 19-21) also requires larger samples to reconstruct (up to 100 samples in some cases). This is not surprising because variability within samples will increase in records in which the seasonality is smaller or more obscured by other environmental variability. The optimal sample size between cases and sampling resolutions is not normally distributed but tails towards high sample sizes with some extreme outliers (Shapiro-Wilk test p 0.05; Supplement Data S10). The median sample size of all our simulations is 17 aliquots. This number lies between the minimum number of 14 ∼ 100 µg replicates of standards calculated by  and the minimum of 20-40 ∼ 100 µg aliquots required for optimal paleoseasonality reconstruction from fossil bivalves by de Winter et al. (2021). This is to be expected since many of the cases explored in this study represent ideal cases compared with the natural situation. However, in these virtual cases a measure of random sub-annual variability in SST and δ 18 O w was added (see Fig. 4 and Supplement Data S2), simulating a more realistic environment and resulting in poorer precision than replicates of a carbonate standard . Our simulations show that the optimum number of samples to be combined in seasonality studies depends on both the analytical uncertainty of 47 measurements (as represented by the estimate in  and the variability between aliquots pooled within a sample that is attributed to actual variability within the record (as represented by our simulations and the estimate in de Winter et al., 2021). The optimal sample size is therefore a good measure for the limitations of temperature variability that can be resolved in a record and can help researchers decide which strategy to apply for combining measurements to obtain the most reliable paleoseasonality estimates or to decide whether extra sampling is required, even if the chosen approach is not to use the optimization routine itself. Note that the optimum sample size is kept equal for summer and winter samples in this study and that the optimization approach can likely achieve better performance by considering unequal sample sizes in opposite seasons (see Sect. 4.1.3 and 4.2.2). While this added flexibility comes at a higher computational cost due to the increased number of possible sample size combinations to be considered, future studies should investigate whether this updated optimization approach could yield more reliable seasonality reconstructions.

Implications for other sample size problems
While the discussion above focuses on optimizing approaches for combining samples for clumped isotope analyses in paleoseasonality reconstructions, the problem of combining samples to reduce uncertainty and isolate variation in datasets is very common (e.g., Zhang et al., 2004;Merz and Thieken, 2005;Tsukakoshi, 2011). Therefore, the approaches outlined and tested in this study have applications beyond paleoseasonality reconstructions. Examples of other problems that could benefit from applying similar approaches for reducing the uncertainty of estimates of target variables while minimizing the number of analyses required to meet analytical requirements include (1) reconstructing paleoenvironmental variability in the terrestrial realm from tooth bioapatite (e.g., Passey and Cerling, 2002;Kohn, 2004;Van Dam and Reichart, 2009;de Winter et al., 2016), (2) quantitative time series analysis of orbital cycles in stratigraphic records (e.g., Lourens et al., 2010;de Vleeschouwer et al., 2017;Noorbergen et al., 2018;Westerhold et al., 2020), (3) strontium isotope dating (e.g., McArthur et al., 2012;de Winter et al., 2020b), (4) reconstructing sub-seasonal variability from ultrahigh-resolution records (e.g., from fastgrowing mollusks and gastropods; Sano et al., 2012;Warter and Müller, 2017;de Winter et al., 2020a;Yan et al., 2020), and (5) reconstructing sea surface and deep-sea temperatures across short-lived (10-100 kyr) episodes of climate change or climate shifts from deep marine archives characterized by low sedimentation rates (e.g., Lear et al., 2008;Jenkyns, 2010;Stap et al., 2010;Lauretano et al., 2018). A more detailed discussion of the implications for other sample size problems is provided in the Supplement.

Conclusions and recommendations
The performance of three 47 -based approaches to reconstruct seasonality from accretionary carbonate archives was evaluated in comparison with conventional δ 18 O c -based reconstructions in a wide range of case studies. From the results, we conclude that while δ 18 O c -based reconstructions (δ 18 O) yield superior precision for SST reconstructions, this method runs a high risk of yielding inaccurate results due to innate assumptions about the value of δ 18 O w , which must be estimated and assumed constant year-round. Unless δ 18 O w can be independently constrained or variability in δ 18 O w can be neglected, 47 -based reconstructions should be the method of choice for absolute mean annual temperature and SST seasonality reconstructions. Various techniques for combining 47 data were evaluated. Our findings suggest that smoothing 47 data using a moving average almost always results in a dampening of the seasonal cycle, which severely hampers recovery of seasonality. Applying the smoothing approach results in inaccuracies in reconstructions of MAT as well, especially in cases in which part of the seasonal cycle is obscured by variability in growth rate or multi-annual trends. More reliable seasonality reconstructions are achieved with two approaches for combining 47 data using time binning (binning) or applying a flexible sample size optimization (optimization) approach. Of these two approaches, optimization achieves better precision and can resolve smaller seasonal temperature differences with confidence. However, binning is often more accurate and outperforms optimization as the most reliable approach. This is especially true in cases with growth stops or δ 18 O w changes in phase with temperature seasonality (e.g., strong seasonal evaporation or freshwater influx) and in longer multi-annual time series with a reliable age model. Optimization is the better choice for shorter (<3 years) records, especially if the sampling resolution can be increased, such as in short, fastgrowing climate archives.
Despite the focus on the problem of resolving seasonality in carbonate archives, the findings in this study have applications for other problems in Earth science for which sample size and sampling resolution put limits on the ability to resolve specific trends, events, and cycles from time series. While the abovementioned recommendations of the optimization and binning methods are likely valid for most studies aiming to quantify the mean and amplitude of a specific cycle or event (equivalent to MAT and SST seasonality), (dynamic) moving averages (smoothing) are expected to yield the best results in studies quantifying aperiodic trends from longer data series.
Code availability. All scripts used to make the calculations described in this study are compiled in the documented R package "seasonalclumped", which is freely available on the open-source online R database CRAN (de Winter, 2021a; https://cran.r-project. org/web/packages/seasonalclumped). Annotated R scripts used to make calculations for this study are available in the digital supplement uploaded to the open-source online repository Zenodo (http: //www.doi.org/10.5281/zenodo.3899926; de Winter et al., 2020d).
Data availability. Supplement data, figures, and tables, as well as all scripts used to do the calculations and create the virtual datasets used in this study, are deposited in the open-source online repository Zenodo (http://www.doi.org/10.5281/zenodo.3899926; de Winter et al., 2020d). Virtual datasets generated within the context of this study are also made available as data files within the R package that contains the scripts used for this study ("seasonalclumped"; de Winter, 2021a; see https://cran.r-project.org/web/ packages/seasonalclumped).
Author contributions. NJdW designed the study, wrote the scripts for all calculations, and created a first draft of the paper text and figures. MZ, TA, and NJdW worked together from the first draft towards the final paper. All authors contributed to the representation of the data and methods in figures and to the discussion of the implications of the data.
Competing interests. The authors have no potential conflicts of interest to declare with regards to this study.
Acknowledgements. The authors would like to thank editor Alberto Reyes for his helpful suggestions for improving the paper and for moderating the review process. Thanks to Andrew Johnson and two anonymous reviewers for their comments, which helped improve the paper. All members of the Clumped Isotope research group at Utrecht University, most notably Ilja Kocken and Inigo Müller, are acknowledged for their comments and recommendations on a presentation of the initial results of this study.
Financial support. This research has been supported by the European Commission, Horizon 2020 (UNBIAS (grant no. 843011)).
Review statement. This paper was edited by Alberto Reyes and reviewed by Andrew Johnson and two anonymous referees.