Articles | Volume 16, issue 5
Clim. Past, 16, 1807–1820, 2020

Research article 28 Sep 2020

Does a proxy measure up? A framework to assess and convey proxy reliability

F. Garrett Boudinot1 and Joseph Wilson2
  • 1Department of Geological Sciences and Institute of Arctic and Alpine Research, University of Colorado Boulder, Boulder, Colorado 80309, USA
  • 2Department of Philosophy and Center for the Study of Origins, University of Colorado Boulder, Boulder, Colorado 80309, USA

Correspondence: F. Garrett Boudinot (


Earth scientists describe a wide range of observational measurements as “proxy measurements”. By referring to such a vast body of measurements simply as “proxy”, researchers dilute significant differences in the various ways that measurements relate to the phenomena they intend to describe. The limited language around these measurements makes it difficult for the nonspecialist to assess the reliability and uncertainty of data generated from proxy measurements. Producers and reviewers of proxy data need a common framework for conveying proxy measurement methodology, uncertainty, and applicability for a given study.

We develop a functional distinction between different forms of measurement based on the different ways that their outputs (values, interpretations) relate to the phenomena they intend to describe (e.g., temperature). Paleotemperature measurements, which are used to estimate temperatures of systems in Earth's past, serve as a case study to examine and apply this new functional proxy definition. We explore the historical development and application of two widely used paleotemperature proxies, calcite δ18O and TEX86, to illustrate how different measurements relate to the phenomena they intend to describe. Both proxies are vulnerable to causal factors that interfere with their relationship with temperature but address those “confounding causal factors” in different ways. While the goal of proxy development is to fully identify, quantify, and calibrate to all confounding causal factors, the reality of proxy applications, especially for past systems, engenders unavoidable and potentially significant uncertainties. We propose a framework that allows researchers to be explicit about the limitations of their proxies and identify steps for further development. This paper underscores the ongoing effort and continued need for critical examination of proxies throughout their development and application, particularly in Earth's history, for reliable proxy interpretation.

1 Introduction

Proxy measurements can provide information about otherwise elusive properties of systems in Earth's past, present, and worlds beyond. With a growing interest in quantitatively measuring these properties more precisely and in new environments, the diversity of proxies has increased dramatically. While “proxy” is often used to differentiate “indirect” (e.g., geochemical, physical) measurements from more “direct” forms of observational measurement, neither of those terms provides insight into the reliability or applicability of different measurements. Even direct forms of measurement can be considered proxies in this sense; all involve some level of observational “indirectness”. Earth scientists are particularly aware of the nuances of measurement applicability – as researchers look farther back in time, the reliability of a measurement (i.e., our understanding of what that measurement represents) typically becomes less certain. A standardized framework for conveying how proxy measurements relate to different systems and phenomena would be widely useful for describing these complex associations to nonspecialists, students, modelers, and other proxy users.

The goal of this paper is to describe how methods of observational measurements differ in the ways their outputs (values, data, interpretations) relate to the phenomena they intend to describe. All forms of observational measurement are influenced by factors that are not the property being measured. We provide insight into the assumptions behind the interpretation and development of different forms of measurement, with the goal of more clearly describing those assumptions and uncertainties in the context of data interpretations.

We examine paleotemperature measurements, which are used to estimate temperatures of systems in Earth's past, as a case study given the growing interest in paleoclimate, the diversity of measurements available, and the field's relationship to unknown changes in the Earth–climate system through time. We propose a theoretical framework and language that can more accurately distinguish different measurement–property relationships, which we hope will lead to more robust measurement calibrations, more transparent measurement outputs, and stronger interpretations. While paleoclimate is the focus below, the ideas described here apply to observational measurements across many fields of science.

2 Functional distinctions for proxy measurements

The placement of measurements in two overarching groups, proxy and direct, is particularly common in climate sciences (NOAA National Centers for Environmental Information, 2020; Jansen et al., 2007). Recent philosophical work points out the need to clarify the definition of proxy measurements as indirect and non-proxy measurements as direct and questions how proxies can provide reliable measurements in spite of such perceived indirectness. While many have referred to oxygen isotopes in calcite (δ18Ocalcite) as a proxy for temperature and the mercury thermometer as a direct measurement of temperature (NOAA National Centers for Environmental Information, 2020; Jansen et al., 2007), both scientists and philosophers of science have pointed out that neither measurement technique truly represents direct observation (e.g., Ruddiman, 2008; Wilson and Boudinot, 2020). The mercury thermometer measures temperature via the observable thermal expansion of mercury as a function of temperature, while δ18Ocalcite measures paleotemperature via observable variation of 18O incorporation into calcite (CaCO3) as a function of temperature resulting from the differences in vibrational energies of different oxygen isotopes (i.e., 16O, 17O, 18O). In other words, neither produces a direct measurement of temperature; both rely on the observation of some effect of temperature in a system.

Each of these measurements is also influenced by other non-temperature causal factors. Mercury expansion is not only a function of temperature, but also of the partial pressure of the atmosphere and expansion dynamics of liquid mercury. Similarly, δ18Ocalcite is influenced by the δ18O of the surrounding water (δ18OH2O; Urey, 1948), the pH of the surrounding water (Spero et al., 1997), and, if biomineralized by calcifying organisms, biological kinetic effects on 18O incorporation (Bemis et al., 1998; Ravelo and Hillaire-Marcel, 2007). Given philosophical arguments attuned to the conceptual and epistemic issues regarding different forms of scientific measurement (e.g., Suppes, 1951; Franklin, 1990; Chang, 2004; Van Fraassen, 2010; Wilson and Boudinot, 2020), we propose that proxies differ from other forms of measurement in how they account for these confounding causal factors (CCFs; see the “Glossary of terms”; Wilson and Boudinot, 2020).

Under this definition, non-proxy measurements are those that have been designed and manufactured to eliminate all of the potential effects of known CCFs on the measurement output. Because these non-proxy measures control which parts of the system contribute to the final measurement outputs, we refer to them as controlled measurements (see the “Glossary of terms”). Mercury thermometers, for example, are manufactured with a glass casing that controls the atmospheric pressure within the thermometer. The glass case eliminates variation in non-temperature CCFs (e.g., changes in atmospheric pressure, potential for fluid exchange) such that the measured signal can only represent the phenomenon in question, temperature. The lines on the thermometer are calibrated to the thermodynamic properties of mercury such that a specific volumetric expansion of mercury is a causal result of the specific local temperature. In this way, the mercury thermometer is used to perform a controlled measurement.

While the process is more sophisticated, the digital thermometers more commonly used today also control all known CCFs within the instrument to provide a single calibrated temperature value. For those digital thermometers that use electrical resistance, for example, the built-in computer immediately converts an electrical resistance reading to temperature and is calibrated to effectively remove the influence of non-temperature effects on such resistance, including the composition, length, and width of the metal probe used in the thermometer. Because digital thermometers account for all CCFs that influence the relationship between electrical resistance and temperature in real time, digital thermometers, too, are used to perform controlled measurements.

Proxy measurements are distinct because their process of measurement does not rule out all CCFs (see the “Glossary of terms”). This means that the original signal from the analytical measurement must be subject to further manipulation, such as incorporation into a calibration. Those calibrations are based on the field's best understanding of the drivers of that measured property and quantitatively attempt to minimize the influence of CCFs to produce a value that represents the phenomenon in question (Fig. 1). For example, δ18Ocalcite is a proxy measurement because δ18Ocalcite is measured simply as a ratio of 18O to 16O of a calcite sample compared to an isotopic standard, and alone that analytical measurement does not reflect temperature. To measure temperature using δ18Ocalcite, researchers must use a calibration that incorporates information about other parts of the system that influence the inclusion of 18O into calcite, such as the δ18OH2O of the surrounding water, and any potential biological effects of calcification. Because most proxy applications do not allow the researcher to produce controlled measurements of each of those CCFs, the output from a proxy is at best an “estimate” (i.e., the δ18Ocalcite proxy measurement produces paleotemperature estimates).
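The role of δ18OH2O as a CCF in such a calibration can be sketched in a few lines of Python. This is a minimal illustration only: the function name and the quadratic form with its default coefficients are assumptions made here for demonstration, not the calibration used or recommended in this paper, and biological effects are ignored entirely.

```python
def calcite_temperature(d18o_calcite, d18o_water, a=16.5, b=-4.3, c=0.14):
    """Estimate a calcification temperature (degrees Celsius).

    Assumed illustrative form: T = a + b*(dc - dw) + c*(dc - dw)**2,
    where dc is the measured calcite value and dw is the value assigned
    to the surrounding water (the CCF). The coefficients a, b, c are
    placeholders for demonstration, not values taken from this paper.
    """
    diff = d18o_calcite - d18o_water
    return a + b * diff + c * diff ** 2


# The same analytical measurement yields different temperature estimates
# depending on the value assigned to the water (both inputs hypothetical).
measured_calcite = -1.0
for assumed_water in (0.0, -1.0):
    estimate = calcite_temperature(measured_calcite, assumed_water)
    print(f"water = {assumed_water:+.1f} per mil -> {estimate:.2f} C")
```

Under these assumed coefficients, a 1 per mil change in the water value shifts the estimate by several degrees, which is why the way that value is obtained (measured or inferred) matters for the reliability of the final estimate.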

Figure 1. Schematic and description of an idealized calibration for a hypothetical paleotemperature proxy.


The term “indicator” is often used synonymously with proxy, or even measurement (e.g., “Application of the Ce anomaly as a paleoredox indicator”, German and Elderfield, 1990; “Using fossil leaves as paleoprecipitation indicators”, Wilf et al., 1998; “Stomatal density and stomatal index as indicators of paleoatmospheric CO2 concentrations”, Royer, 2001; “Indicator of relative changes in sea surface temperature”, Hollis et al., 2019; “Palaeoecological proxies... include crustacean Ostracoda... their indicator species... are sensitive to deoxygenation and eutrophication”, Yasuhara et al., 2019). The use of this term for such a wide range of applications highlights the lack of clarity in the existing literature, which eventually leads to a lack of clarity in the dissemination of resulting information. While all measurements do “indicate” the quality of some property, they do so in different ways and are accompanied by quite different levels of reliability and uncertainty. The proposed distinction between proxy and controlled measurements, and within proxy measurements (see below), aims to bring clarity to the discussion of measurements and their outputs, and CCFs provide the basis for that clarification.

The importance of CCFs for proxy measurements was recognized in the development of the first quantitative paleotemperature proxy, δ18Ocalcite. Harold Urey first described the thermodynamic relationship between δ18Ocalcite and calcite formation temperatures through a simple linear calibration that relates δ18Ocalcite to temperature in degrees Celsius (Urey, 1948). Urey discussed two important CCFs influencing the δ18Ocalcite relationship with temperature that could have changed significantly through geologic time and space, namely the δ18OH2O of the (mean) global ocean and δ18OH2O of local waters surrounding the precipitating carbonate. While the early reports posited that global δ18OH2O changed on long timescales (millions of years) as a result of rock weathering, later work showed that global δ18OH2O had varied significantly on much shorter timescales (tens of thousands of years) due to fluctuations of global ice volume (Emiliani, 1955). The uncertainty of mean ocean δ18OH2O is greater farther back in Earth history due to currently unconstrained conditions such as ancient ocean latitudinal gradient effects (i.e., reduced latitudinal temperature gradient and resultant local δ18OH2O, 100 million years ago) and silicate weathering rates (Urey et al., 1951). Most Earth systems have experienced variability through Earth's history, contributing to increased uncertainty associated with CCFs moving farther back in geologic time. As such, different temporal applications of a single proxy can dramatically change that proxy estimate's uncertainty.

The potential for unknown CCFs exists even for well-calibrated proxy systems and controlled measurements (Wilson and Boudinot, 2020). While the mercury thermometer successfully controls for its relevant CCFs, a hypothetical application that reveals a theretofore unknown CCF would lead us to no longer consider the thermometer a controlled measurement, at least until it were manufactured in a way to also remove the effects of that CCF. The potential for the existence of unknown CCFs necessitates cautious interpretations of all measurements, particularly those in development or under new applications. But how exactly are CCFs incorporated into proxies?

3 Assessing a proxy

3.1 Situating proxies on a spectrum

CCFs are incorporated into proxy measurements through a calibration equation (Fig. 1), which provides a quantitative representation of the relative influence of each causal factor that contributes to the measured property. Using the calibration, researchers can account for the influence of CCFs and produce an estimate of the phenomenon in question. However, the extent to which calibrations identify and address CCFs differs greatly between proxies and proxy applications.

We place proxy measurements along a spectrum that can illustrate the diversity of how proxies relate to CCFs (Fig. 2a). Controlled measurements, with all CCFs known and controlled for (e.g., mercury thermometer), occupy one end of the spectrum. On the other end of the spectrum are proxy measures that are not (yet) calibrated to directly account for their CCFs such that only a correlation is proposed (correlation-constrained proxy; see the “Glossary of terms”), carrying uncertainty regarding the nature and precise causal influence of associated CCFs. Between the two ends of the spectrum are proxies that have a calibration that accounts for the CCFs' influence on the measurement output and are accompanied by a quantitative measurement (observation-constrained proxy) or quantitative inference (inference-constrained proxy) of those CCFs (Fig. 2a; see the “Glossary of terms”). By situating any measurement along this spectrum, one can assess how much the measured value is affected by CCFs as opposed to the property in question (i.e., the potential uncertainty; Fig. 2b, see below), such as δ18OH2O instead of temperature.

Figure 2. A spectrum (x axis) of observational measurements as a function of their incorporation of confounding causal factors and related uncertainty. (a) The bottom y axis describes the completeness of a measurement's calibrations (i.e., how completely a calibration accounts for all causal factors). Controlled measurements on the left have full control of all causal factors. Observation-constrained proxies have a calibration that quantitatively accounts for CCFs and allows the researcher to measure those CCFs. Inference-constrained proxies also have a calibration that quantitatively accounts for CCFs, but the researcher cannot measure the CCFs, so the quantitative values for CCFs used in the calibration must be inferred from other evidence. On the right, correlation-constrained proxies have the least direct (quantitative) control of the causal factors, with calibrations that do not quantitatively account for CCFs. (b) The top y axis represents the uncertainty of each measurement, with the red line signifying potential uncertainty and the blue bar showing the range of reported uncertainty in the literature. Because analytical uncertainty varies greatly between proxies, instruments, and users, we have excluded its representation. The arrow and description of offset in panel (a) apply to all measurements.


Controlled measurements work the same across locations and through time. A mercury thermometer should have the same level of accuracy and precision in a high-altitude, low-humidity study site as in a low-altitude, high-humidity site. Ideally, all proxy measurements would eventually develop into controlled measurements. Unfortunately, and particularly in paleo-applications, the certainty ascribed to the mercury expansion calibration is not easily attainable or validated. Furthermore, even controlled measurements can be complicated by work in “extreme” environments, where temperatures may exceed the minimum or maximum range to which the thermometer is calibrated (e.g., beyond the boiling point of mercury). Thus, how a measurement's calibration is developed and utilized determines the situations and uncertainty for that measurement's application.

To illustrate the proxy range of the spectrum, we situate δ18Ocalcite as either an observation-constrained proxy or an inference-constrained proxy depending on how CCFs are quantitatively accounted for (Fig. 2a). When the δ18OH2O value in the temperature calibration derives from an independent measurement (proxy or controlled) of the δ18OH2O of the water from which the calcite precipitated, then the proxy is an observation-constrained proxy; values to account for the CCFs in the calibration derive from empirical observations (Fig. 2a). These components of the calibration can be accounted for with information from proxy or controlled measurements, with the latter contributing less uncertainty given the constraints on CCFs in controlled measurements.

On the other hand, in instances in which δ18OH2O cannot be measured, such as in deeper-time applications, the researcher must provide an inference (i.e., reasoned approximation) of local δ18OH2O. Because they extrapolate from a well-known system to a lesser-known system, inference-constrained proxy measurements inherently present a more biased estimate: the value assigned to the CCF reflects the researchers' inference about that system rather than empirical evidence (Fig. 2b). For example, some researchers have inferred 100-million-year-old δ18OH2O for the δ18Ocalcite paleotemperature calibration by applying a first-order estimate of δ18OH2O based on certain characteristics of the system in question, such as a mean δ18OH2O value that applies to any “non-glacial world” (O'Brien et al., 2017). Researchers modified this mean value to represent the δ18OH2O of local waters (where calcite was precipitated) by adjusting the mean δ18OH2O based on modern latitudinal δ18OH2O variability (e.g., O'Brien et al., 2017). This inference is still based on quantitative measurements (e.g., modern δ18OH2O latitudinal trends) but requires several inferences that assume that two systems are similar (i.e., all ice-free oceans in Earth's history are isotopically similar; latitudinal δ18OH2O variability is similar between 100 million years ago and the present). Because that inference is accompanied by uncertainty that is not easily quantifiable (e.g., uncertainty associated with assumptions made by the researcher rather than analytical uncertainty; see below), the potential uncertainty for inference-constrained proxies is larger than that for observation-constrained proxies.

Importantly, many calibrations require a combination of inference and observation to produce a final estimate of the target property, as CCFs differ in how they can be accounted for. In other words, many proxy applications use both observation and inference constraints to satisfy a calibration.
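One way to make this placement explicit for a given application is to record how each CCF is handled and then label the application accordingly. The sketch below is a hypothetical illustration: the class and function names are invented here, and the rule of labeling a mixed application by its least-constrained CCF is an assumption of this sketch rather than a rule stated in the framework.

```python
from enum import IntEnum
from typing import Mapping


class Constraint(IntEnum):
    """How a single CCF is handled, ordered from most to least constrained."""
    CONTROLLED = 0   # eliminated by the design of the instrument
    OBSERVED = 1     # quantified by an independent measurement
    INFERRED = 2     # quantified by a reasoned approximation
    UNACCOUNTED = 3  # not quantitatively represented in the calibration


LABELS = {
    Constraint.CONTROLLED: "controlled measurement",
    Constraint.OBSERVED: "observation-constrained proxy",
    Constraint.INFERRED: "inference-constrained proxy",
    Constraint.UNACCOUNTED: "correlation-constrained proxy",
}


def classify(ccf_handling: Mapping[str, Constraint]) -> str:
    """Label an application by its least-constrained CCF (assumed rule)."""
    if not ccf_handling:
        raise ValueError("at least one CCF must be described")
    return LABELS[max(ccf_handling.values())]


# Hypothetical deep-time calcite application: one CCF inferred, one observed.
deep_time_calcite = {
    "d18O_water": Constraint.INFERRED,
    "biological_effects": Constraint.OBSERVED,
}
print(classify(deep_time_calcite))
```

Keeping the per-CCF record alongside the single label preserves the information that a calibration was satisfied by a mix of observation and inference.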

Moving further away from controlled measurements on our spectrum, we find proxy measurements that are correlated with temperature, but the CCFs are not fully or quantitatively accounted for in a calibration; here, the CCFs are unknown (or roughly understood), though a correlative relationship is identified. It is functionally impossible to accurately assess the uncertainty of estimates produced by these measurements (Fig. 2b), as the causal factors influencing the measurement are not quantitatively represented in a calibration. Not only could the signal from such a correlation-constrained proxy be partially driven by some unknown CCF; it could even be entirely driven by CCFs (e.g., Junium et al., 2018) but would be interpreted as driven by the property in question.

An example of a correlation-constrained proxy is the present incarnation of the TEX86 paleotemperature proxy. In 2002, researchers identified a suite of sedimentary hydrocarbons that shared a similar structure but contained a different number of cyclic moieties (Schouten et al., 2002; Fig. 2). Relative abundances of these isoprenoidal glycerol dialkyl glycerol tetraether (isoGDGT) compounds with different cyclic moieties were represented by a ratio (Table 1). When these compounds were recovered from modern sediments and this ratio was calculated, a clear correlation with the surface water temperature at the sample location was identified. In other words, the number of cyclic moieties in the sedimentary isoGDGTs was correlated with the surface water temperatures at the location where they were found. Using statistical (regression) analyses of a suite of modern sediments and sea surface temperature measurements, a calibration was produced, and the authors proposed this molecular ratio as a quantitative paleotemperature proxy (Schouten et al., 2002). A physiological response was posited to explain the relationship – fewer cyclic moieties contributed to a more malleable lipid membrane, which would be advantageous in cooler waters.
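The regression-based construction of such a calibration can be sketched as follows. This is an illustration under stated assumptions: the ratio is written in its commonly cited form (Table 1 remains the reference for the definitions discussed in this paper), the function names are hypothetical, and the core-top values are synthetic numbers invented for the example.

```python
def isogdgt_ratio(gdgt1, gdgt2, gdgt3, cren_isomer):
    """Cyclization ratio in its commonly cited form (assumed here):
    (GDGT-2 + GDGT-3 + cren') / (GDGT-1 + GDGT-2 + GDGT-3 + cren')."""
    numerator = gdgt2 + gdgt3 + cren_isomer
    return numerator / (gdgt1 + numerator)


def fit_linear_calibration(temperatures, ratios):
    """Ordinary least-squares fit of ratio = slope * temperature + intercept."""
    n = len(temperatures)
    mean_t = sum(temperatures) / n
    mean_r = sum(ratios) / n
    sxx = sum((t - mean_t) ** 2 for t in temperatures)
    sxy = sum((t - mean_t) * (r - mean_r) for t, r in zip(temperatures, ratios))
    slope = sxy / sxx
    return slope, mean_r - slope * mean_t


def estimate_temperature(ratio, slope, intercept):
    """Invert the fitted line to turn a measured ratio into an estimate."""
    return (ratio - intercept) / slope


# Synthetic "modern core-top" data, invented purely for illustration.
sea_surface_temps = [0.0, 10.0, 20.0, 30.0]
observed_ratios = [0.30, 0.50, 0.70, 0.90]
slope, intercept = fit_linear_calibration(sea_surface_temps, observed_ratios)
print(estimate_temperature(0.60, slope, intercept))
```

Nothing in this fit represents a CCF: the slope and intercept summarize a correlation in the calibration data set, so any non-temperature driver of the ratio is silently folded into the temperature estimate.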

Table 1. Compilation of TEX86 calculations and calibrations as of 2020. Modified from Tierney (2012).

In the ensuing years, several questions about the origin and implications of these molecules were raised. They seemed to be produced predominantly by Thaumarchaeota, a type of marine archaea that live well below the sea surface (Schouten et al., 2000) where the temperature correlation was strongest. Additionally, field and culture observations from variable environments produced different calibrations (i.e., different slopes and y intercepts to describe the correlation between the isoGDGT ratio and temperature; Table 1) and even different ratios (e.g., TEX86L for low-temperature regions; Table 1). If the ratio of isoGDGT cyclicity directly represented temperature, then why would that ratio be different depending on the study design, location, and time period? And if the calibration accurately accounted for the CCFs contributing to the effect of temperature on isoGDGT cyclicity, why would it be different from place to place?

These questions are driving fundamental research in understanding the mechanistic relationships between TEX86 and temperature. Several important advances in this mechanistic understanding have already been produced: culture and field experiments have shown that the cyclic moieties represent a metabolic response to energy demands, growth phase, nutrient availability, and ecosystem composition rather than solely a physiological response to temperature (Elling et al., 2014; Qin et al., 2015; Hurley et al., 2016; Polik et al., 2018). These studies advance TEX86 beyond the correlative relationship (i.e., colder temperatures produce fewer cyclic moieties) into a nuanced, yet more accurately representative, understanding of all causal factors and their mechanisms (i.e., relationship between sea surface temperatures and nutrient and oxygen availability, which impacts archaeal metabolic energy demands). However, while work on TEX86 drivers suggests that non-temperature factors cause variations in isoGDGT cyclization, TEX86 application studies continue to report a specific temperature value. The argument behind continued TEX86 applications is the correlation of ammonia oxidation rates and temperature in most modern settings (Hurley et al., 2016). However, many studies have suggested that ammonia or oxygen concentrations in past environments likely varied in a way that did not correlate with temperature (e.g., Liu et al., 2009; Polik et al., 2018). This proxy's CCFs need full consideration in experimental design and interpretation for it to be truly quantitative and its uncertainty appropriately reported.

3.2 Discussing proxy data

A clear distinction should be made between various forms and degrees of uncertainty related to proxy measurements (see the “Glossary of terms”). All proxy measurements are the result of some analysis (e.g., δ18Ocalcite as the normalized ratio of 18O to 16O in a sample) and incorporation into a calibration (e.g., δ18Ocalcite as a function of temperature, δ18OH2O, and biological effects; Fig. 1), from which three forms of uncertainty derive. The first is analytical uncertainty, which is simply the uncertainty associated with the precision and accuracy of the analytical measurement. For oxygen isotopes in calcite, this would include the isotope ratio mass spectrometer's precision and accuracy when determining the ratio of 18O to 16O of a sample normalized to a standard. We argue that analytical uncertainty can always be quantified using standards and is distinct from unquantifiable uncertainties. Unquantifiable uncertainties associated with calibration (including unknown CCFs), as well as with sample preparation and analysis, are grouped into potential uncertainties (Fig. 2b). The distinction between factors that fall into potential versus analytical uncertainty is defined by quantitation. Researchers take many steps to quantify errors and uncertainties associated with sample preparation and analysis. When employed, such efforts reduce the potential uncertainty and allow the analytical uncertainty to be reported more accurately. For example, hydrocarbon standards might be incorporated into a sedimentary sample before hydrocarbon extraction such that the researcher can quantify if any hydrocarbons, including isoGDGTs, are lost or altered throughout the in-lab processing. Researchers could report or normalize to that loss and alteration, more transparently reflecting the uncertainty in the analysis. However, some potential uncertainties will always exist in a nonquantifiable manner, such as unknown CCFs or unmeasurable changes in CCFs through time. Because the error in an inference-constrained proxy might not be quantifiable (i.e., logical deductions might not have a quantifiable uncertainty), its potential uncertainty will always be higher than that of an observation-constrained proxy, for which the analytical uncertainty of the CCF measurement can be quantified (Fig. 2b).

The final type of uncertainty is the reported uncertainty, which should ideally cover (either quantitatively or in discussion) both analytical and potential uncertainties. However, for many proxies, the reported uncertainty varies widely in practice. For example, the variety of isoGDGT ratios and calibrations (Table 1), and the lack of codified reporting standards used in the expression of TEX86-derived paleotemperatures, lead to notable variability in the reported uncertainty associated with TEX86. Some TEX86-derived paleotemperature estimates are plotted without error bars and are accompanied by an in-text discussion of the analytical uncertainty from calibration and replicate analyses (e.g., Woelders et al., 2017), while the analytical uncertainty for others is not discussed (e.g., Sluijs et al., 2006). For some estimates, the analytical uncertainty derived from only the calibration is provided (e.g., Hollis et al., 2012; Ho et al., 2014). Analytical uncertainties from replicate analyses have been combined with the analytical uncertainties of calibration statistics as error windows on plots (e.g., Tierney et al., 2010; Shevenell et al., 2011), while discussion of potential uncertainties, such as changes in the known (but not calibrated-to) CCFs, varies greatly between reports (e.g., Tierney et al., 2010; Shevenell et al., 2011). Because potential uncertainty is by definition unquantifiable, it might not be incorporated into quantitative data presentation styles, such as Cartesian plots, but can certainly be discussed in light of the existing work on TEX86 CCFs.
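A reported uncertainty that keeps the quantified and unquantified parts separate could be structured along the following lines. This is a hypothetical sketch: the class and field names are invented here, and combining the replicate and calibration terms in quadrature assumes that the two are independent, which is an assumption of this example and not a description of how any of the cited studies combined them.

```python
import math
from dataclasses import dataclass, field
from typing import List


@dataclass
class ProxyEstimate:
    """A proxy estimate with its uncertainty split by type."""
    value: float            # estimate of the target property
    replicate_sd: float     # analytical: spread of replicate analyses
    calibration_se: float   # analytical: error from calibration statistics
    potential_notes: List[str] = field(default_factory=list)  # unquantified

    def analytical_uncertainty(self) -> float:
        # Quadrature sum; valid only if the two terms are independent.
        return math.hypot(self.replicate_sd, self.calibration_se)

    def report(self) -> str:
        text = f"{self.value:.1f} +/- {self.analytical_uncertainty():.1f} (analytical)"
        if self.potential_notes:
            # Potential uncertainty is listed in words, not added to the number.
            text += "; unquantified: " + ", ".join(self.potential_notes)
        return text


# Hypothetical values, for illustration only.
estimate = ProxyEstimate(25.0, 3.0, 4.0, ["nutrient availability"])
print(estimate.report())
```

The potential uncertainty appears as text rather than as a wider error bar, mirroring the point that it cannot be plotted but can still be stated alongside the number.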

Importantly, researchers have taken steps to communicate the reliability of proxy data relative to other measurements in reviews, conference sessions, and proxy assessment compilations (e.g., Ravelo and Hillaire-Marcel, 2007; Newman et al., 2016; Hollis et al., 2019; Wilson and Boudinot, 2019). For example, the Paleoclimate Modelling Intercomparison Project (PMIP) appraisal of proxy data for the Intergovernmental Panel on Climate Change (IPCC) reports (Hollis et al., 2019) provides an in-depth description of the paleotemperature proxies used to inform the IPCC reports. The appraisal describes each proxy's theoretical background, which gives data generators and modelers a better understanding of the biogeochemical processes that relate each proxy to temperature. The assessment then describes strengths and weaknesses of each proxy relative to the other measurements, which can guide users in determining which proxy may be best suited for a given study, as well as providing considerations for the interpretation of the resulting data. Finally, the assessment provides “recommended methodologies”, which include analytical recommendations, a single recommended calibration, and other best practices for reporting proxy data and interpretations. By providing a consensus presentation of recommended methodologies, the PMIP proxy assessment and similar projects constitute an important means for standardizing data assessment and reporting, as well as guiding proxy users in developing study designs. The framework presented here will improve those methods by providing direct language (e.g., CCFs, types of uncertainty) to more clearly navigate discussions of proxy assessments.

A complete outline of potential uncertainties and the often complex phenomenon–measurement relationships is difficult to incorporate into grants, peer-reviewed manuscripts, and educational programs. The lack of extensive discussion of a proxy's uncertainty can lead to an oversimplification of these relationships (i.e., an under-consideration of CCFs and uncertainties). However, detailing how proxies might relate to some unknown CCFs (as is done here) can make any proxy seem subject to countless unknown CCFs, which may engender an unwarranted dismissal of proxy data interpretations. Because proxy data inform models, manuscripts, and educational lessons, there needs to be a more universally accepted and functional means of discussing and conveying proxy uncertainty that is honest yet robust. Our spectrum of proxy measurements relates measurements to their CCFs, and thus the spectrum and language provide such a means of conveying uncertainty in a universal way.

Many studies, for example, have shown that TEX86 trends were driven by changes in nitrogen availability and marine ecology in some paleo-environments (Liu et al., 2009; Hurley et al., 2016; Junium et al., 2018; Polik et al., 2018). How can researchers be sure that TEX86 is not driven by these dynamics in other settings, unless those CCFs of nitrogen availability and marine ecology changes are directly assessed? Because uncertainties in estimating these environmental characteristics are often not incorporated (as they are not incorporated in the current litany of quantitative TEX86 calibrations; Table 1), we have described the potential uncertainty of TEX86 (and other correlation-constrained proxies) as much higher than is often reported (Fig. 2b). By referring to TEX86 as a correlation-constrained proxy, modelers, reviewers, and researchers can immediately be aware of this underreporting of uncertainty, which would inform their interpretation of the temperature estimates produced by TEX86 in a meaningful yet succinct way.

3.3 Development of a proxy

Proxy development is the production and improvement of a calibration that quantitatively accounts for all CCFs that contribute to the measured signal. The controlled characteristic of a mercury thermometer allows the measurement of temperature without needing an external calibration, as the temperature lines are calibrated to the exact expansion of mercury within the glass walls. Prior to the full calibration of the lines on the mercury thermometer, mercury might have served as a proxy: a gram of mercury on a table would expand and contract with fluctuating temperatures, which could be a qualitative, correlation-constrained proxy for temperature (the mercury expanded, so the temperature likely got hotter).

Because proxy measurements do not account for the influence of all known CCFs, quantitative proxy measurements require some external calibration equation to produce reliable estimates. Calibrations express the relative effect of each causal factor (Fig. 1) and provide insight into the applicability of a proxy by addressing the range in which the calibration is useful and the natural variability (uncertainty) associated with that calibration. Proxy applications are limited to the range in which that proxy has been studied and calibrated; applications outside that range do not produce reliable estimates.
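The range restriction described above can be made explicit in code. The following is a minimal sketch, assuming a simple linear calibration; the class name, the coefficients, and the calibrated range are hypothetical placeholders and do not correspond to any published calibration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinearCalibration:
    """Hypothetical linear calibration: target = slope * measured + intercept."""
    slope: float
    intercept: float
    measured_min: float  # lowest measured value covered by the calibration data
    measured_max: float  # highest measured value covered by the calibration data
    residual_sd: float   # scatter of the calibration data about the fit

    def estimate(self, measured):
        """Return (estimate, residual_sd); refuse to extrapolate beyond the range."""
        if not (self.measured_min <= measured <= self.measured_max):
            raise ValueError(
                f"measured value {measured} lies outside the calibrated range "
                f"[{self.measured_min}, {self.measured_max}]"
            )
        return self.slope * measured + self.intercept, self.residual_sd


# Placeholder numbers for illustration only (not a real calibration).
cal = LinearCalibration(slope=50.0, intercept=-10.0,
                        measured_min=0.3, measured_max=0.8, residual_sd=2.0)
estimate, sd = cal.estimate(0.5)
```

Raising an error for out-of-range inputs, rather than silently extrapolating, is one way to encode the point that applications outside the calibrated range do not produce reliable estimates.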

Harold Urey's first description of the thermodynamic relationship between δ18Ocalcite and calcite formation temperatures was simply “The calculated slope, 4.4 per mil between 0 and 25 C” (Urey, 1948). More complex calibrations now exist for the δ18Ocalcite paleotemperature proxy, which account for its numerous CCFs including δ18OH2O and biological effects (Ravelo and Hillaire-Marcel, 2007; Hollis et al., 2019). While the δ18Ocalcite proxy is far from a controlled measurement, its historical development exemplifies the consistent work to make proxies more like controlled measurements, i.e., to eliminate or limit the influence of CCFs. But what does such proxy development look like in practice?

The first step of proxy development is the identification of some corollary relationship between a measurable property (e.g., δ18O of calcite) and a property unable to be measured in a controlled fashion (e.g., temperature of a past environment). At first order, these are usually qualitative and based on some hypothesis to describe a system. Mercury expands with increasing temperature due to general fluid dynamics; 18O is more favorably incorporated into calcite at lower temperatures due to differences in vibrational energies between 18O and 16O; some organisms alter their cell membranes to maintain homeostasis in variable environments.

Proxies that are based on such a corollary relationship can serve as qualitative proxy measures, which provide useful comparative or relative information. This is the case for some paleotemperature proxies: geological evidence of glacial expansion and retreat in a certain location can indicate relative local temperature change, but variability in numerous (difficult or impossible to constrain) CCFs prohibits a calibration to quantitative temperature changes in degrees Celsius. Such comparative information is appropriate for many paleo-studies, wherein the question is focused on trends and relative changes through time or differences between sites. This corollary relationship can lead researchers into an “optimism phase”, wherein the assumption of a direct cause–effect relationship between a phenomenon and an observation makes users optimistic that a proxy can be used with confidence (Elderfield, 2002).

If researchers aim to use a proxy quantitatively, the relationship between the target property (e.g., temperature), the observable property (e.g., δ18Ocalcite), and all CCFs must be accounted for in a calibration (Fig. 1). Quantitative proxies require an (empirically derived) estimation or (logically deduced) inference of the influence of all CCFs represented in a calibration. Calcite precipitation experiments with variable pH, δ18OH2O, salinity, and biomineralizing organisms have contributed to calibrations that include those CCFs and represent how they contribute to 18O incorporation into calcite (Ravelo and Hillaire-Marcel, 2007). Studies using those calibrations must account for those CCFs. For example, calcite-producing organisms live in either bottom waters or surface waters – the temperature from the two will not only have slightly different CCFs, but will also reflect temperature from different parts of the water column. Researchers would identify the type of organism to know where it lived and would address the CCFs specific to that organism (e.g., Bemis et al., 1998). The process of testing CCFs must be extensive to provide confidence in the proxy. Often, this phase of development unearths unforeseen CCFs, such as the role of water-column oxygenation in isoGDGT cyclicity (Qin et al., 2015; Hurley et al., 2016). While some have argued that this can lead to a “pessimism phase”, wherein proxy users might no longer have confidence in that proxy's utility (Elderfield, 2002), in fact these revelations are essential to proxy development – it is the scientific method at work, and such exhaustive testing of CCFs is a prerequisite for the confident use of a proxy.

The identification and testing of CCFs represent an inherently iterative process. Urey and others made serious consideration of CCFs before applying the δ18Ocalcite paleotemperature proxy. It was proposed that the proxy be used only “if the isotopic composition of the water is known not to differ from the mean of the present seas, or... in the case that it does [differ], if both the isotopic composition of the carbonate and water are determined” (Urey et al., 1951). Urey described local variability in δ18OH2O due to evaporation and salinity as “the greatest difficulty” for accurate temperature measurements but promised that “this problem is being studied from several angles and it is hoped that corrections can be applied in the future” (Urey et al., 1951). Urey's careful consideration of CCFs, and the subsequent and ongoing investigations into those CCFs, serves as an exemplar for proxy discussion, interpretation, and development.

Sometimes, the development of one proxy can constrain a CCF for another proxy by providing a new means of estimating that CCF. The development of the Mg∕Ca paleotemperature proxy, based on the incorporation of magnesium relative to calcium in foraminiferal calcite, provided an independent constraint on temperature at the same time (i.e., mid-1990s) that δ18Ocalcite was being developed as a paleotemperature proxy (Hastings et al., 1998). By using Mg∕Ca to estimate temperature in the same setting as δ18Ocalcite, researchers were able to independently constrain temperature and thus use δ18Ocalcite to estimate δ18OH2O (Mashiotta et al., 1999). The development of two independent paleothermometers, each with their own CCFs, provided researchers with new opportunities and greater confidence in applying those proxies; δ18Ocalcite and Mg∕Ca combined helped to identify the degree to which δ18OH2O influenced the δ18Ocalcite proxy and resulted in a new means to constrain the CCF of δ18OH2O for future studies. Similarly, multiple studies have compared temperature estimates from TEX86 with other organic (e.g., alkenones; Huguet et al., 2006; Lee et al., 2008; Li et al., 2013) and inorganic (e.g., Mg∕Ca and δ18Ocalcite; e.g., Hollis et al., 2012; Hetzberg et al., 2016; O'Brien et al., 2017) proxies in the same settings. While those multi-proxy comparative studies are helping to identify CCFs related to TEX86 and other paleotemperature proxies, the numerous unconstrained CCFs related to TEX86 make direct testing of CCFs difficult for even those comparative studies. For example, are deviations between δ18Ocalcite and TEX86 due to depth of production in the water column (e.g., Li et al., 2013; Hetzberg et al., 2016), production season (Huguet et al., 2006), or some other CCF like nutrient availability (Hurley et al., 2016)?
Some TEX86 applications have used independent proxies to constrain CCFs related to the environment, such as the use of the BIT index (Hopmans et al., 2004) to estimate changes in the input of isoGDGTs from nonmarine sources (e.g., Weijers et al., 2006; Hollis et al., 2012). Future work integrating the physiological CCFs associated with TEX86, such as changes in water-column oxygenation (Qin et al., 2015) and nutrient availability (Hurley et al., 2016), into such multi-proxy comparisons could further constrain the role of different CCFs in TEX86 paleotemperature estimates.
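The pairing of Mg∕Ca and δ18Ocalcite described above can be sketched numerically. The example below is a minimal illustration, not the procedure of any cited study: it assumes an exponential Mg∕Ca calibration of the form Mg/Ca = B·exp(A·T) and a linear δ18O paleotemperature equation of the form T = a − b(δ18Ocalcite − δ18OH2O). The function names and all coefficient values are hypothetical placeholders, and the conversion between isotope reference scales is omitted for brevity.

```python
import math


def temperature_from_mgca(mgca, pre_exp, exp_const):
    """Invert an assumed exponential calibration Mg/Ca = pre_exp * exp(exp_const * T)."""
    if mgca <= 0 or pre_exp <= 0 or exp_const == 0:
        raise ValueError("Mg/Ca and pre_exp must be positive; exp_const must be nonzero")
    return math.log(mgca / pre_exp) / exp_const


def d18o_water(d18o_calcite, temperature, intercept, slope):
    """Solve an assumed linear equation T = intercept - slope * (d18Oc - d18Ow) for d18Ow."""
    if slope == 0:
        raise ValueError("slope must be nonzero")
    return d18o_calcite - (intercept - temperature) / slope


# Placeholder coefficients and measurements for illustration only.
PRE_EXP, EXP_CONST = 0.4, 0.09   # hypothetical Mg/Ca calibration constants
INTERCEPT, SLOPE = 16.0, 4.5     # hypothetical d18O paleotemperature equation

mgca_measured = 2.4              # hypothetical Mg/Ca measurement
d18o_calcite_measured = -1.0     # hypothetical d18O of calcite

# Step 1: Mg/Ca supplies the independent temperature constraint.
temp = temperature_from_mgca(mgca_measured, PRE_EXP, EXP_CONST)
# Step 2: with temperature fixed, d18O of calcite is used to estimate d18O of water.
water = d18o_water(d18o_calcite_measured, temp, INTERCEPT, SLOPE)
```

In this arrangement the Mg∕Ca estimate carries its own CCFs into the δ18OH2O estimate, so the uncertainty of the first step should be propagated into the second rather than treated as known.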

Alternatively, the use of statistical methods can elucidate CCFs and their impact on proxy measurements. One example is the Bayesian statistical modeling approach, which uses existing data (usually field-produced calibrations) over a wide range of environments to produce a “best-fit” calibration for the range of values measured in a given study. The resulting model allows researchers to identify which environments and/or locations produce a calibration that best fits their data and thus provides a means to investigate environmental conditions and the related CCFs that more fully express the relationship between, for example, TEX86 and temperature (Tierney and Tingley, 2014). In fact, the PMIP proxy assessment recommends that TEX86 users utilize the Bayesian calibration fit as the best current means to estimate paleotemperatures (Hollis et al., 2019), demonstrating how the field may use these statistical methods to provide best practices for measurement applications. Similarly, stochastic modeling approaches are used in hydrological data interpretations as a means to estimate the partial effects (or confounding effects) of different causal factors contributing to a given signal (Yevjevich, 1987), and such approaches could be utilized by the paleotemperature community.

Additionally, the application of transfer functions, including proxy system models, is used to make inferences about CCFs. Transfer functions provide a theoretical (rather than empirical) constraint on a system's properties in an attempt to predict the quality of properties rather than observe them (Telford and Birks, 2005). While the reliability of transfer functions is an area of active discussion (e.g., Telford et al., 2004, 2013), transfer functions represent yet another statistical approach used to account for CCFs in lieu of empirical observations and are employed by some to reduce uncertainty for correlation- and inference-constrained proxies. For example, proxy system models use transfer functions to provide an assessment of proxy–phenomenon relationships and the driving mechanisms behind proxy measurement outputs (e.g., Dee et al., 2016, 2018; Okazaki and Yoshimura, 2019). These statistical methods are an important aid in the determination of CCFs on observational signals and can be powerful in the development of proxy calibrations.

Ultimately, a mix of variable-controlled laboratory experiments, statistical analyses, and field validation experiments all contribute to proxy development. The identification and expression of corollary relationships in a statistical regression represent only the first step. Comparisons between laboratory (e.g., culture) experiments and field measurements might produce different calibrations; causes for differences in the regression should be investigated. For TEX86, the recognition of significant variability amongst field calibrations led researchers to investigate non-temperature properties, such as physiological effects of Thaumarchaeota, in variable-controlled in-laboratory culture experiments (e.g., Elling et al., 2014; Qin et al., 2015; Hurley et al., 2016). In response, field studies of isoGDGT cyclization were performed in modern and paleo-settings (e.g., Hurley et al., 2016; Junium et al., 2018; Polik et al., 2018) and compared with those CCFs identified in culture experiments. These studies together suggest that TEX86 users should aim to measure changes in water-column oxygenation, ammonia availability, and ecosystem structure and incorporate those measurements quantitatively into a calibration to develop TEX86 as an observation-constrained proxy. Unfortunately, the current limitation (and area of most research) concerns the production of a calibration that accurately reflects all CCFs (Table 1). Many researchers have moved forward with applying TEX86 in paleo-studies, providing an in-text inference of some CCFs often with the conclusion that the CCFs do not affect the temperature estimate (e.g., O'Brien et al., 2017), or independently measuring a select number of CCFs (such as changes in the input of isoGDGTs using the BIT index; e.g., Weijers et al., 2006). 
The lack of a unifying calibration that quantitatively accounts for those CCFs implies that these applications exemplify correlation-constrained proxy measurements, and the associated reported uncertainty should aim to reflect the accompanying potential uncertainties (Fig. 2b).

Because an ideal calibration reflects all contributing pieces of a system (Fig. 1), a single calibration is necessary for a proxy to be reliably quantitative. It should be verifiable and applicable in a wide variety of locations, times, and situations. If the calibration is inadequate for some situation, then the calibration does not account for all potential CCFs. We consider these calibrations incomplete; for some systems, the unknown CCF does not change, and the calibration explains the corollary relationship, but for other systems, the unknown CCF is introduced or changes such that the calibration no longer adequately represents the relationship between the measured entity and the property in question. This is the state of current TEX86 – each different calibration purports a different quantitative description of the relationship between causal factors (e.g., temperature) and isoGDGT cyclicity (Table 1), and none quantitatively account for CCFs (Table 1; Fig. 2a). Ongoing work to better constrain what CCFs are at play, and how they can be quantified, can move TEX86 towards a more observation- or inference-constrained proxy and lead to more reliable TEX86 paleotemperature estimates.
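The dependence of an estimate on the choice of calibration can be illustrated with a short sketch. It applies several linear calibrations to the same measured value and reports how far the resulting estimates diverge. The calibration names and coefficients below are hypothetical placeholders, not the calibrations listed in Table 1, and the spread is only a crude indicator of calibration dependence; it is not the potential uncertainty defined in the glossary, since none of the calibrations accounts for CCFs.

```python
# Hypothetical (slope, intercept) pairs standing in for competing linear calibrations.
CALIBRATIONS = {
    "calibration_A": (50.0, -10.0),
    "calibration_B": (56.0, -11.0),
    "calibration_C": (45.0, -5.0),
}


def estimates_by_calibration(measured, calibrations):
    """Apply every calibration to the same measured value."""
    return {
        name: slope * measured + intercept
        for name, (slope, intercept) in calibrations.items()
    }


def calibration_spread(estimates):
    """Difference between the highest and lowest estimate across calibrations."""
    values = list(estimates.values())
    return max(values) - min(values)


measured_value = 0.6  # hypothetical proxy measurement
estimates = estimates_by_calibration(measured_value, CALIBRATIONS)
spread = calibration_spread(estimates)
```

A nonzero spread shows only that the calibrations disagree for this input; identifying which CCFs cause the disagreement still requires the experimental and field work described above.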

While we use TEX86 as an exemplar here, we recognize that limitations in quantitative proxy development and calibration exist across all fields of study, particularly in the Earth sciences. Not all proxies need be quantitative, and all quantitative proxies present uncertainty. But for a measurement to be most effective (broad applications, less uncertainty), it should be developed as close to a controlled measurement as possible. This means developing a causal, mechanistic understanding of the relevant system (i.e., a single calibration) as a means to adequately control for the influence of CCFs and produce reliable proxy estimates.

4 Conclusions

The distinction between controlled and proxy measurements, and within proxy measurements, serves a more functional role for interpreting, assessing, and developing proxies than previous distinctions between proxy and “direct” measurements. The language proposed here concerning proxy calibrations (e.g., observation- versus inference-constrained proxy) and uncertainty (e.g., analytical versus potential) succinctly and directly addresses the relationship between measurements and the property they intend to describe and more clearly directs proxy calibration development. Using this language, modelers can more confidently appropriate proxy data outputs into their models, researchers can more efficiently design studies to produce robust measurements, reviewers can more easily assess the reporting of uncertainty and interpretations, and educators can more clearly convey the differences in measurements available for students to learn from, apply, and improve. Readers may find that observational measurements not typically considered proxy measurements in their field may in fact fall on the proxy end of our spectrum. We hope that such realizations might drive researchers to investigate what has been taken for granted in previous interpretations or how future study designs can more accurately assess and account for CCFs. Ultimately, we propose that as much can be learned about a system by developing a proxy as can be learned by applying it.

Appendix A: Glossary of terms
Confounding causal factors (CCFs): Characteristics of an environment that affect the output of a measurement but are not the property being measured
Controlled measurement: Measurement that has been manufactured or designed to eliminate the potential effects of all known CCFs on the measurement output
Proxy measurement: Measurement that does not eliminate the influence of all known CCFs on the intended or targeted property
Observation-constrained proxy: Proxy measurement for which the CCFs are quantitatively incorporated into a calibration and are accounted for with values produced by other proxy measurement estimates or controlled measurements
Inference-constrained proxy: Proxy measurement for which the CCFs are quantitatively incorporated into a calibration and are qualitatively accounted for using a reasoned approximation (inference) of the value based on comparisons to similar systems, rather than values produced by measurements of the system in question
Correlation-constrained proxy: Proxy measurement that does not account for known CCFs but is based on a hypothesized relationship between a certain property and a measurement output; uses a calibration that does not quantitatively represent the causal structure of the system
Analytical uncertainty: The uncertainty associated with the precision and accuracy of the analytical instrument
Potential uncertainty: The degree to which the measurement or estimated value is affected by something other than the property being measured
Reported uncertainty: A textual and/or numerical representation of the combined analytical and potential uncertainties associated with a measurement
Data availability

All data described here are presented in previously published literature and are cited as such.

Author contributions

JW and FGB designed the study. FGB wrote the paper, and JW and FGB edited the paper.

Competing interests

The authors declare that they have no conflict of interest.


We thank the Center for the Study of Origins for their support. We thank Thomas Marchitto, Gifford Miller, Benjamin Johnson, Matthew Huber, and F. Douglas Boudinot for their helpful comments that improved the paper and Howard Spero, Julio Sepúlveda, and Carol Cleland for discussions that improved the paper. F. Garrett Boudinot acknowledges the Department of Geological Sciences at the University of Colorado Boulder (NSF Division of Earth Sciences Earth–Life Transitions – ELT – program grant no. 1338318) and the American Chemical Society Petroleum Research Fund (ACS-PRF; Doctoral New Investigator Award no. 58815-DNI2) for their support. Joseph Wilson acknowledges the Department of Philosophy and the Graduate School at the University of Colorado Boulder for their support.

Review statement

This paper was edited by Alberto Reyes and reviewed by Julie Griffin and one anonymous referee.


Bemis, B. E., Spero, H. J., Bijma, J., and Lea, D. W.: Reevaluation of the oxygen isotopic composition of planktonic foraminifera: Experimental results and revised paleotemperature equations, Paleoceanography, 13, 150–160, 1998.

Chang, H.: Inventing temperature: Measurement and scientific progress, Oxford University Press, Oxford, United Kingdom, 2004.

Dee, S. G., Steiger, N. J., Emile-Geay, J., and Hakim, G. J.: On the utility of proxy system models for estimating climate states over the common era, J. Adv. Model. Earth Syst., 8, 1164–1179, 2016.

Dee, S. G., Russell, J. M., Morrill, C., Chen, Z., and Neary, A.: PRYSM V2.0: A proxy system model for lacustrine archives, Paleoceanogr. Paleoclim., 33, 1250–1269, 2018.

Elderfield, H.: Foraminiferal Mg/Ca paleothermometry: expected advances and unexpected consequences, Goldschmidt Conference, Davos, Switzerland, 17–23 August, Geochim. Cosmochim. Acta, 66, A213, 2002.

Elling, F. J., Könneke, M., Lipp, J. S., Becker, K. W., Gagen, E. J., and Hinrichs, K.-U.: Effects of growth phase on the membrane lipid composition of the thaumarchaeon Nitrosopumilus maritimus and their implications for archaeal lipid distributions in the marine environment, Geochim. Cosmochim. Ac., 141, 579–597, 2014.

Emiliani, C.: Pleistocene temperatures, The J. Geol., 63, 538–578, 1955.

Franklin, A.: Experiment, right or wrong, Cambridge University Press, Cambridge, United Kingdom, 1990.

German, C. R. and Elderfield, H.: Application of the Ce Anomaly as a paleoredox indicator: the ground rules, Paleoceanography, 5, 823–833, 1990.

Harning, D., Andrews, J. T., Belt, S. T., Cabedo-Sanz, P., Geirsdottir, A., Dildar, N., Miller, G. H., and Sepúlveda, J.: Sea ice control on winter subsurface temperatures of the north Iceland shelf during the Little Ice Age: a TEX86 calibration case study, Paleoceanogr. Paleoclim., 34, 1–16, 2019.

Hastings, D. W., Russell, A. D., and Emerson, S. R.: Foraminiferal magnesium in Globigerinoides sacculifer as a paleotemperature proxy, Paleoceanography, 13, 161–169, 1998.

Hetzberg, J. E., Schmidt, M. W., Bianchi, T. S., Smith, R. W., Shields, M. R., and Marcantonio, F.: Comparison of eastern tropical Pacific TEX86 and Globigerinoides ruber Mg/Ca derived sea surface temperature: Insights from the Holocene and Last Glacial Maximum, Earth Planet. Sci. Lett., 434, 320–332, 2016.

Hollis, C. J., Taylor, K. W. R., Handley, L., Pancost, R. D., Huber, M., Creech, J. B., Hines, B. R., Crouch, E. M., Morgans, H. E. G., Crampton, J. S., Gibbs, S., Pearson, P. N., and Zachos, J. C.: Early Paleogene temperature history of the Southwest Pacific Ocean: Reconciling proxies and models, Earth Planet. Sci. Lett., 349–350, 53–66, 2012.

Hollis, C. J., Dunkley Jones, T., Anagnostou, E., Bijl, P. K., Cramwinckel, M. J., Cui, Y., Dickens, G. R., Edgar, K. M., Eley, Y., Evans, D., Foster, G. L., Frieling, J., Inglis, G. N., Kennedy, E. M., Kozdon, R., Lauretano, V., Lear, C. H., Littler, K., Lourens, L., Meckler, A. N., Naafs, B. D. A., Pälike, H., Pancost, R. D., Pearson, P. N., Röhl, U., Royer, D. L., Salzmann, U., Schubert, B. A., Seebeck, H., Sluijs, A., Speijer, R. P., Stassen, P., Tierney, J., Tripati, A., Wade, B., Westerhold, T., Witkowski, C., Zachos, J. C., Zhang, Y. G., Huber, M., and Lunt, D. J.: The DeepMIP contribution to PMIP4: methodologies for selection, compilation and analysis of latest Paleocene and early Eocene climate proxy data, incorporating version 0.1 of the DeepMIP database, Geosci. Model Dev., 12, 3149–3206, 2019.

Huguet, C., Kim, J.-H., Sinninghe Damsté, J. S., and Schouten, S.: Reconstruction of sea surface temperature variations in the Arabian Sea over the last 23 kyr using organic proxies (TEX86 and U37K'), Paleoceanography, 21, PA3003, 2006.

Hurley, S. J., Elling, F. J., Könneke, M., Buchwald, C., Wankel, S. D., Santoro, A. E., Lipp, J. S., Hinrichs, K.-U., and Pearson, A.: Influence of ammonia oxidation rate on thaumarchaeal lipid composition and the TEX86 temperature proxy, P. Natl. Acad. Sci. USA, 113, 7762–7767, 2016.

Jansen, E., Overpeck, J., Briffa, K. R., Duplessy, J.-C., Joos, F., Masson-Delmotte, V., Olago, D., Otto-Bliesner, B., Peltier, W. R., Rahmstorf, S., Ramesh, R., Raynaud, D., Rind, D., Solomina, O., Villalba, R., and Zhang, D.: Palaeoclimate, in: Climate Change 2007: The Physical Science Basis. Contribution of Working Group I to the Fourth Assessment Report of the Intergovernmental Panel on Climate Change, edited by: Solomon, S., Qin, D., Manning, M., Chen, Z., Marquis, M., Averyt, K. B., Tignor, M., and Miller, H. L., Cambridge University Press, Cambridge, United Kingdom and New York, NY, USA, 2007. 

Junium, C. K., Meyers, S. R., and Arthur, M. A.: Nitrogen cycle dynamics in the Late Cretaceous greenhouse, Earth Planet. Sci. Lett., 481, 404–411, 2018.

Kim, J.-H., Schouten, S., Hopmans, E. C., Donner, B., and Sinninghe Damsté, J. S.: Global sediment core-top calibration of the TEX86 paleothermometer in the ocean, Geochim. Cosmochim. Acta, 72, 1154–1173, 2008.

Kim, J.-H., van der Meer, J., Schouten, S., Helmke, P., Willmott, V., Sangiorgi, F., Koc, N., Hopmans, E. C., and Sinninghe Damsté, J. S.: New indices and calibrations derived from the distribution of crenarchaeal isoprenoid tetraether lipids: Implications for past sea surface temperature reconstructions, Geochim. Cosmochim. Acta, 74, 4639–4654, 2010.

Lee, K. E., Kim, J.-H., Wilke, I., Helmke, P., and Schouten, S.: A study of the alkenone, TEX86, and planktonic foraminifera in the Benguela Upwelling System: Implications for past sea surface temperature estimates, Geochem. Geophys. Geosyst., 9, Q10019, 2008.

Li, D., Zhao, M., Tian, J., and Li, L.: Comparison and implication of TEX86 and U37K' temperature records over the last 356 kyr of ODP Site 1147 from the northern South China Sea, Palaeogeogr. Palaeoclim. Palaeoecol., 376, 213–223, 2013.

Liu, Z., Pagani, M., Zinniker, D., DeConto, R., Huber, M., Brinkhuis, H., Shah, S. R., Leckie, R. M., and Pearson, A.: Global cooling during the Eocene-Oligocene climate transition, Science, 323, 1187–1190, 2009.

Mashiotta, T. A., Lea, D. W., and Spero, H. J.: Glacial-interglacial changes in Subantarctic sea surface temperature and δ18O-water using foraminiferal Mg, Earth Planet. Sci. Lett., 170, 417–432, 1999.

Newman, D. K., Neubauer, C., Ricci, J. N., Wu, C.-H., and Pearson, A.: Cellular and molecular biological approaches to interpreting ancient biomarkers, Annu. Rev. Earth Planet. Sci., 44, 493–522, 2016.

NOAA National Centers for Environmental Information: “What are proxy data?”, available at: (last access: 1 February 2020), 2020. 

O'Brien, C. L., Robinson, S. A., Pancost, R. D., Sinninghe Damsté, J. S., Schouten, S., Lunt, D. J., Alsenz, H., Bornemann, A., Bottini, C., Brassell, S. C., Farnsworth, A., Forster, A., Huber, B. T., Inglis, G. N., Jenkyns, H. C., Linnert, C., Littler, K., Markwick, P., McAnena, A., Mutterlose, J., Naafs, B. D. A., Püttmann, W., Sluijs, A., van Helmond, N. A. G. M., Vellekoop, J., Wagner, T., and Wrobel, N. E.: Cretaceous sea-surface temperature evolution: Constraints from TEX86 and planktonic foraminiferal oxygen isotopes, Earth-Sci. Rev., 172, 224–247, 2017.

Okazaki, A. and Yoshimura, K.: Global evaluation of proxy system models for stable water isotopes with realistic atmospheric forcing, JGR Atmos., 124, 8972–8993, 2019.

Polik, C. A., Elling, F. J., and Pearson, A.: Impacts of Paleoecology on the TEX86 sea surface temperature proxy in the Pliocene-Pleistocene Mediterranean Sea, Paleoceanography and Paleoclimatology, 33, 1472–1489, 2018. 

Powers, L., Werne, J. P., Vanderwoude, A. J., Sinninghe Damsté, J. S., Hopmans, E. C., and Schouten, S.: Applicability and calibration of the TEX86 paleothermometer in lakes, Organ. Geochem., 41, 404–413, 2010.

Qin, W., Carlson, L. T., Armbrust, E. V., Devol, A. H., Moffett, J. W., Stahl, D. A., and Ingalls, A. E.: Confounding effects of oxygen and temperature on the TEX86 signature of marine Thaumarchaeota, P. Natl. Acad. Sci. USA, 112, 10979–10984, 2015.

Ravelo, A. C. and Hillaire-Marcel, C.: The use of oxygen and carbon isotopes of foraminifera in paleoceanography, in: Developments in Marine Geology, 1, 735–764, 2007.

Royer, D. L.: Stomatal density and stomatal index as indicators of paleoatmospheric CO2 concentrations, Rev. Palaeobot. Palynol., 114, 1–28, 2001.

Ruddiman, W. F.: Earth's climate: Past and Future, second edition, W. H. Freeman and Company, New York, USA, 2008. 

Schouten, S., Hopmans, E. C., Pancost, R. D., and Sinninghe Damsté, J. S.: Widespread occurrence of structurally diverse tetraether membrane lipids: Evidence for the ubiquitous presence of low-temperature relatives of hyperthermophiles, P. Natl. Acad. Sci. USA, 97, 14421–14426, 2000.

Schouten, S., Hopmans, E. C., Schefuß, E., and Sinninghe Damsté, J. S.: Distributional variations in marine crenarchaeotal membrane lipids: a new tool for reconstructing ancient sea water temperatures?, Earth Planet. Sci. Lett., 204, 265–274, 2002.

Schouten, S., Hopmans, E. C., Forster, A., van Breugel, Y., Kuypers, M. M. M., and Sinninghe Damsté, J. S.: Extremely high sea-surface temperatures at low latitudes during the middle Cretaceous as revealed by archaeal membrane lipids, Geology, 31, 1069–1072, 2003.

Shevenell, A. E., Ingalls, A. E., Domack, E. W., and Kelly, C.: Holocene Southern Ocean surface temperature variability west of the Antarctic Peninsula, Nature, 470, 250–254, 2011.

Sluijs, A., Schouten, S., Pagani, M., Woltering, M., Brinkhuis, H., Sinninghe Damsté, J. S., Dickens, G. R., Huber, M., Reichart, G.-J., Stein, R., Matthiessen, J., Lourens, L. J., Pedentchouk, N., Backman, J., Moran, K., and the Expedition 302 Scientists: Subtropical Arctic Ocean temperatures during the Palaeocene/Eocene thermal maximum, Nature, 441, 610–613, 2006.

Spero, H. J., Bijma, J., Lea, D. W., and Bemis, B. E.: Effect of seawater carbonate concentration on foraminiferal carbon and oxygen isotopes, Nature, 390, 497–500, 1997.

Suppes, P.: A set of independent axioms for extensive quantities, Portugaliae Mathematica, 10, 163–172, 1951.

Telford, R. J. and Birks, H. J. B.: The secret assumption of transfer functions: problems with spatial autocorrelation in evaluating model performance, Quaternary Sci. Rev., 24, 2173–2179, 2005.

Telford, R. J., Andersson, R., Birks, H. J. B., and Juggins, S.: Biases in the estimation of transfer function prediction errors, Paleoceanography, 19, PA4014, 2004.

Telford, R. J., Li, C., and Kucera, M.: Mismatch between the depth habitat of planktonic foraminifera and the calibration depth of SST transfer functions may bias reconstructions, Clim. Past, 9, 859–870, 2013.

Tierney, J. E.: GDGT thermometry: Lipid tools for reconstructing paleotemperatures, in: Reconstructing Earth's Deep-Time Climate – The State of the Art in 2012, Paleontological Society Short Course, The Paleontological Society Papers, 18, edited by: Ivany, L. V. and Huber, B. T., 115–131, 2012.

Tierney, J. E. and Tingley, M. P.: A Bayesian, spatially-varying calibration model for the TEX86 proxy, Geochim. Cosmochim. Acta, 127, 83–106, 2014.

Tierney, J. E., Mayes, M. T., Meyer, N., Johnson, C., Swarzenski, P. W., Cohen, A. S., and Russell, J. M.: Late-twentieth-century warming in Lake Tanganyika unprecedented since AD 500, Nat. Geosci., 3, 422–425, 2010.

Trommer, G., Siccha, M., van der Meer, M. T. J., Schouten, S., Sinninghe Damsté, J. S., Schulz, H., Hemleben, C., and Kucera, M.: Distribution of Crenarchaeota tetraether membrane lipids in surface sediments from the Red Sea, Organ. Geochem., 40, 724–731, 2009.

Urey, H. C.: Oxygen isotopes in nature and in the laboratory, Science, 108, 489–496, 1948.

Urey, H. C., Lowenstam, H. A., Epstein, S., and McKinney, C. R.: Measurement of paleotemperatures and temperatures of the upper Cretaceous of England, Denmark, and the southeastern United States, B. Geol. Soc. Am., 62, 399–416, 1951.

Van Fraassen, B. C.: Scientific representation: Paradoxes of perspective, Analysis, 70, 511–514, 2010.

Weijers, J. W. H., Schouten, S., Spaargaren, O. C., and Sinninghe Damsté, J. S.: Occurrence and distribution of tetraether membrane lipids in soils: Implications for the use of the TEX86 proxy and the BIT index, Org. Geochem., 37, 1680–1693, 2006.

Wilf, P., Wing, S. L., Greenwood, D. R., and Greenwood, C. L.: Using fossil leaves as paleoprecipitation indicators; an Eocene example, Geology, 26, 203–206, 1998.

Wilson, J. and Boudinot, F. G.: The reliability of proxy measurements and the proxy/non-proxy distinction, American Geophysical Union Fall Meeting, abstract ID PP14B-07, San Francisco, California, 9–13 December 2019. 

Wilson, J. and Boudinot, F. G.: Proxy measurement in paleoclimatology and the TEX86 paleothermometer, in review, 2020. 

Woelders, L., Vellekoop, J., Kroon, D., Smit, J., Casadio, S., Pramparo, M. B., Dinares-Turell, J., Peterse, F., Sluijs, A., Lenaerts, J. T. M., and Speijer, R. P.: Latest Cretaceous climatic and environmental change in the South Atlantic region, Paleoceanography, 32, 466–483, 2017.

Yasuhara, M., Rabalais, N. N., Conley, D. J., and Gutierrez, D.: Palaeo-records of histories of deoxygenation and its ecosystem impact, in: Ocean deoxygenation: Everyone's problem – Causes, impacts, consequences, and solutions, edited by: Laffoley, D. and Baxter, J. M., IUCN, Gland, Switzerland, 213–224, 2019.

Yevjevich, V.: Stochastic models in hydrology, Stoch. Hydrol. Hydraul., 1, 17–36, 1987.

Short summary
We propose a framework that can guide the development, application, and interpretation of different forms of measurement. Using temperature measurements as a case study, we demonstrate the need for a distinction between different measurements based on their different uncertainties and provide prescriptive language and measures that can aid a researcher's ability to assess data from different measurements. This framework can apply to a wide range of proxy measurements and fields of science.