Plant wax

Despite the shared odd-over-even chain length predominance, the absolute
amounts and relative abundances of

Stable carbon isotopes of chain-length-specific

Traditional interpretations of

The proposed approach is implemented in a Bayesian hierarchical modeling
framework (Fig. 1), which can leverage information from multiple proxies
to provide a robust statistical basis for proxy integration. The
hierarchical model is then inverted using Markov Chain Monte Carlo (MCMC)
methods (Geman and Geman, 1984) to obtain posterior parameter
estimates that are conditioned simultaneously on all proxy data. Similar
modeling approaches have been applied to meta-analyses of
paleoclimatic and vegetation proxies (e.g., Garreta et al., 2010; Li et al.,
2010; Tingley et al., 2012; Bowen et al., 2020) but have not been
specifically proposed for proxy interpretation of

Proposed model structure of the Bayesian hierarchical framework
for interpretation of

Our understanding of species-level

The

Similarly,

Calculation of the variance–covariance matrices does not allow for missing
values in the empirical dataset. Therefore, data entries with any missing
value are removed before calculation of the variance–covariance matrices.
The prior distribution parameters allow random samples to be drawn from the
prior distributions and used in the process model calculation (Step 1 in
Fig. 1). By using prior distributions derived entirely from modern plant
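A minimal sketch of the data preparation described above, written in Python rather than the R/JAGS code used in the study. Rows with any missing value are dropped, a mean vector and variance–covariance matrix are estimated from the complete rows, and random samples are drawn from the resulting distribution. The multivariate normal form, the toy values, and the helper names `complete_cases` and `prior_parameters` are illustrative assumptions, not taken from the archived code.

```python
import numpy as np

def complete_cases(data):
    """Keep only rows without any missing (NaN) value."""
    data = np.asarray(data, dtype=float)
    return data[~np.isnan(data).any(axis=1)]

def prior_parameters(data):
    """Mean vector and variance-covariance matrix from complete rows."""
    clean = complete_cases(data)
    return clean.mean(axis=0), np.cov(clean, rowvar=False)

# Hypothetical per-plant values for three chain lengths (columns).
# NaN marks a missing measurement; these numbers are made up.
toy = np.array([
    [-33.1, -34.0, -34.6],
    [-31.8, np.nan, -33.2],
    [-35.2, -35.9, -36.4],
    [-32.5, -33.1, -33.9],
    [-34.0, -34.8, np.nan],
    [-33.7, -34.2, -35.0],
])

mean_vec, cov_mat = prior_parameters(toy)

# Draw random samples from the assumed multivariate normal prior.
rng = np.random.default_rng(1)
prior_draws = rng.multivariate_normal(mean_vec, cov_mat, size=1000)
```

Dropping incomplete rows keeps the covariance estimate well defined, at the cost of discarding the measurements that are present in those rows.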

The model consists of a generic mixing process with multiple sources,
following the principle of isotope mass balance. First, a fixed number of
random draws (

Second, we calculate the mixture of alkanes from all groups as a function of
the fractional leaf mass contribution (FLMC), where

Third, the weighted average relative abundance (RA) of each

Lastly, for chain
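The steps above amount to a concentration-weighted mass balance. The sketch below is a generic illustration under assumed notation, since the source equations are not reproduced here: `flmc` holds the fractional leaf mass contributions (summing to 1), `conc` holds per-source alkane concentrations by chain length, and `d13c` holds the matching isotope values. The function name `mix_sources` and all numbers are hypothetical.

```python
import numpy as np

def mix_sources(flmc, conc, d13c):
    """Mix sources by leaf mass; arrays are shaped (sources, chains)."""
    flmc = np.asarray(flmc, dtype=float)
    conc = np.asarray(conc, dtype=float)
    d13c = np.asarray(d13c, dtype=float)
    if not np.isclose(flmc.sum(), 1.0):
        raise ValueError("flmc must sum to 1")
    # Alkane amount from each source and chain: leaf mass fraction x concentration.
    contrib = flmc[:, None] * conc
    total = contrib.sum(axis=0)          # mixture amount per chain
    ra = total / total.sum()             # relative abundance across chains
    fsc = contrib / total                # fractional source contribution per chain
    d13c_mix = (fsc * d13c).sum(axis=0)  # isotope mass balance per chain
    return ra, fsc, d13c_mix

# Two hypothetical sources and three chain lengths.
flmc = [0.6, 0.4]
conc = [[100.0, 200.0, 100.0],
        [50.0, 100.0, 250.0]]
d13c = [[-34.0, -35.0, -35.0],
        [-21.0, -21.5, -22.0]]

ra, fsc, d13c_mix = mix_sources(flmc, conc, d13c)
```

In this toy case the second source supplies 40 % of the leaf mass but 62.5 % of the longest chain, which shows why leaf mass fractions and chain-specific source fractions need not agree.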

The process model specifies the numerical relationships between random
samples from the prior distributions and the simulated metrics of interest
(

All proxy data are subject to errors that are associated with the proxy
observations themselves (Evans et al., 2013).
Therefore,

Because the uncertainty of the measured

In the model inversion step (Steps 3 to 5 in Fig. 1), MCMC is used to
propose samples of all model parameters conditioned on the measured
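To make the idea of conditioning parameters on measured data concrete, the toy example below inverts a two-source mixing fraction from a single isotope measurement. It uses a plain random-walk Metropolis sampler, not the Gibbs-based JAGS sampler used in the study, and the end-member values, measurement, uncertainty, and function names are all hypothetical.

```python
import numpy as np

def log_posterior(f, obs, sd, d_a, d_b):
    """Uniform prior on [0, 1] plus a normal likelihood for the mixture."""
    if not 0.0 <= f <= 1.0:
        return -np.inf
    predicted = f * d_a + (1.0 - f) * d_b
    return -0.5 * ((obs - predicted) / sd) ** 2

def metropolis(obs, sd, d_a, d_b, n_iter=20000, burn_in=2000, step=0.1, seed=0):
    """Random-walk Metropolis sampler for the mixing fraction f."""
    rng = np.random.default_rng(seed)
    f = 0.5
    lp = log_posterior(f, obs, sd, d_a, d_b)
    samples = []
    for i in range(n_iter):
        proposal = f + rng.normal(0.0, step)
        lp_new = log_posterior(proposal, obs, sd, d_a, d_b)
        # Accept with probability min(1, posterior ratio).
        if np.log(rng.uniform()) < lp_new - lp:
            f, lp = proposal, lp_new
        if i >= burn_in:
            samples.append(f)
    return np.array(samples)

# Hypothetical end members (-33 and -21), measured mixture (-27), and 1-sigma error (0.5).
posterior = metropolis(obs=-27.0, sd=0.5, d_a=-33.0, d_b=-21.0)
```

With one measurement and two sources the posterior is narrow. Adding a third source without additional data would leave the fractions only partly constrained, which is the situation that multiple proxies are meant to address.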

The model structure described above is coded in the BUGS (Bayesian inference
Using Gibbs Sampling) language (Lunn et al., 2012), and implemented in R
version 4.0.5 (R Core Team, 2021), using the “rjags” package with the
standalone JAGS (Just Another Gibbs Sampler) encoder installed separately
(Plummer, 2021). Three chains are run in parallel, and the number of
iterations is set at 800 000 to ensure model convergence, with the first
200 000 iterations discarded as burn-in. Chain thinning is set at once per 240
iterations. Convergence is assessed visually via trace plots (R
package “mcmcplots”, Curtis, 2018) and with reference to the convergence
factor “rhat” (Gelman and Rubin, 1992) and effective sample sizes
reported by the “rjags” package. The iteration parameters are chosen to
ensure complete convergence with rhat values smaller than 1.01. The average run
time for a three-chain
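The sampler settings above imply a fixed number of retained samples, and the rhat criterion can be illustrated with a few lines of code. The sketch below is in Python and is not the diagnostic code used in the study: `gelman_rubin` implements only the basic potential scale reduction factor, without the degrees-of-freedom correction of Gelman and Rubin (1992), and the simulated chains are stand-ins for real MCMC output.

```python
import numpy as np

# Settings quoted in the text.
N_CHAINS, N_ITER, N_BURN_IN, N_THIN = 3, 800_000, 200_000, 240

# Samples kept after burn-in and thinning (simple arithmetic, not JAGS output).
kept_per_chain = (N_ITER - N_BURN_IN) // N_THIN
kept_total = N_CHAINS * kept_per_chain

def gelman_rubin(chains):
    """Basic potential scale reduction factor; input shape (chains, samples)."""
    chains = np.asarray(chains, dtype=float)
    n = chains.shape[1]
    within = chains.var(axis=1, ddof=1).mean()
    between = n * chains.mean(axis=1).var(ddof=1)
    pooled = (n - 1) / n * within + between / n
    return float(np.sqrt(pooled / within))

# Simulated stand-ins: well-mixed chains vs. chains stuck at different levels.
rng = np.random.default_rng(42)
mixed = rng.normal(size=(N_CHAINS, kept_per_chain))
stuck = mixed + np.arange(N_CHAINS)[:, None]

rhat_mixed = gelman_rubin(mixed)
rhat_stuck = gelman_rubin(stuck)
```

Under these settings each chain retains 2500 samples, 7500 in total. Chains that sample the same distribution give a value close to 1, while chains centered on different values give a value well above the 1.01 threshold.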

Once the model inversion is completed for each model parameter, its
posterior density is summarized via the kernel density estimation function
“density” in the R package “stats” with default settings. For selected
parameters, posterior summaries such as the maximum a posteriori probability
estimate (MAPE), the median, and the 89 % highest density interval (HDI) are reported using functions in the R package “bayestestR” (Makowski et al., 2019). Of all the parameters, the
posterior densities of two parameters are of special interest in the mixing
process. One of these parameters is
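The posterior summaries named above can be approximated from a vector of MCMC samples as follows. This Python sketch is not a re-implementation of the “stats” or “bayestestR” functions: the HDI is taken as the narrowest interval containing 89 % of the sorted samples, and the MAPE is read from a simple Gaussian kernel density estimate with a rule-of-thumb bandwidth. The function names and the simulated samples are assumptions for illustration.

```python
import numpy as np

def hdi(samples, mass=0.89):
    """Narrowest interval that contains the given share of the samples."""
    x = np.sort(np.asarray(samples, dtype=float))
    n = x.size
    k = int(np.ceil(mass * n))              # number of points inside the interval
    widths = x[k - 1:] - x[: n - k + 1]     # width of every candidate interval
    i = int(np.argmin(widths))
    return float(x[i]), float(x[i + k - 1])

def map_estimate(samples, grid_size=512):
    """Mode of a Gaussian kernel density estimate evaluated on a regular grid."""
    x = np.asarray(samples, dtype=float)
    iqr = np.subtract(*np.percentile(x, [75, 25]))
    # Rule-of-thumb bandwidth (Silverman), assumed here for simplicity.
    bw = 0.9 * min(x.std(ddof=1), iqr / 1.34) * x.size ** (-0.2)
    grid = np.linspace(x.min(), x.max(), grid_size)
    dens = np.exp(-0.5 * ((grid[:, None] - x[None, :]) / bw) ** 2).sum(axis=1)
    return float(grid[np.argmax(dens)])

# Simulated stand-in for the posterior samples of one parameter.
rng = np.random.default_rng(7)
draws = rng.normal(0.3, 0.05, size=4000)

low, high = hdi(draws)
mape = map_estimate(draws)
median = float(np.median(draws))
```

For a symmetric, unimodal posterior the three point summaries nearly coincide. They diverge for skewed or poorly constrained posteriors, which is why reporting the interval alongside the point estimates is informative.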

Here we provide three case studies that demonstrate model characteristics
and offer alternative interpretations of previously published

Lacustrine sediments can incorporate

The

To demonstrate how our model can provide quantitative estimates of

The region of interest is the Qinghai–Tibetan Plateau in western China,
where both freshwater and saline lakes are abundant. The land cover of the
region is dominated by alpine meadow, steppe, and shrubland, which consist of
almost exclusively C

Lake surface sediment samples used in case study 1, with measured
chain-specific

Empirical data of per-sample

Per-sample

Biome composition and its change over time have profound implications for
climatic shifts and the evolution of the biosphere. The tropical grassland
biome has been of particular interest because of its unique

To demonstrate how the model can provide quantitative estimates of

The region of interest is sub-Saharan Africa, where rainfall amount and
seasonality are the primary determinants of biome types (Sankaran et al.,
2005; Aleman et al., 2020). In western Africa, vegetation cover is dominated
by rainforest close to the Equator, wooded grassland to the north, and a
transitional zone in between (Huang et al., 2000; Rommerskirchen et al.,
2003; Garcin et al., 2012, 2014; Schwab et al., 2015). In
particular, tropical forest and savanna woody plant species have been shown
to co-occur in this region (Aleman et al., 2020), making it an ideal
place to investigate the potential of using

Lake surface sediment samples used in case study 2, with measured
chain-specific

Empirical data of per-sample

Per-sample

Hydrogen isotope ratios (

To demonstrate how the reconstructed biome compositions in Sect. 2.2.2 can
be used to further assist interpretation of

To estimate

Solving Eq. (10),

Similar to Eq. (2), we expect

The process model for

The measured

In the model inversion step, MCMC is used to propose samples of all model
parameters conditioned on the measured

The sedimentary record of interest (GIK16160-3), located off the Zambezi
River mouth, has a temporal range of ca. 0.1–36.7 ka BP (Wang et al.,
2013a, b). The dominant biomes of the catchment area
today are the Zambezian woodland savanna and grassland savanna, with coastal
forest and Afromontane biomes also present (White, 1983; Dupont et al.,
2011; Wang et al., 2013a; Dupont and Kuhlmann, 2017). Previous studies have
interpreted the sedimentary record to reflect hydrological and vegetation
changes in the catchment area of the Zambezi River, as well as patterns of
sediment transport from the catchment vs. regions north of the river mouth
(Wang et al., 2013a; Khon et al., 2014; van der Lubbe et al., 2014;
Kasper et al., 2015; van der Lubbe et al., 2016; Lattaud et al., 2017). The
published

For the prior distributions of

The model results are to some degree sensitive to the prior parameter
estimates, which are derived from empirical data but imperfectly known in
our case studies. Here we use data associated with case study 2 to explore
the influence of prior parameter estimates on model output. To produce a
different set of prior parameter estimates, plant samples are selected from
western Africa only, as a subset of the sub-Saharan empirical dataset (123
entries out of the 301 entries from the original dataset, supplementary
data EA-4, Yang, 2022). Prior parameters are estimated from this dataset
using the same methods as described in Sect. 2.2.1 (“Data compilation”). The resultant prior
distributions differ from those of the sub-Saharan dataset primarily in the
C

Comparisons of prior distributions based on western African plant
samples (thick red curves) vs. sub-Saharan African plant samples (thin blue
curves). The three left-most columns illustrate

For each of the three data points of the Lake Qinghai case study (Fig. 5a–c), the posterior densities of FLMCs of terrestrial plants and aquatic macrophytes vary substantially between the samples (Fig. 5d–f), while the distributions of the algal FLMC are almost the same between the samples. FLMC from aquatic macrophytes is the highest in the high-

The maximum a posteriori probability estimates (MAPE), the medians, and the 89 % highest density intervals (HDI) of posterior densities of the mixing fractions as model output using data in published lake surface sediment samples from Lake Qinghai (Fig. 5).

Bivariate density plots of the posterior densities of fractional leaf mass contribution (FLMC) of terrestrial plants, aquatic macrophytes, and algae in published lake surface sediment samples from Lake Qinghai (Liu et al., 2015).

Although the algal FLMC is not well constrained (Fig. 5), a strong
“trade-off” correlation (the increase of one correlates with the decrease
of another) between algal FLMC and those of the other two sources is
apparent in bivariate density plots (Fig. 6). The high-

Posterior densities of fractional source contribution (FSC

The posterior densities of the algal FSC

For the western African transect case study (Fig. 8a–c), the posterior
densities for fractional leaf mass contribution of C

The maximum a posteriori probability estimates (MAPE), the medians, and the 89 % highest density intervals (HDI) of posterior densities of the fractional leaf mass contributions (FLMCs) as model output using data in published lake surface sediment samples from Cameroon (Fig. 8).

Bivariate density plots show relatively weak “trade-off” correlation
patterns between the FLMCs of savanna and rainforest C

Bivariate density plots of the posterior densities of fractional
leaf mass contribution (FLMC) of tropical C

Posterior densities of fractional source contribution (FSC

The distributions of FSC

The MAPEs of rainforest C

The MAPEs of MAP

Using the western Africa prior dataset (Fig. 4), with lower

Comparisons of posterior densities of fractional leaf mass
contribution of the C

The model shows different sensitivity to chain length distribution and
carbon stable isotopes. It is relatively insensitive to chain length
distribution: only minor changes in the posterior densities are observed
when the likelihood evaluations of RA are removed (left column, Fig. 14).
By contrast, the model is much more sensitive to the

Model sensitivity to proxy type in model inversion (Eqs. 6
and 8), using case study 2 as an example. The left column (No RA) shows
the model output with the likelihood evaluations of RA completely removed
from model evaluation. The right column (No

Long chain

The relatively unconstrained FLMC of the algae source (Fig. 5) and its
“trade-off” with that of the terrestrial source (Fig. 6) suggest that
the possibility of algae being an important biomass source of the lake
surface sediment cannot be eliminated, and would impact the interpreted
contributions from other sources. This possibility is consistent with the
observations that the deep-water lake bottom is covered mainly by green
algae in Lake Qinghai (Liu et al., 2015). The model approach
successfully identified such a possibility, despite the consistently low
FSC

The Tibetan Plateau has been argued to be an ideal region to investigate the
input from aquatic macrophytes to the organic matter and

This case study demonstrates that the proposed framework can leverage chain
length distribution and

FLMC as a metric for vegetation reconstruction is not directly comparable
with

The model recovered a prominent shift in the vegetation source of the

Prior to 15 ka BP, the contribution of Zambezi River sediments to the core
location was relatively stable (Fig. 11d, e). The transient rise in
rainforest FLMC during the deglaciation is likely associated with alkanes
sourced from near-coastal forests. At the peak of rainforest FLMC during
HS1, the estimated MAP

The results of sensitivity test 1 (Fig. 13) show that using a different
set of prior distributions (Fig. 4) can produce somewhat different central
tendencies in FLMCs with the same sedimentary

The results of sensitivity test 2 (Fig. 14) show that

The proposed model framework offers a more integrative approach to
interpretation of

First, the model offers a numerical solution (Sect. 2.1.2) that accounts
for the uncertainty associated with

Second, the case studies demonstrate that the model can be used to explore
mixing regimes of multiple sources, which offers alternative interpretations
compared to the traditional two-end-member mixing regime using

Moreover, the new metric FLMC has the potential to evaluate leaf mass
integration patterns in sedimentary archives. The molecular distribution and

Lastly, by leveraging compound-specific stable isotope data from multiple

The proposed mixing model is the bare core of a potentially more comprehensive proxy system model, an approach that has gained traction in recent developments of paleoenvironmental reconstruction (e.g., Garreta et al., 2010; Li et al., 2010; Tingley et al., 2012; Evans et al., 2013; Dee et al., 2018; Konecky et al., 2019; Bowen et al., 2020). A proxy system model is a representation of the complete proxy system that ideally includes four components: the environment, the sensor, the archive, and the observation (Evans et al., 2013). The mixing model here primarily describes the sensor and observation components of the complete proxy system, while the environment and the archive components have not been incorporated (see Dee et al., 2015, 2018; Konecky et al., 2019 as examples for other model components). Future efforts that elaborate on the model structure will provide updated model assumptions, which should be based on systematic investigations of the specific proxy system components. Here are three categories of model improvements to consider.

A better characterization of

Processes such as

Environmental factors, such as temperature and precipitation, are primary
determinants of vegetation composition, but they also influence

Traditional interpretations of

All data and code used to conduct the analyses and create figures reported in this paper are archived online and available at

The supplement related to this article is available online at:

DY conceived, designed, and conducted the analyses with support from GJB. DY prepared the manuscript with contributions from GJB.

The contact author has declared that none of the authors has any competing interests.

Publisher’s note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

We would like to thank Jamie McFarlin, Kevin Uno, and Brenden Fischer-Femal for their comments and discussions about the model development.

This research has been supported by the NSF Division of Biological Infrastructure (grant no. ABI-1759730).

This paper was edited by Julie Loisel and reviewed by two anonymous referees.