Autoregressive Statistical Modeling of a Peru Margin Multi-proxy Holocene Record Shows Correlation Not Causation, Flickering Regimes and Persistence

Correlation does not necessarily imply a causation, but in climatology and paleoclimatology, correlation is used to identify potential cause-and-effect relationships because linking mechanisms are difficult to observe. Confounding by an often unknown outside variable that drives the sets of observables is one of the major factors that lead to correlations that are not the result of causation. Here we show how autoregressive (AR) models can be used to examine lead-lag relationships—helpful in assessing cause and effect—of paleoclimate variables while addressing two other challenges that are often encountered in paleoclimate data: unevenly spaced data and switching between regimes at unknown times. Specifically, we analyze multiple paleoclimate proxies, sea surface temperature (SST), C37, δ15N\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$ \delta^{15} N $$\end{document}, and %N from the central Peru margin to find their correlations and changes in their variability over the Holocene epoch. The four proxies are sampled at high-resolution from the same core but are not synchronously sampled at all possible locations. The multidimensional records are treated as evenly spaced data with missing values, and the missing values are filled by the Kalman filter expected values. We employ hidden Markov models (HMM) and autoregressive HMM (AR-HMM) to address the potential that the degree of variability and the correlations between these proxies change over time. The HMM, which is not autoregressive, shows instantaneous correlations between observables in two regimes. However, our investigation of lead-lag relationships using the AR-HMM shows that the cross-correlations do not indicate a causal link. Each of the four proxies has predictability on decadal timescales, but none of the proxies is a good predictor of any other proxy, so we hypothesize that a common unobserved variable—or a set of variables—is driving the instantaneous relationships among these four proxies. These findings suggest that the variability at this site is remotely driven by processes such as those causing the Pacific Decadal Oscillation (PDO), rather than locally driven by processes such as upwelling influences on temperature and vertical mixing of nutrients.


Introduction
Paleoclimatic time series hold the promise to extend our knowledge of oceanic, atmospheric, and ecological variability on the timescale of decades to centuries, a time window poorly constrained by instrumental observations. A frequent assumption of such studies is that significant modes of variability detected in the historical record (El Nino/Southern Oscillation (ENSO), Pacific Decadal Oscillation (PDO), Atlantic Meridional Oscillation (AMO), etc.) persist into the past and there may also exist other fluctuations detectable only through the paleoclimate record, but that resemble modern patterns [18,19,27]. Such patterns often involve an ensemble of couplings between aspects such as pressure gradients, surface temperature, and biological productivity-all of which might be observed by sediment proxy methods-and couplings or correlations might be used to infer that the underlying variability does in fact resemble documented modes of modern internal climate variability.
This paper examines statistical aspects of a long-duration, high-resolution, multidimensional time series that records variations among sea surface temperature (SST), phytoplankton productivity, and intensity of the oxygen minimum zone (OMZ) over the Holocene epoch (0.60-9.44 ka BP) along the central Peru margin (Site MW8708-PC2: 15.1°S, 75.7°W, water depth of 250 m), a region strongly affected by the modern ENSO and PDO cycles [3,27]. The sediment is sampled at high-resolution to amount to roughly 3year averages sampled every~8 years under the accumulation rate typical of the region, and the different proxies are analyzed using the same core sampling so that correlations are robust under age model uncertainties. These records indicate both surface and subsurface variability in the physical and biological state. Simple correlation analyses revealed that proxy associations vary over time; in some intervals following expected ENSO-like correlations of low sea surface temperature (SST) and enhanced biological productivity; in other intervals this correlation is not apparent [4]. We develop here a statistical framework to analyze the evolving relationships over the Holocene in order to interpret proxy correlations, characteristic timescales of variability ("regimes"), and predictability.
Some of the key questions that may be addressed by time series analysis in this region are whether the variability arises from a local or internal source, such as variation in physics through mixing or eddies at the surface [3,5] or changes in the biological makeup of ecosystems in the region (e.g., [10], or from a remote or external source, such as variations in the water properties arriving at the site through large scale modes such as El Nino or the PDO [6,23]. The site ( Fig. 1) is known for wind-driven upwelling [3] at depth shallower than 250 m and low oxygen concentrations at depth typical of the eastern tropical South Atlantic oxygen minimum zone, which has been highly variable near 250 m depth in recent times [33]. Thus, the local forcing hypothesis may be summarized as: wind-driven upwelling drives mixing which brings up low-temperature water and nutrients, thus increasing the metrics of productivity. Co-variability in SST and productivity reconstructions are often explained using a variation of the local hypothesis (e.g. [20,24,26,35]. As we have said, upwelling and mixing are observed to occur near this site, but long, continuous records spanning millennia (or even decades) may not have variations that are primarily a response to mixing variability. By contrast, a remote forcing hypothesis suggests that the primary variability in these metrics is related to the remote formation of watermasses with nutrients, temperature, and other markers of upstream productivity that are recorded at this site after arriving via an unknown (and climate variability sensitive) set of pathways. Despite the low oxygen levels at depth, the typical sediment accumulation rate over the Holocene during these samples is high Fig. 1 a Location of the site MW8708-PC2 (15.1°S, 75.7°W, water depth of 250 m), superimposed on the correlation of the SST gridpoint nearest that location with each SST gridpoint globally (using HadISST data [29]. b Time series of sea surface temperature with climatological 1900-1914 seasonal cycle removed (blue), 3-year running mean of this SST (red), PDO principle component time series [23,6] which has been rescaled to have the same variance as the SST (black). The red circles are exemplars of 3-year averages plotted every seventh year. c A schematic of the region, illustrating the proxies examined (SST ; C 37 ; δ 15 N ; %N) and local physical processes (wind-driven upwelling, thermocline, oxygen minimum zone) (Color figure online) (~78 cm/k year), which suggests high, sustained biological productivity and presumably a persistent level of oxygen demand.
A visual analysis of the proxy records ( Fig. 2) suggests that the variability of four proxies might fall into multiple regimes: one state with high variability and another state with low variability. This biphasic behavior guided our initial analysis using a Hidden Markov Model (HMM; Rabiner [30]. Hidden Markov methods are increasingly used as a statistically robust automated method for identifying climate regime shifts (e.g., [1,8,22]. A benefit of our approach is that it can objectively identify regimes of paleoclimatic behavior in which correlations between proxies and proxy variance evolve (and perhaps alternate) over time. We also explored the possibility of more than two states, but found that these extra regimes were visited only very rarely in comparison to two dominant states, so parsimony suggested retaining only two modes for this dataset.
A less common tool in climate modeling is the autoregressive hidden Markov method (AR-HMM, [12,13,14] which allows for some memory in the system through a dependence on previous proxy values as well as correlations in the present proxy value. The application of this method and the insights gained from this application, are a key breakthrough found in our analysis. Both our HMM and AR-HMM results show that there exist two regimes of variability in proxy space at site MW8708-PC2. Here the AR-HMM technique is used to probe deeper into distinctions between causality and correlation, under the premise that a predictive cause should precede its effect in time. As the HMM method examines only simultaneous-in-time correlations by using the variance-covariance matrix of HMM, it is not capable of distinguishing causation from correlation in this way. A surprising result of this study is that our conception of the relationships among these proxies from the HMM analysis changed dramatically when the AR-HMM technique was applied and contrasted to the more standard HMM approach. The AR-HMM shows that both climate regimes show high auto-correlation and low cross-correlation, thereby indicating that none of the proxies are good predictors of other proxies on interannual to decadal timescales. Thus, a hypothesis of local causality (in the Granger sense) between the variables, such as mixing driving local productivity, is not supported by the AR-HMM analysis. This lack of Granger causality is robust to the biphasic regime shifts as well, although in a different location where regime change is not present, a simpler autoregressive only approach can be used to assess causation versus correlation following a similar approach to the methods used here. The software provided with this paper (https://github.com/seonminahn/ARHMM) can be applied for the analysis of multi proxy data from a core record.

Context from Modern Observations
To better understand what processes would affect the variability on the timescales that are sampled, a brief analysis of the region and related climate indices was carried out. The location is somewhat south of the region most active during the El Nino/Southern Oscillation (ENSO) cycles (i.e., south of NINO1 [31]. On longer timescales, a meridionally broader, yet similarly shaped pattern of variability has come to be known as the PDO [6]. Figure 1a shows the location of the MW8708-PC2 sediment core that is analyzed for this study. The location is superimposed on a map of the correlation of global sea surface temperature with the nearest HadISST data point (15.5S, 75.5 W). Warming in this region correlates well with warming along the central and eastern equatorial Pacific, cooling over most of the extratropical Pacific, and weakly correlates with temperatures in other basins. The correlated pattern resembles both the El Nino pattern and the PDO pattern [6]. In time, the sediment record (indicated by circles in Fig. 1b) is too infrequent to capture El Nino variability, generally taken to dominate the 2-7 year band. While some of the biggest El Nino events (1982)(1983)(1997)(1998) are still visible in the filtered data, it is evident that time filtering similar to our sediment sampling has removed most of the high-frequency ENSO variability.
Deser et al. [6] follow [Ref. 23] in tracking the PDO using the first empirical orthogonal function (EOF) and principal component (PC) of North Pacific sea surface temperature (20N to 70N, 100E to 100W) after the removal of seasonal and global mean variability. Its variability is taken as a PDO index. This index captures much of the low frequency (> 7 year) variability near the sediment location (Fig. 1b), even though the core location is remote from all data included in the PDO index and the monthly index and core location SST have a correlation coefficient of only 0.16. Taking running means over 2-22-years averaging windows of SST at our site correlates well with the PDO index: among this set of means all have correlation coefficients above 0.53, and the peak coefficient is just above 0.6 (for 7-year averages). Thus, we interpret the dominant mode of variability accurately sampled by the core measurements to be associated the PDO. Note that the record described here is significantly longer than extant records of the PDO (e.g., AD 993-1996 tree-ring compilation by ref. [25].
A variety of mechanisms have been used to explain the PDO. Alexander [2] reviews the mechanisms and concludes that a variety of causes are consistent with the observations, mainly heat flux and wind variability, including El Nino variability communicated to the N. Pacific by the "atmospheric bridge". This variability is modulated toward lower frequencies by the reddening of "stochastic" variability [15] by the large heat capacity of the mixed layer [7], but also through slow-response phenomena such as the re-emergence of sub-boundarylayer temperature anomalies during subsequent winters and the slow propagation of baroclinic Rossby waves. The autoregressive formulation of the AR-HMM is essentially the same as the stochastic model used by Hasselmann. According to [7], forcing amplitude affects response amplitude, but the damping rate of variability affects both the magnitude of variability and the persistence timescale, with greater magnitude and longer persistence indicating weaker damping which is a consequence of a shallower mixed layer and reduced heat capacity.

Data Collection
High-resolution records of four paleoclimate indicators are collectively analyzed for a sediment core retrieved from the central Peru margin (Site MW8708-PC2: 15.1°S, 75.7°W, water depth of 250 m, Fig. 1). This site has an extremely high and steady sedimentation rate (~78 cm/k year) across most of the Holocene (10-1.4 ka), and frequently contains annual laminations. Records are obtained from 2 cm (~3 years) slices taken every 5 cm (~8 years). The age model determined the core top to be located at~600 years before present (year BP) (gravity coring typically disturbs the upper few decimeters of sedimentation), and the base of the record to lie at~9440 year BP. The very gentle curvature in estimated sediment accumulation rates [4] will be ignored in this study, so depth is proportional to age and time steps are uniform.
The four records examined are proxies for sea surface temperature (SST ) through the alkenone proxy (a comparison of the relative abundance of double bonds among chains of 37 carbon atoms: Uk'37), biological productivity of a specific phytoplankton group (which is thought to produce all occurrences of C 37 regardless of the number of double bonds) through analyses of the abundance of alkenones (with a greater quantity representing greater haptophyte algal productivity). Thus, productivity (C 37 ) relies on a different combination of alkenones than the one interpreted as paleotemperature by counting the relative numbers of double bonds in each molecule (Uk '37). Subsurface properties are determined through analyses of δ 15 N , an index of subsurface oxygenation and denitrification, and the percentage of organic nitrogen (%N) which is a composite of all biological inputs to the sediment. We interpret the alkenone Uk'37 index as an approximation to mean annual sea surface temperature. Although anomalous Uk'37 values have been reported in the region [17,28], there is no convincing evidence for seasonal bias based on analyses of modern sediments over a broad region of the Eastern Equatorial Pacific with very strong gradients in the timing of maximum annual biological production [17,34]. Analyses of modern sediments in the region conducted at the Brown University laboratory show agreement with mean annual temperatures in the region of our core study to within the standard empirical proxy calibration (e.g., versus a subset of data reported in [17]. Our paleo-productivity interpretations are guided by the presence of a proxy that responds to total phytoplankton production (%N) and to a subset of the haptophyte production (C 37 total); we can therefore assess whether alkenone production is coupled or decoupled to a generalized biological response over time.

Missing Data
The four proxies are measured in high-resolution with fairly uniform depth sampling (2 cm about every 5 cm), but different proxies are not sampled at all possible locations. In order to compose an evenly-spaced data set that will be used to train discrete-time statistical models described below, the expected values in an evenly-spaced record are used to fill in the records using a Kalman filter [21,36]. The Kalman filter finds the expected values of the missing data given the observed values, and we find the maximum likelihood estimates of the model parameters by using the expectation-maximization algorithm. The time window 9.44-0.60 ka BP is discretized into 1127 discrete steps (each of which represents 5 cm/~8 year, or approximately the width of an analysis slice). Each proxy is then allocated to the nearest location in depth/age. Not every possible slice was analyzed: there are 526 SST; 526 C 37 ; 727 δ 15 N ; and 728 %N measurements out of 1127 possible to fill all time-steps.
Each observation X (t) is interpreted as a four-component vector.
In the climate and data assimilation literature, this vector is usually called the "state" vector; here it will be called the observation vector to distinguish it from the regime or "state" of the hidden Markov model. After arranging the data in this manner, the expected values estimated using a Kalman filter are used to fill in missing data (Figs. 3, 4). The mean and standard deviation of each proxy variable have been removed as a preprocessing step so that the different units of each measurement are not a factor and the Kalman filter likewise does not depend on the units of measurement. Our preparation of this discrete-time technique and the discrete-time statistical models below assume that even spacing in depth is sufficiently uniform in time, i.e., variations in the age-depth relationship were not considered in this imputing technique.

Statistical Models: HMM and AR-HMM
Statistical modeling involves developing relationships between one set of predictor variables and another set of predictands. Paleoclimate reconstructions likewise develop models to relate proxy information (predictors) to past climate variables (predictands). Thus, statistical modeling and paleoclimate reconstructions both seek the same goals, and approaches of varying complexity are found to infill missing data or to understand relationships among variables.
In paleoclimate studies, as in any set of observations, not all important variables can be observed or reconstructed. It is typical in such situations to hypothesize linkages among observed variables, but a more direct observation of the mechanism involved in the linkage is not recorded. So, one might expect that A causes B that causes C, but only A and C are observed. Statistical modeling can help identify or quantitatively assess relationships between A and C, even in the presence of hidden variables such as B.
The autoregressive (AR) approach adds value by allowing the state at previous times to be among the predictors of the present state predictands. Typically, causes precede effects, so the AR approach allows for an interpretation of causality following [11]-if a predictor precedes the predictand in time, then it is the cause rather than vice versa. Simultaneous correlations among variables are frequently interpreted as implying causality, but they can represent a number of relationships-cause and effect, effect and cause, or accidental correlations without causal relationships. The greater precision of the AR models allows for examination of causal relationships under the assumption of cause preceding effects. It is important to note that stronger forms of causality can be observed, e.g., observing steps along the way to a particular mechanism linking forcing and cause, but such records are rarely preserved in paleoproxies, so the weaker [11] definition of causality by failure to improve predictions with added information is useful.
Furthermore, the HMM provides a quantitative justification of transitions between different epochs governed by regime shifts in the surrounding climate. Even though these shifts might not be directly detectable in any of the recorded variables alone, the HMM provides a technique that allows all variables to contribute equally in identifying shifts in the relationships among the variables.
The degree of variability in correlation among these proxies appeared to change at unknown times over this epoch. Visual analysis suggests that the correlations and variability of the four proxies varied over time in a potentially abrupt manner (Figs. 2, 3, 4). Indeed, use of a two-state (a.k.a. two-regime) hidden Markov models (HMM) and a generalization of this approach, autoregressive HMM (AR-HMM), do detect two distinct states at this site, characterized by different levels of variability and predictability. Experimentation with higher numbers of states revealed that two states were sufficient for this record.
Both HMM and AR-HMM consist of observed data X (t) and two kinds of hidden states s (t). The measured data from the sediment core correspond to X (t), and the unobserved state for each observed data corresponds to s(t). The unobserved hidden states are analogous to the terms "regimes" that are described in climate studies. The states are hidden because they are to be determined from relationships within the data by the model, rather than indicated directly, e.g., if the value of one variable indicated which regime the data was in at any given time. Figure 5 illustrates the dependencies among hidden states (S) and observed data (X) of the two models. State dependencies are the same in both models. Both models have hidden states that have the Markov property, meaning that the future state does not depend on the past states given the present states. The difference between the two models is the dependency  X (2)). In the HMM, a current observation is solely dependent on present observations and the current state. The HMM assumes that a current observation follows the normal distribution with means and variances determined by its state. Thus, a current set of observations is independent of other sets of observations at other times, although its state does depend on what state was determined at a previous time. In the AR-HMM, a current observation depends not only on a current and previous state but also on the previous observations. Therefore, the AR-HMM model allows for examination of causal relationships among the observed variables-inferring connections beyond just "regime" shifts and into relationships such as SST predicts productivity at later time.
The two models, HMM and AR-HMM, can be formulated using the same equations because the HMM is a special case of the AR-HMM-it has the same equations but neglects linkages between different times. Both models are based on two-state hidden Markov models, but they use two different emission (time-correlation or memory) models. The HMM assumes conditional independence among observations given the state, regime, and the AR-HMM considers direct dependence with adjacent observations (i.e., memory), which is also known as a switching autoregressive model [12,13,14].
Both models have hidden regimes or "states" in which it is assumed that the historical dependence of the current hidden (unobserved) state is entirely accounted for by the state of its immediate proceeding neighbor and a transition probability, i.e., the state-switching process is Markovian. The matrix of state transition probabilities is The difference between the AR-HMM and HMM models is the relationship between the observations at different times. The equation for the AR-HMM can be written This equation represents the two-state AR-HMM, where the state value s(t) can be either 1 (Noisy state) or 2 (Calm state). There are two constant vectors c s(t) depending on the state at time t. Likewise, there are two matrices θ s(t) that prescribes the deterministic part of the model evolution based on observations at a previous time, depending on the state. The noise vector (ε(t)) follows a normal distribution with zero mean and covariance Σ s(t) that contain all the information about stochastic variances and covariance of the observations. Note that the matrices θ s(t) are inherently asymmetric since they describe the lead and lag between each of the variables while the covariance-variance matrices Σ s(t) are symmetric by their definition. The (i,j) entry Σ i j of the covariance matrix is defined as E (ε i − E[ε i ]) ε j − E ε j , so Σ i j equals to Σ ji and the matrix is symmetric [37].
The HMM can be written the same way as (2), but removing the deterministic dependence of current observations on previous observations (θ 0). The HMM assumes that each observation follows a multivariate normal distribution with means (c), (stochastic) variances (Σ ii ), and (stochastic) covariances (Σ i j ) determined only by present value of the hidden state. The covariance-variance matrix can be used for the correlation analysis.
In both the HMM and AR-HMM models, the unknown parameters including constant parameters in each two-state model are estimated by the Baum-Welch expectation maximization algorithm (EM; [30]).

Parameter Estimation Results
The parameter estimations are done using the EM algorithm for both HMM and AR-HMM. The EM algorithm updates parameters iteratively using the forward and backward sampling algorithm. The data augmentation step that uses the Kalman filter is added at the beginning of each iteration to address missing data. Depending on initial conditions of the EM algorithm, it is possible that the EM algorithm converges to local maximum estimators instead of global maximum estimators. To avoid local maxima, parameter estimations are repeated with 100 different initial conditions and the selected parameters are those that achieve maximum likelihood from this set.

HMM Parameters
For a two-state HMM after removal of the overall mean and normalization of the standard deviation of each proxy, there are five unknown parameters which have 30°of freedom in total: the transition matrix a, and one version of c and Σ for each state, as described in the Eqs. (1) and (2). Table 1 shows the results of the parameter estimations. The two states are distinctively different in means and covariance. The mean of each proxy differs in sign between the two states, which must be the case as the overall mean of each proxy has been removed. However, the pattern of means among the proxies, e.g., high SST and low C 37 , is a signature of each state. The absolute values of the components and eigenvalues of Σ are larger in state 1 than in state 2. The eigenvalues (the strength of correlated noise components) of Σ are 2.21, 0.98, 0.57, and 0.18 for state 1 and 0.82, 0.58, 0.20, and 0.13 for state 2. Thus, we can associate state 1 as a "noisy" state and state 2 as a "calm" state, because the proxies tend to fluctuate more when in state 1 than in state 2 as indicated by the larger eigenvalue magnitudes in state 1. In terms of transition probability, the diagonal elements of a are close to 1, which implies that there is a high probability of staying in a state. Table 3 shows that 7 of the 12 correlation coefficients are approximately 0.5 or higher where the correlation coefficient of two variables is defined as a covariance divided by the multiplication of the standard deviations.
According to the parameter estimations, the most probable state is determined at each time using the backward sampling (Fig. 3). The median (mean) time to remain in HMM state 1 over 1000 samples is 78.5 years (144.2 years). The median time to remain in state 2 over 1000 samples is 102.1 years (212.9 years).

AR-HMM Parameters
For a two-state AR-HMM after removal of the overall mean and normalization of the standard deviation of each proxy, there are seven unknown parameters which have 62 degrees of freedom in total: the transition matrix a, and one version of c, θ , and Σ for each state, as described in the Eqs. (1) and (2). The estimated parameters are shown in Table 2, and the model state and imputed values are shown in Fig. 4. Again, state 1 can be identified as the "noisy" state and state 2 is "calm". In terms of transition probability, the diagonal components of a are around 0.8, which are smaller than those of the HMM. Thus, there are more frequent state changes in Fig. 4 than shown by the HMM (Fig. 3). The median (mean) time to remain in state 1 over 1000 samples is 7.9 years (20.1 years). The median time to remain in state 2 over 1000 samples is 31.4 years (44.6 years). The diagonal entries of θ are close to 1 on both states: each variable of state 2 depends strongly on its own past value. The diagonal entries of θ for state 1 are smaller than that of state 2 with both greater than 0.85 for all four proxies. The off-diagonal entries are all smaller than 0.07 for both matrices. Thus, only a small part of the dependence of each variable on its past value can be attributed to cross-correlations rather than autocorrelations. The antisymmetric components of θ are much smaller than the diagonal components, so the "probability angular momentum" which lends covariant predictability [38,39] is not significant.
The diagonal entries of Σ in the AR-HMM are much smaller than they were in the HMM-so that variability attributed to noise within each variable is considerably lessened by the introduction of memory. The eigenvalues of the Σ matrix as well are roughly a factor of 5-50 smaller, indicating that the covariant modes of noise are estimated to be much weaker when the memory of the AR-HMM system is permitted.
The mean state c of the HMM and AR-HMM do not resemble one another in their patterns, magnitudes or signs. Thus, while these patterns are a characteristic of the HMM and AR-HMM states, there is no agreement between the pairs of states in mean, timing of onset, or cross-correlations.

Comparison of Models
The HMM is a special case of the AR-HMM. The AR-HMM with zero θ s(t) is identical to the HMM. So, if the AR-HMM results in the proxies having weak autocorrelation, θ s(t) should be close to zero, and the other parameters of the AR-HMM (the noise covariance matrices (Σ s(t) )) will resemble their equivalents in the HMM. Thus, if the HMM was an adequate model to describe the proxy data, then allowing the extra degrees of freedom in the AR-HMM would result in little extra predictive power, and this result would not change the interpretation of the data from the interpretation found using the HMM alone. However, in this particular dataset, the AR-HMM resulted in extremely large auto-correlation relationships (the entries of the estimated θ s(t) are close to one) and furthermore the other model parameters (the estimated noise covariance matrices) are quite different between the HMM and the AR-HMM. Figure 6 visualizes and compares the estimated θ s(t) of the HMM and AR-HMM. The fact that the AR-HMM coefficients do not resemble the HMM in pattern, magnitude, or implied relationships means that a dependence of the data on values at a previous time is a critical aspect of the data. Thus, a key conclusion from the statistical models is that the past values of each proxy predict its own proxy variability better than the different proxy-to-proxy cross-correlations at the same time (or indeed the cross-correlations among past and present values). This fact implies that the different proxies in this particular dataset are not causally related to one another, as is often assumed in multi-proxy paleoclimate analyses (e.g., [9]. This result probably does not apply to all multi-proxy records; indeed many are probably causally linked, but our methodology for testing that assumption by comparing HMM to AR-HMM is generic. Thus, in this location, the four proxies (SST ; C 37 ; δ 15 N ; %N) are not related to each other in the local sense that variability in any one dominates or contributes significantly to variability in another through a local physical or biological mechanism.
A caveat arises in assessing variance in the time series: changes in the extent of laminations down-core, which could introduce differential smoothing of the results. We can assess the possible influence of lamination versus bioturbation in two ways: a visual comparison of X-radiographs of the core, which show the presence/absence of laminations, and comparison to δ 15 N , which is strongly indicative of lamination (high δ 15 N signifies intense depletion of oxygen in the subsurface). The results of the HMM and AR-HMM differ significantly in this regard. The presence of State 1 versus State 2 correlates strongly with the degree of lamination/δ 15 N proxy in the case the HMM (the "noisy" state occurring much more frequently in laminated intervals). This association is confirmed by the significant offset in the mean values of δ 15 N for State 1 and 2 (Table 1). However, the AR-HMM removes any significant dependence on the occurrence of the "noisy" versus "calm" states on the status of lamination down-core, and is confirmed by the negligible offset in the mean δ 15 N reported for the two states (Table 2).  Tables 1 and 4. While both AR-HMM and HMM attribute a noisy state and a calm state to the time series, none of the means, variances, or timings of onset of these states agree. Furthermore, it was noted that the HMM mean states must be opposite in sign in order for the normalized time series to be zero. The AR-HMM is not constrained by this limit, as the predictions of θ can contribute to the mean. Because the AR-HMM is more general than the HMM, disagreement between these state identifications indicates that the autoregression memory of the AR-HMM is important. Bolstering this idea is the fact that the dominant modes of correlation of observations with the previous time observations are autocorrelations, i.e., the dominant predictor of any of the four proxies is itself at a previous time and not interactions between the observed variables.     Tables 3, 4. These correlation matrices are calculated using each data set in which the missing parts have been imputed by their expected value and the state estimation at each time. The signs of correlations are usually the same between the two model assessments, but the strength of the cross-correlations vary somewhat. Note that the cross-correlations do not disappear in the AR-HMM. Even though the full model reveals the underlying autocorrelations, these simple single-time correlations are unable to detect any inconsistencies of correlations between variables and do not reveal causation between variables in this data.

Discussion
The preceding statistical model results may be related back to the original science questions that motivated this collection of data. That is, what changes in physics or biological makeup help better understand the mechanisms at play in setting the variability in this region?

Implications for Mechanisms
In the introduction, it was argued that potential local mechanisms might be used as causes to explain correlations and connections among these data. Variability in upwelling, stratification, biological makeup, oxygen utilization and productivity, and many other mechanisms would be likely to strengthen a particular set of cross-correlations and levels of variability among these data. Indeed, two different states, one noisy and one calm, were detected with both AR-HMM and HMM model parameter estimation. Tables 1, 2, 3, 4 show significant crosscorrelations and difference in cross-correlations and levels of variability between these two states. The typical HMM approach confirmed roughly these conclusions.
However, a closer examination of the dependences of the proxies on AR-HMM autocorrelations with their previous time values and cross-correlations with previous and synchronous values of other proxies reveals a very different story. This analysis revealed that the restrictions required to reduce the AR-HMM to the HMM, i.e., the neglect of memory of past observations, systematically corrupted interpretation of the system. The magnitude of the components and eigenvalues of the Σ matrix are significantly smaller in the AR-HMM than in the HMM. Thus, present observations are caused-in the ref. [11] sense-by the previous observations, i.e. the predictive rather than the intervention sense. The small off-diagonal terms in θ indicate that each proxy is not strongly caused by any other proxy, only by its own previous values. Rather, the apparent correlations found by the HMM model very likely stem from confounding (https://explorable.com/confounding-variables) by an unobserved mechanism that drives all four parameters in a coordinated manner. These results are inconsistent with any local mechanism that would link these proxies to one another causally, e.g., if SST variability were to indicate upwelling that drives productivity and thus C 37 and %N. Because both the past-time cross-correlations and the present-time correlated noise became less consistent in the AR-HMM when compared to the HMM, it is unlikely that this lack of cross-predictability is due to the limited temporal resolution. Consistent local mechanisms would require variability caused by unobserved mechanisms that might affect one or more of the proxies, so-called confounding variables. A variety of distinct remote causes for variability, e.g., SST driven by the PDO and other proxies driven by other climate modes or source variability, are a sufficient explanation for the results here.

Implications for Predictability
One interesting aspect of the AR-HMM model is that it reveals the dependence of the present observations on previous observations. This implies a sort of predictability of the four proxies based on the AR-HMM. However, because the predictability is essentially just autocorrelations, the AR-HMM does not predict significantly differently from persistence (same observations next time as this time). Nonetheless, some aspects of predictability in this system are of interest.
One difference between a prediction system and a reanalysis of past events is that a prediction system should use only the data that precedes the times that will be predicted. Two methods to achieve this were used here: (1) predict new parameters using the data sequence preceding the points we predict, and (2) sample values using these parameters.
Predictability of the AR-HMM was evaluated over the time window 5.7344-5.2634 ka BP. Figure 7 gives a sense of what behaviors these predictions tend toward in the window. This interval is chosen because the resolution of the interval is relatively higher than other intervals, and the AR-HMM state is persistent in the (calm) state 2 over this interval. Taking the observation at 5.2634 ka BP as an endpoint, the predictability of one-step (0.0079 ka) to thirty-step (0.2355 ka) is assessed. Each prediction is repeated 1000 times.
Depending on the most probable state of an initial point, the entries of the next step are computed with the emission model (Eq. 2) with parameters estimated in the previous section. The state of the next step is determined by the transition probability, and then the entries of the following step are computed with the Eq. (2) in the same way. State determination and entry computations are repeated until reaching the endpoint.
The accuracy of prediction based on the AR-HMM is examined using mean squared errors (MSE) as shown in Table 5. Predictions up to four-step, which corresponds to approximately three decades, achieve reduction of the MSE by 40-80%, depending on the proxy. The results The numbers in parenthesis represent the percentage over the longest prediction do not show a tight range of prediction when the length of prediction is longer than four steps (0.0314 ka) ahead. However, the probability of remaining in a given state or regime for the future steps can be predicted from the transition probability, typically for decades based on the AR-HMM transition probabilities. The noisiest proxies tend to have forecasts that revert to spanning their climatological range most quickly. The forecasts that begin in the noisy regime of state 1 tend to lose persistence faster as well.
In order to compare the HMM with the AR-HMM, we assessed the predictability of the HMM in the same manner as the that of AR-HMM. While the MSEs increase as the forecast length increases in AR-HMM predictions, the MSEs of HMM keep the same size regardless of prediction length. In a system with strong auto-correlations such as this one, useful forecasts require a memory of past states.

Conclusions
Multi-proxy records are a potentially powerful tool in strengthening understanding of paleorecords. However, depending on which variables are observed and where, they may or may not capture direct evidence of the mechanisms at work. This study was carefully designed to distinguish different types of local mechanisms that might be causing variability on the Peru margin over the Holocene. However, it is our interpretation of the estimates of statistical model parameters found that no local causal mechanisms were observed to be significant at the roughly decadal scale of sampling employed.
This study illustrates the importance of assessing predictive (Granger) causation in order to avoid spurious diagnoses of the mechanism through the use of autoregressive (AR) models for example. AR algorithms are widely available (in R and MATLAB) for cases not involving regime change. In addition as pointed out by Hu et al. [9], when multiple records are involved, age uncertainty can also lead to spurious associations.
Before closing, it is interesting to consider broadly the implications of the regimeswitching observed here. While it was shown that similar-sampling-frequency analyses of modern observations at this location reveal SST variability that is dominated by the PDO, past variability indicates a change in PDO variability at this site, transient appearance of other dominant modes, or changes in teleconnections [32] demonstrate that changes in such remote influences of climate variability are likely to be common even when the underlying climate mode is unchanging.