Recent advances in proxy-model data assimilation have made feasible
the development of proxy-based reanalyses. Proxy-based reanalyses
aim to make optimum use of both proxy and model data while
presenting paleoclimate information in an accessible format – they
will undoubtedly play a pivotal role in the future of paleoclimate
research. In the Paleoclimate Reanalysis Project (PaleoR) we use
“off-line” data assimilation to constrain the CESM1 (CAM5) Last
Millennial Ensemble (LME) simulation with a globally distributed
multivariate proxy dataset, producing a decadal resolution
reanalysis of the past millennium. Discrete time periods are
“reconstructed” by using anomalous (
Reanalyses combine meteorological observations and numerical model simulations to produce a realistic estimate of the state of the system; in recent decades they have revolutionised the way in which weather and climate research is conducted (e.g. Dee et al., 2011; Kalnay et al., 1996). Extending reanalyses back in time has been a major research focus (e.g. Compo et al., 2011). However, observational data scarcity prior to the 20th century is the limiting factor. Without the benefit of “real-world” data, model simulations of past climate cannot be expected to match the temporal evolution of the climate system (Bengtsson et al., 2006). Therefore, efforts to understand “real-world” climate variability over longer timescales must rely on climate signals preserved in proxy records such as ice cores and tree rings. A paleoclimate reanalysis could potentially make optimum use of proxy and model data while presenting paleoclimate information in an accessible format suitable for a wide range of research applications.
However, extracting coherent climatic signals from spatially dispersed multiproxy data is a non-trivial exercise plagued by high uncertainties and methodological challenges (Ammann and Wahl, 2007; Jones et al., 2009; Mann et al., 2005; Smerdon et al., 2013). For example, traditional approaches that look for a common signal amongst multiple proxy records are often based on the false assumption that covariance relationships remain stable through time (Gallant et al., 2013; Li and Smerdon, 2012); they also have difficulty incorporating data representing multiple climatic variables and seasonal sensitivities. Furthermore, variational assimilation schemes used in meteorological reanalyses, such as NCEP1 and ERA-Interim, are unsuitable for sparse low-resolution paleoclimate data (Widmann et al., 2010). Nevertheless, recent advances in proxy-specific data assimilation techniques have addressed many of these issues (Bhend et al., 2012; Franke et al., 2010; Goodwin et al., 2013, 2014; Goosse et al., 2006; Graham et al., 2007; Hakim et al., 2013; Schenk and Zorita, 2012; Steiger et al., 2013; Widmann et al., 2010) and made feasible the development of proxy-based climate reanalyses.
Proxy data assimilation can occur at model runtime (on-line) or can be applied to existing model simulations (off-line). One of the major constraints associated with “on-line” assimilation is the high computational cost of running millennial length or longer simulations, leading to the use of simplified models (e.g. Goosse et al., 2010). It has been shown that in many situations runtime assimilation is not really necessary or advantageous (Bhend et al., 2012). “Off-line” proxy data assimilation significantly reduces computational demands and provides the opportunity to utilize existing high-resolution climate model simulations (Steiger et al., 2013).
This article describes our efforts at developing a globally relevant paleoclimate reanalysis (hereafter referred to as PaleoR) of the past 1200 years at decadal resolution. We employ the “off-line” data assimilation scheme described in Goodwin et al. (2013, 2014), where it was used to investigate atmospheric circulation patterns during the Medieval Climate Anomaly. Whereas Goodwin et al. (2013, 2014) used only Southern Hemisphere proxy data and a 10 000 year unforced simulation from the low-resolution CSIRO Mk3L model, this paper employs a globally distributed multivariate proxy dataset and the recently released CESM1 (CAM5) Last Millennial Ensemble simulation (Otto-Bliesner et al., 2015). The paper is structured as follows: Sect. 2 describes the proxy dataset, model data and the assimilation scheme; Sect. 3 evaluates PaleoR using 3 complementary approaches; Sect. 4 discusses advantages, limitations and planned improvements to PaleoR; and Sect. 5 provides some concluding remarks.
The multivariate data assimilation (MDA) approach used to develop PaleoR is described below and in Goodwin et al. (2013, 2014) and Browning (2014). MDA reconstructs discrete time periods by using information from a multivariate suite of proxy data to select climate state analogues from an existing AOGCM simulation.
The multivariate proxy dataset includes individual proxy records,
published reconstructions, and regional multiproxy reconstructions
(Fig. 1 and Table A1). Proxy screening as applied in many paleoclimate
reconstructions (e.g. Cook et al., 1999) is not required for MDA as
there is no prerequisite for covariability. Proxy data of varying
temporal resolutions are accommodated, including records representing
discrete time periods; Fig. 1b shows the mean temporal resolution of
all included proxy data at each timestep. Most proxy records contain
a component of chronological uncertainty that increases back in time;
this is partially reflected by a corresponding decrease in temporal
resolution (Fig. 1b). In recognition of this uncertainty PaleoR is
currently executed at decadal resolution. For each discrete decadal
time period proxy climatic signals are calculated using 10 year means
post 1400 AD and running 20 year means prior to 1400 AD. To
facilitate the inter-comparison of proxy data representing different
variables, all decadal data are normalized relative to the 1300–2000
AD long-term mean. Most proxy records
contain a component of non-climatic “noise”, resulting in a relatively
low signal-to-noise ratio when compared with observational data. To
account for this, each decade is reconstructed independently, using
only proxies displaying an unambiguous climatic signal: defined as the
decadal mean exceeding
The model dataset used for PaleoR is the CESM1 (CAM5) Last Millennium
Ensemble (Otto-Bliesner et al., 2015), hereafter referred to as
LME. LME uses a
The MDA approach reconstructs discrete time periods by searching the
model data for climate states analogous to the combined signal from all
included proxy data (Goodwin et al., 2013, 2014). Each year of the LME
represents an individual multivariate realization of a physically
plausible climate state. In this respect the interannual temporal
continuity of the LME can be discarded, thereby giving an effective
ensemble size of 11 560 members. Individual ensemble members
analogous to the proxy inferred climate states for each time period
are identified by calculating the Euclidean distance between the
normalized proxy data (
In numerical weather forecasting it is common practice to take an ensemble mean and use the ensemble spread to estimate uncertainty: where the ensemble spread is large, uncertainty is increased. This is also the case in the development of reanalysis products such as the 20th Century Reanalysis (20CR), where the climate state is defined by the mean of a 56-member ensemble and uncertainty is estimated from the ensemble spread (Compo et al., 2011). A similar approach is adopted for PaleoR, where the reconstructed climate state is defined by the ensemble mean of the 50 BMA and the ensemble spread provides one estimate of uncertainty. Using 50 BMA is found to provide the optimal balance between including contributions from the maximum number of proxy records and minimizing the mean Euclidean distance. 50 BMA also provides a large enough sample size to calculate statistical significance (Browning, 2014; Goodwin et al., 2013).
Modelled climate variables can be resolved by compositing the 50-BMA ensemble; as all modelled variables are dynamically consistent, theoretically any modelled variables can be calculated. However, at present we are utilizing only variables with a mechanistic relationship to the multiproxy dataset: air temperature, precipitation, SST, SLP, and winds. Anomalies are calculated relative to the full LME and therefore represent deviations from the modelled past millennial climate, not the observed modern climate.
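The analogue selection and compositing steps described above can be illustrated with a minimal sketch. It assumes that the normalized model values have already been sampled at the proxy locations, that proxies excluded for a given decade are flagged as NaN, and that the distance is an unweighted Euclidean distance; the function names (`select_analogues`, `composite`) and the synthetic data are hypothetical and are not taken from the PaleoR code.

```python
import numpy as np

def select_analogues(proxy_z, model_z, n_best=50):
    """Return the n_best model states closest to the proxy signal.

    proxy_z : (n_proxies,) normalized proxy anomalies for one decade;
              NaN marks a proxy that is not used for this decade.
    model_z : (n_states, n_proxies) normalized model values sampled at
              the proxy locations (assumed to be precomputed).
    """
    used = ~np.isnan(proxy_z)
    diff = model_z[:, used] - proxy_z[used]
    dist = np.sqrt((diff ** 2).sum(axis=1))   # Euclidean distance per state
    order = np.argsort(dist)[:n_best]         # closest states first
    return order, dist[order]

def composite(field, members):
    """Ensemble mean and spread (sample std) of a field over chosen members."""
    sub = field[members]
    return sub.mean(axis=0), sub.std(axis=0, ddof=1)

# Synthetic stand-in for the model ensemble (illustrative values only).
rng = np.random.default_rng(0)
n_states, n_proxies, n_grid = 2000, 12, 30
model_z = rng.standard_normal((n_states, n_proxies))
field = rng.standard_normal((n_states, n_grid))

# Pretend the "true" decade is model state 123, with one proxy unavailable.
proxy_z = model_z[123].copy()
proxy_z[3] = np.nan

members, dists = select_analogues(proxy_z, model_z)
mean_field, spread = composite(field, members)
```

Because every selected member is a complete model state, the same `members` index can be reused to composite any other modelled variable, which is what keeps the reconstructed fields dynamically consistent.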
We evaluate the skill of the MDA using 3 complementary approaches: (1) comparison with the included proxy data; (2) pseudoproxy based “reconstruction” of a known climate; and (3) calculation of major modes of global climate variability and comparison with equivalent previously published multiproxy reconstructions.
The first evaluation tests the skill of MDA in creating a climate dataset that is consistent with the included proxy data. This can be tested by directly comparing PaleoR with the multiproxy dataset; PaleoR should also show a considerable improvement when compared to a transient simulation of the same time period without data assimilation. The experimental setup is relatively straightforward and consistent with a similar comparison performed by Goosse et al. (2006). An array of synthetic proxy records is extracted from PaleoR – at the same locations and from the equivalent climate variables as the proxy records – and compared with the original proxy records (decadally averaged). To compare against the null case of no data assimilation, the proxy records are also compared to the LME (mean of the 10 ensemble members) without data assimilation.
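The comparison described above can be sketched as follows. This is a minimal illustration on synthetic data, assuming each record is stored as a decadal series with NaN for missing decades; the helper names (`pearson_r`, `mean_abs_correlation`) and the noise levels are hypothetical choices and do not reproduce the PaleoR results.

```python
import numpy as np

def pearson_r(a, b):
    """Pearson correlation of two series, skipping decades where either is NaN."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    ok = ~(np.isnan(a) | np.isnan(b))
    if ok.sum() < 3:
        return np.nan
    return float(np.corrcoef(a[ok], b[ok])[0, 1])

def mean_abs_correlation(proxies, synthetic):
    """Per-record r and mean |r|; both inputs are (n_records, n_decades)."""
    r = np.array([pearson_r(p, s) for p, s in zip(proxies, synthetic)])
    return r, float(np.nanmean(np.abs(r)))

# Synthetic example: 20 records, 120 decades (illustrative values only).
rng = np.random.default_rng(1)
proxies = rng.standard_normal((20, 120))
proxies[0, :10] = np.nan                       # a record with a late start

# Stand-in for synthetic proxies extracted from an assimilated product.
assimilated = proxies + 0.5 * rng.standard_normal(proxies.shape)
# Stand-in for the null case: a series unrelated to the proxies.
unassimilated = rng.standard_normal(proxies.shape)

r_assim, mean_assim = mean_abs_correlation(proxies, assimilated)
r_null, mean_null = mean_abs_correlation(proxies, unassimilated)
```

With decadally averaged data the number of independent samples is small, so any significance test applied to these correlations should account for the reduced degrees of freedom.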
The second evaluation tests the skill of MDA in reconstructing global
patterns of variability given the spatial distribution of the proxy
dataset. This test uses pseudoproxy data to evaluate the skill of MDA
at reconstructing the known climate of a model simulation using
a synthetic proxy network (Jones et al., 2009; Mann et al.,
2007). The known climate is a single ensemble member of the LME:
LME
The third evaluation examines PaleoR as a tool for investigating multiple large-scale components of the climate system. Indices representing the major modes of global climate variability: El Niño Southern Oscillation (ENSO), Southern Annular Mode (SAM), and the North Atlantic Oscillation (NAO) are calculated directly from PaleoR spatial fields in much the same way as they are typically calculated from model or reanalysis data. Each index is compared to previously published multiproxy reconstructions and equivalent indices calculated from observational-based data. As PaleoR is at decadal resolution, robust comparisons with observational records are difficult due to the limited temporal overlap and reduced degrees of freedom associated with decadally averaged data.
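As an illustration of calculating an index directly from a spatial field, the sketch below computes an area-weighted mean SST anomaly over the Niño 3.4 region (5° S–5° N, 170–120° W). It assumes a regular latitude–longitude grid with longitudes in degrees east on [0, 360) and uses cosine-latitude weights; the function name `nino34_index` and the grid spacing are hypothetical, and the actual PaleoR implementation may differ.

```python
import numpy as np

def nino34_index(ssta, lat, lon):
    """Area-weighted mean SST anomaly over 5S-5N, 170W-120W.

    ssta : (n_time, n_lat, n_lon) SST anomalies
    lat  : (n_lat,) latitude in degrees north
    lon  : (n_lon,) longitude in degrees east, in [0, 360)
    """
    lat_ok = (lat >= -5) & (lat <= 5)
    lon_ok = (lon >= 190) & (lon <= 240)       # 170W-120W in degrees east
    box = ssta[:, lat_ok][:, :, lon_ok]
    # Cosine-latitude weights approximate grid-cell area on a regular grid.
    w = np.cos(np.deg2rad(lat[lat_ok]))[None, :, None]
    w = np.broadcast_to(w, box.shape)
    return (box * w).sum(axis=(1, 2)) / w.sum(axis=(1, 2))

# Synthetic check on an assumed 2 x 2.5 degree grid.
lat = np.arange(-89.0, 90.0, 2.0)
lon = np.arange(0.0, 360.0, 2.5)
ssta = np.zeros((2, lat.size, lon.size))
ssta[0] += 1.0                                  # uniform 1.0 anomaly everywhere
in_lat = (lat >= -5) & (lat <= 5)
in_lon = (lon >= 190) & (lon <= 240)
ssta[1][np.ix_(in_lat, in_lon)] = 2.0           # anomaly confined to the box

index = nino34_index(ssta, lat, lon)
```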
ENSO is the leading coupled ocean–atmosphere mode of global climate
variability (e.g. Trenberth et al., 2005). A PaleoR ENSO index is
calculated from SSTa in the Niño 3.4 region (after Trenberth,
1997) and compared with two recent multiproxy ENSO reconstructions
(Emile-Geay et al., 2013b; McGregor et al., 2010) and an equivalent
index derived from HadISST (1870–2012). During the late 20th century
SAM has been the leading mode of atmospheric variability (Trenberth
et al., 2005). A PaleoR SAM index is calculated as the leading
Empirical Orthogonal Function (EOF) of Southern Hemisphere sea-level
pressure between 20 to 82
Figure 2a shows strong agreement between PaleoR and the proxy archive,
with correlations exceeding 99 % significance at most
locations. However, there are some records and some time periods that
do not agree. In these situations the climate signal from the proxy in
question, when evaluated against all other proxies, contains a signal
that is not consistent with any of the modelled climate states. This
can occur either because of errors in proxy dating or climatic
interpretation, or because the LME dataset does not contain a complete
sample of all past millennium regional climate states. Comparison between
the unassimilated LME ensemble mean and the proxy archive shows reasonably
good correspondence; however, only a few correlations exceed 99 %
significance (Fig. 2b), and similar results are observed when individual
LME ensemble members are used. The mean correlation value (absolute r)
calculated across all proxy records for PaleoR is
Figure 3 shows grid point correlations between
PaleoR
PaleoR
Direct translation of the pseudoproxy experiment results into uncertainty estimates for PaleoR is not straightforward; however, they do provide a valuable framework for methodological refinement and qualitative interpretation. Because of this, quantitative error estimates are based on the 50-BMA-ensemble spread, similar to the approach used in weather forecasting and observational reanalyses. The pseudoproxy experiment results should therefore be viewed as a qualitative supplement to the ensemble spread when assessing PaleoR confidence.
The objective of this section is to demonstrate that PaleoR can be
used to investigate behaviour in the primary global modes of
atmosphere–ocean variability and provides an alternative to
traditional paleoclimate methods that look for a common signal amongst
multiple proxy records. PaleoR derived indices of Niño 3.4 SST,
SAM and NAO are compared to equivalent indices developed using
traditional approaches. All of the comparison indices were constructed
by finding a common signal amongst multiple records that is
statistically correlated to the target index over the observational
period. As part of this process, proxy records that do not co-vary
over the length of the reconstruction are discarded. For example
Ortega et al. (2015) were forced to discard
The PaleoR indices presented here are calculated from actual modelled SST or SLP data. Climate signals at the locations from which they are derived are linked to proxy climatic signals via the dynamic equations of state used to drive the model. As long as the LME can simulate non-canonical behaviour, PaleoR indices are unaffected by changes in teleconnection patterns. In contrast, the comparison indices are a statistical representation of covariance between multiple records that, over the calibration period, are correlated to the target index. This a priori requirement for covariance means that traditional approaches cannot resolve variability that lies outside the range of modern observations. When comparing PaleoR indices with previous work it is important to keep in mind the different methodologies. Time periods when PaleoR differs from the comparison indices are likely to represent periods of non-canonical behaviour in the major modes, which can be further investigated using PaleoR spatial fields.
Figure 4a shows a moderate to strong correlation between the
Niño 3.4 SST indices derived from PaleoR and HadISST (
All three ENSO reconstructions are in general agreement; however, PaleoR provides the added advantage of resolving dynamically consistent spatial patterns of coupled ocean–atmosphere variability beyond the Niño 3.4 region. Figure 5 shows an example of SST and SLP anomalies for positive and negative Niño 3.4 index periods. The spatial SST anomaly structure and extratropical atmospheric teleconnections associated with canonical ENSO are broadly consistent with observations: in the North Pacific there is a strengthening (weakening) of the Aleutian Low under El Niño (La Niña) conditions (Bjerknes, 1966, 1969; Lau, 1997); and in the South Pacific there is a weakening (strengthening) of the Amundsen Sea Low under El Niño (La Niña) conditions (Mo and Higgins, 1998; Turner et al., 2013). This is expected, as the MDA approach preserves dynamical linkages between modelled climate variables.
Figure 4b shows a moderate to strong correlation between SAM indices
derived from PaleoR and the 20CR (
Both the Villalba et al. (2012) and Abram et al. (2014) SAM
reconstructions are calculated from the common signal in warm season
proxies. PaleoR SAM accommodates proxies sensitive to both summer and
winter climate variability and is also correlated to the early winter
SAM reconstruction of Goodwin et al. (2004) (
Reconstructing SAM from midlatitude proxies can be challenging, as teleconnections between SAM and some midlatitude regions can break down or reverse depending on the influence of the tropical Pacific (Fogt and Bromwich, 2006). Traditional paleoclimate reconstruction techniques that look for a common signal amongst multiple records struggle to accommodate changing teleconnection patterns and seasonal biases. PaleoR is unaffected by these issues: the second EOF of PaleoR SLP resembles the Pacific South America pattern (Mo and Ghil, 1987) and is highly correlated to the PaleoR Niño 3.4 index (Goodwin et al., 2013). PaleoR is therefore able to resolve temporal variations in the nature of mid to high latitude teleconnection patterns and their influence on the SAM (Goodwin et al., 2013).
Figure 4c shows that the PaleoR NAO is broadly consistent with the
observational NCAR NAO index (
The NAO has strong climatic linkages with many global regions (e.g. Hurrell and Deser, 2009); therefore the state of the NAO should be consistent with proxy signals from many parts of the Northern Hemisphere. PaleoR is highly correlated to most proxy data from the North Atlantic region, so it should provide a robust estimate of past NAO behaviour, irrespective of variations in teleconnection patterns. A detailed investigation into NAO behaviour over the past millennium is beyond the scope of this paper. However, PaleoR resolves full spatial fields that can be used to better elucidate the nature of past climatic changes; as an example, Fig. 5 shows SLP patterns for both positive and negative phases of the NAO index.
PaleoR was designed first and foremost to be a paleoclimate research application; in this respect we did not design PaleoR as a static dataset, but rather as an experimental tool that can be easily tailored to specific research objectives. This section discusses some of the advantages and limitations of the current PaleoR version. We also identify several areas where modifications to the current approach might be expected to yield improvements in future versions.
PaleoR is developed using a relatively simple MDA approach that is computationally efficient and can be readily applied to existing model simulations. To date, we have experimented with simulations from Mk3L (Goodwin et al., 2013, 2014), CM21, CCSM4, GISS, HadCM3 and MIROC-ESM. Unfortunately, most of the CMIP5 past millennium simulations contain too few ensemble members to provide a large enough ensemble for analogue selection. LME was chosen for this version due to its high resolution, realistic forcing and large available ensemble. After initial setup, a 1000 year reanalysis at decadal resolution without figure production can be calculated in less than one minute on a standard laptop computer. The minimal computational demands of “off-line” versus “on-line” data assimilation allow easy experimentation with various setups. The proxy dataset is organized as recommended by Emile-Geay and Elsham (2013), allowing easy modification of included proxy data and easy inclusion of new records as they become available. The assimilation can also be run at varying temporal resolutions in order to take full advantage of proxy chronological confidence without running the risk of over-interpretation.
As a tool for researching long-term climate variability, data assimilation approaches in general represent a significant improvement on traditional principal component regression (PCR) approaches (Bhend et al., 2012) that are still considered industry standard (PAGES, 2013). One fundamental problem with PCR reconstructions applied over large spatial domains is that they require temporal stability in teleconnection relationships; this does not usually occur in the climate system (Gallant et al., 2013). The MDA approach used to develop PaleoR accommodates non-stationarity by using only the climatic signal at each proxy's location to select analogues from the LME ensemble. MDA accommodates both continuous and non-continuous proxy records and permits the simultaneous evaluation of proxies representing different climatic variables and with different seasonal biases: this has previously been a major problem in climate reconstructions (Bradley et al., 2003). As the dynamical relationships between modelled variables are preserved, any modelled variable can potentially be reconstructed. Hence, approaches like PaleoR now provide the best methods to resolve climate variability outside the instrumental era.
The accuracy and quality of all proxy-based research, regardless of
methodological considerations, is limited by the quantity, quality and
spatial distribution of the proxy data network. Two primary
ambiguities relating to the existing proxy dataset are dating
uncertainty and non-climatic noise in the proxy signal. In order to
address dating uncertainty the reconstructions are produced at
a variable resolution that is within estimated dating confidence for
the included proxy data. To address signal to noise uncertainty the
signals from each proxy are individually evaluated for each time
period and only proxies displaying an unambiguous climatic signal are
included – as defined by a normalised anomaly of
Another limitation to PaleoR is the current lack of direct tropical
SST proxies constraining the assimilation, especially prior to
PaleoR is our first attempt at producing global paleoclimate reanalysis and is expected to improve significantly with future refinements. In addition to the obvious areas for improvement such as increasing the spatial density of proxy data and experimenting with different model simulations, we have identified several key areas where improvements are planned.
There is significant scope to improve the calibration of proxy and model data. At present proxy records are compared to a single primary modelled climatic variable, either SST, SLP, air temperature or precipitation; this is typically determined by the original published interpretations. However, LME simulates a range of variables that might, in some cases, be more appropriate, such as atmospheric sea salt transport for ice core interpretation; humidity, precipitation-evaporation balance, or streamflow for hydroclimate proxies; and ocean salinity for coral proxies. Some climatic proxies are also sensitive to multiple variables, such as tree-ring growth that is sensitive to both precipitation and temperature (e.g. Fritts, 1976; Villalba, 1990); in these cases simultaneous multivariate calibrations might be more appropriate. The emerging technology of proxy system modelling could potentially account for non-climatic influences on proxy signals and might also allow the direct calibration of trace chemical or isotope signals in proxies with equivalent modelled variables (Evans et al., 2013; see Hughes and Ammann, 2009).
Potential biases in the analogue selection have not been directly addressed in this study. Goosse et al. (2006) suggested that a weighting could be applied to each proxy record so that reconstructions favoured proxy records in which higher confidence was placed. The spatial distribution of proxy data is also important; reconstructions could be developed via an iterative or hierarchical approach, whereby regional reconstructions are first produced, then combined into a hemispheric reconstruction. An alternative option could be to apply weights to each proxy depending on its proximity to other records, thereby accounting for differing proxy densities in different regions.
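One way the proximity-based weighting option might look is sketched below. This is purely illustrative and is not part of the current PaleoR method: it assumes each record is down-weighted by the number of records (including itself) within a search radius, and the 1000 km radius, the function names and the example coordinates are all hypothetical.

```python
import numpy as np

def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in km between points given in degrees."""
    lat1, lon1, lat2, lon2 = map(np.deg2rad, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius_km * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))

def density_weights(lats, lons, search_km=1000.0):
    """Weights inversely proportional to local record density, summing to 1."""
    lats = np.asarray(lats, dtype=float)
    lons = np.asarray(lons, dtype=float)
    d = haversine_km(lats[:, None], lons[:, None], lats[None, :], lons[None, :])
    neighbours = (d <= search_km).sum(axis=1)   # count includes the record itself
    w = 1.0 / neighbours
    return w / w.sum()

def weighted_distance(proxy_z, model_z, weights):
    """Weighted Euclidean distance between the proxy signal and each model state."""
    diff = model_z - proxy_z
    return np.sqrt((weights * diff ** 2).sum(axis=1))

# Three clustered records and one isolated record (hypothetical coordinates).
lats = [45.0, 46.0, 44.0, -40.0]
lons = [10.0, 11.0, 9.0, 170.0]
w = density_weights(lats, lons)

# Distances from a proxy signal to three toy model states.
proxy_z = np.array([1.0, 1.0, 1.0, -1.0])
model_z = np.array([[1.0, 1.0, 1.0, -1.0],
                    [1.0, 1.0, 1.0, 1.0],
                    [-1.0, -1.0, -1.0, -1.0]])
dist = weighted_distance(proxy_z, model_z, w)
```

In this toy case the three clustered records together carry the same weight as the single isolated record, so a mismatch at the isolated site is penalised as heavily as a mismatch across the whole cluster.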
Our focus in this work has been ensuring that PaleoR is consistent
with available proxy data (as demonstrated in Fig. 2) and testing the
methodology using pseudoproxy experiments (Fig. 3). Verification of
PaleoR against the observational record is challenging due to the
short overlap period and reduced degrees of freedom associated with
a decadally averaged dataset. For example, data scarcity in the
Southern Hemisphere high latitudes means that comparisons with observations
are only really valid post
This article describes the methodology and evaluation of our first attempt at developing a globally relevant paleoclimate reanalysis of the past 1200 years. Our overriding ambition is to make optimum use of both proxy and model data while presenting paleoclimate information in an accessible format suitable for a wide range of research applications. PaleoR is developed using an established “off-line” multivariate data assimilation approach to constrain the LME simulation with a globally distributed proxy dataset. Assimilation of proxy and model data using MDA produces a reanalysis that is highly correlated to almost all included proxy records. PaleoR should therefore provide a reasonable representation of the “real-world” evolution of the climate system – within the limitations of the proxy archive and the model simulation.
Our data assimilation approach offers numerous advantages over conventional PCR-based approaches to paleoclimatology. MDA accommodates a wide range of proxy data representing multiple climatic variables, seasonal sensitivities and temporal resolutions, thus optimising the use of available proxy records. MDA is largely unaffected by non-stationarity in covariance relationships. Full spatial fields are reconstructed for multiple variables while preserving dynamical inter-variable relationships. “Off-line” proxy-data assimilation is computationally efficient, easily adaptable to existing climate datasets and easily incorporates new proxy data, thus providing a platform for experimentation and rapid evaluation of new proxy data and model simulations.
Paleoclimate reanalyses will undoubtedly play a pivotal role in the
future of paleoclimatology. PaleoR research and development is an
ongoing project, and we have identified opportunities for
potential enhancements in future versions. MDA is just one of many
possible approaches to proxy data assimilation and the development of
paleoclimate reanalyses. In coming years we will no doubt see the
emergence of alternative and possibly more skilful
approaches. However, paleoclimate reanalyses will never be as accurate
as reanalyses constrained by sufficient meteorological observations –
this is simply a reality of dealing with low-resolution
high-uncertainty data. Nevertheless, PaleoR has already provided
valuable insights into climate system evolution during the past
millennium (Goodwin et al., 2013, 2014). The ability to resolve full
spatial fields across multiple dynamically consistent variables means
PaleoR is a powerful resource for investigating long-term climate
variability and the drivers of large-scale climate regime
shifts. PaleoR spatial fields for 800 to 2000 AD, as described in this
article, can be viewed online at
This research was part funded by a Macquarie University External Collaborative Grant with the New South Wales Office for Environment and Heritage, and the New South Wales Environmental Trust. S. A. Browning received a postgraduate Macquarie University Research Scholarship (MQRES). The paper draws on a significant proxy climate database; the authors thank all of the researchers and organisations that have retrieved paleoclimate proxy data and had the foresight to make it publicly available.
Proxy climate records used in PaleoR. Numbers in column 1 correspond to the locations plotted in Fig. 1.
Correlations (Pearson's
Pseudoproxy experiment results: grid point correlations
(Pearson's
Timeseries plots showing three PaleoR derived reconstructions
of major modes of global climate variability:
PaleoR derived