An automated approach for annual layer counting in ice cores

A novel method for automated annual layer counting in seasonally-resolved paleocli-mate records has been developed. It relies on algorithms from the statistical framework of Hidden Markov Models (HMMs), which originally was developed for use in machine speech-recognition. The strength of the layer detection algorithm lies in the way it is 5 able to imitate the manual procedures for annual layer counting, while being based on statistical criteria for annual layer identiﬁcation. The most likely positions of multiple layer boundaries in a section of ice core data are determined simultaneously, and a probabilistic uncertainty estimate of the resulting layer count is provided, ensuring an objective treatment of ambiguous layers in the data. Furthermore, multiple data series 10 can be incorporated and used simultaneously. In this study, the automated layer counting algorithm has been applied to an ice core record from Greenland. The algorithm shows high skill in reproducing the results from manual layer counts.


Introduction
An accurate chronology is of fundamental importance for the interpretation of a paleoclimatic record.For some ice core records, the temporal resolution of the records is sufficiently high to preserve seasonal information, and with the increased resolution presently being achieved in many ice core measurements (e.g.Bigler et al., 2011), an annual signal is preserved in an increasing number of different data records and spanning much longer time intervals.Annual layer counting represents the most accurate method to produce a chronology for ice cores.Yet, manual layer counting is a tedious and sometimes subjective method.The Greenland Ice Core Chronology 2005 (GICC05) was developed over several years and involved the efforts of many researchers: counting, comparing, and re-counting the annual layers recorded in multiple data records from several Greenland ice cores (Andersen et al., 2006b;Rasmussen et al., 2006;Svensson et al., 2006).The chronology extends back to 60 kyr b2k (kyr Figures before 2000 AD) (Svensson et al., 2008).Decreasing annual layer thicknesses cause the annual signal in most ice core parameters to weaken and eventually disappear with depth.As reliable layer recognition becomes more challenging, a manual approach increasingly relies on subjective interpretation of the available data.Thus, time-wise and potentially also quality-wise, much may be gained by an automated and therefore reproducible approach for annual layer identification in ice core records.
Many attempts have been made to develop automated methods of annual layer counting in paleoclimatic archives (McGwire et al., 2008;Rasmussen et al., 2002;Smith et al., 2009).These have generally had limited success where the annual signal in the data is inherently ambiguous, leaving manual layer counting to still be considered the most accurate.In comparison to these previous attempts, the method developed here is quite similar to the manual approach of layer counting; multiple layer boundaries in an entire data section are determined simultaneously, while allowing for a very flexible definition of an annual layer signature in the record.Furthermore, the method allows for multi-parameter counting, using the annual information in multiple data records at once.It has hitherto proven difficult to incorporate such properties in automated layerrecognition methods.
The annual layer detection algorithm has here been applied to visual stratigraphy data from the NGRIP ice core, Greenland, as obtained from line-scan images (Svensson et al., 2005).The NGRIP ice core has particularly high temporal resolution with depth, and therefore has the potential to be dated by annual layer counting far back in time.The oldest part of the GICC05 chronology was based exclusively on data from this core.For the depth interval in consideration, the visual stratigraphy displays a clear banding of dark and bright layers, which essentially provides a proxy for dust content in the ice core.The data is at such high resolution that it may resolve individual precipitation events.As the influx of dust to the Greenland ice sheet displays a seasonal pattern, so does also the grey-tone intensity values of visual stratigraphy (Alley et al., 1997;Hamilton and Langway, 1968).More often than not, however, the annual signal in the data is ambiguous, and when considered in isolation not easy to use for layer Introduction

Conclusions References
Tables Figures

Back Close
Full counting.Nonetheless, due to the ability of the line-scan data to resolve annual layering in sections of thin annual layers, it has potential for being utilized to extend the GICC05 chronology further back in time.

A Hidden Markov Modelling approach to annual layer recognition
The layer detection routine is based on the statistical framework of Hidden Markov Models.A Hidden Markov Model (HMM) is a stochastic signal model that can be used for modelling the output of a system displaying Markovian behaviour, i.e. a stochastic system which transitions between states, and where the next state of the system depends only on the current state.By comparison to observed data, knowledge of the nature of the underlying signal can then be obtained (Rabiner, 1989).The concept of Hidden Markov Models was originally introduced in the late 1960s (Baum and Petrie, 1966), and has successfully been applied to pattern recognition in the field of machine speech-recognition since the mid-70s (Jelinek et al., 1975).The rich mathematical structure of Hidden Markov Models can form the theoretical basis for a wide range of signal modelling applications, spanning from magnetic resonance imaging (MRI) brain mapping (Faisan et al., 2005), via electrocardiography (ECG) (Koski, 1996;Thoraval et al., 1994), to the analysis of protein structures (Schmidler et al., 2000), as well as financial time series (Bulla and Bulla, 2006).It will here be applied to annual layer recognition in ice core records.An annual layer recognition algorithm based on Hidden Markov Modelling is inherently of Bayesian nature.The annual layers are obtained based on prior knowledge on e.g. the general layer thickness probability distribution, which is updated based on observed data, and the resulting layer boundaries are given as probability distributions, both in depth and layer number.When calculating these probabilities, an entire observation sequence is taken into account at once.The likelihood that a given observation segment represents an annual layer is therefore judged not only by its own resemblance to a typical annual layer, but is judged in conjunction with the likelihood of the Introduction

Conclusions References
Tables Figures

Back Close
Full resulting layering on either side.Determination of layer boundaries in data sections with poorly-resolved annual layering is therefore primarily based on the positioning of surrounding clearly discernible annual layer boundaries.Sections containing missing data are treated much the same way: an appropriate number of layers is fitted into these regions, with the "appropriate number" based on knowledge of the layer thickness probability distribution in combination with the most likely positions of layer boundaries in the surrounding data.Simultaneously, the spread of the probability distribution of counted annual layers is generally enlarged, corresponding to increased uncertainty in the total number of counted layers.This is very similar to what is implicitly done when counting layers manually.
The method can relatively easily be extended to allow the incorporation of multiple data series containing an annual layer signal and infer the most likely annual layering based on all of these data series simultaneously.It therefore provides the necessary statistical framework to allow for an automated multiple-parameter counting algorithm.This property of the algorithm is important because misinterpretation of the annual layering in any single ice core record is likely to occur due to variability of the annual layer signal and occurrence of non-annual features.For the current purpose, information has been incorporated based on the derivative of the line-scan data series as well as the observed line-scan data itself.

Hidden Markov Models
In a Hidden Markov Model (HMM), the system under consideration is allowed to be in a finite number of states.The state of the system corresponding to any index (e.g.time) t is considered a stochastic variable S t with the allowed state outcomes j , j ∈ {1, 2, . . ., J}.For the purpose of annual layer detection, j will represent a label corresponding to the annual layer number.The use of t for indexing (t spanning from Introduction

Conclusions References
Tables Figures

Back Close
Full sequence, S 1:T , is a first-order Markov chain, i.e. the next state of the system is only dependent on the current state.
If a direct outcome of the state sequence can be observed, it is easy to characterize the statistical nature of this signal.Very often, however, this is not possible.Instead, it may be possible to observe the influence of the state sequence on another stochastic process, the outcome of which is seen as a sequence of observations, o 1:T .The model hereby has a two-layer structure, with the state sequence providing the unknown "truth", and each observation by itself only providing incomplete information on the current state.Algorithms developed for Hidden Markov Models allow for the statistical inference of the underlying hidden Markovian state sequence (Rabiner, 1989).
Similar algorithms have also been developed in case of so-called hidden semi-Markov state sequences, in which the changes in state are endowed with a Markov property, but with holding times of each state distributed according to a prescribed probability distribution p(d ), with the duration d given in terms of number of observations covered by each state.Such a process can be envisioned as a doubly embedded Markov chain, with generalized states consisting of both the state label and state duration.Allowing the underlying stochastic process to be semi-Markov, this variant is sometimes known in the literature as a "Hidden Semi-Markov Model" (HSMM) (Yu, 2010), or -depending on the specific assumptions of the model, its application area and the author -a "segment model" (Ostendorf et al., 1996).
The sequence of encountered years in a seasonally resolved data record, when assuming the associated annual layers to occur in sequential order without any skipping, can be viewed as a degenerated Markov chain.Hence, the observations corresponding to a succession of annual layers in an ice core record can be regarded as the output of a hidden semi-Markov process, in that each individual data point can be labelled with a corresponding state (i.e. the layer number) as well as a duration (i.e. the layer thickness), see Fig. 1.The desired depth-age relation is then the most likely state sequence given the observations.Introduction

Conclusions References
Tables Figures

Back Close
Full The application of HMMs for annual layer counting in annually laminated paleoclimate archives differs from their application in most other areas.The changes in state are simple, with one year simply following the previous.However, this simplicity is often combined with large inter-annual variability in the annual layer expressions, thereby giving rise to a rather challenging pattern-recognition problem.The general HMM algorithms must therefore be adapted to these conditions.

Characterizing an annual layer
Mathematically, the layer detection algorithm works by reducing the complex issue of simultaneous pattern-matching of multiple successive layers to a given template, to the much simpler question concerning how likely a particular data segment is to represent a single annual layer.The likelihood of a given observation segment forming an annual layer is partly determined by the probability of the resulting annual layer thickness, and partly by the resemblance of the segment to an annual layer signal.These probabilities provide the criteria used by the layer detection algorithm for subsequently retrieving the most likely annual layer sequence based on the entire data record.
Empirical data shows that for a given depth interval, the annual layer thicknesses in an ice core are approximately lognormal distributed (Andersen et al., 2006a).The assumed probability distribution of the layer durations is therefore taken to be a lognormal distribution described by the two parameters µ d and σ d : The parameters will be termed the location (µ d ) and scale parameter (σ d ) of the distribution.The mean annual layer thickness, λ, is a function of both of these parameters.The continuous probability density function is discretized and normalized to provide duration probabilities corresponding to a finite integer number of data points.
Resemblance of a given observation segment o t 1 :t 2 to a typical annual layer signal can be assessed in a variety of ways.The likelihood that this segment forms an annual layer will here be determined by regression of the segment to a layer template.Introduction

Conclusions References
Tables Figures

Back Close
Full This so-called emission probability is formally defined as the conditional probability of observing a given sequence of observations when assuming these to form an annual layer.Following Yu (2010), we will use the short hand notation S [t 1 :t 2 ] = j to signify that layer j (i.e.state outcome j ) starts exactly at t 1 and ends exactly at t 2 (both data points included).Using this notation, the emission probability can be written as follows: The inferred layer boundaries depend critically on the applied annual layer template and the model parameters herein, θ, used for judging how well the segment of observations o t 1 :t 2 exhibits the characteristics of an annual layer.
Annual layers in the NGRIP line-scan data display significant variability in shape from one year to the next, and the annual layer template must be able to capture this diversity.To ensure such flexibility of the layer detection algorithm, the annual layers are modelled based on a generic layer template, which consists of an appropriate selection of analytical basis functions (x 1 , x 2 , x 3 ,. . . ) that each describe some aspect of the shape of the layer signal.The basis functions are normalized with respect to the annual layer thickness such that their resulting shape is independent on this.Layer trajectories are then formed by linear combinations of these basis functions, and assumed to be observed with an additive Gaussian white noise component.By forming the matrix X = [x 1 , x 2 , x 3 , . ..], the signal for layer j can be described as the noisy output of a linear system: For a specific annual layer, the weightings corresponding to the individual basis functions are contained in the vector a j .These weightings are assumed to be multivariate Gaussian By using the above formulation, the description of an annual layer is divided up into average layer characteristics (ϕ), and random effects pertinent to each individual layer, r j ∼ N (0, Φ), that cause it to differ from the average.An annual layer can therefore also be described as: (2) This model for the annual layers represents a two-level hierarchical model, and the emission probabilities can be evaluated by Bayesian linear regression (Bishop, 2006).
See Winstrup (2011) for a more detailed description of how these computations are carried out.The employed model is an extended version of the linear trajectory models used by Gish and Ng (1996) and Russell and Holmes (1997), and bears many similarities to the one used for electrocardiographic waveform detection by Kim et al. (2004) and Kim and Smyth (2006).
The explicit modelling of the allowed range of inter-annual variability in layer shape permits the developed layer detection algorithm to address this general problem faced by any annual layer counting routine.The range of year-to-year variability in layer shapes becomes an integrated part of the layer detection algorithm, and enables it to better handle observation sequences in which the annual layer signal is noisy and irregular.
A principal components analysis (PCA) was initially conducted on the line-scan data to constrain the allowed variability of individual layer shapes during warm and cold climate regimes, respectively.This analysis was based on results from manual layer picking.The first two principal components describing the layering in the cold period (Fig. 2c) and the warm period (not shown) explain respectively 73 % and 57 % of the variance in layer shapes from the mean.With three principal components the numbers increase to 84 % (cold period) and 68 % (warm period).The difference in the amount of variance explained indicates a less distinct annual signal during the warm period, which can also be seen by visual comparison of the two data series.The two sets of principal components are very similar, but their mean trajectories differ slightly.In the Introduction

Conclusions References
Tables Figures

Back Close
Full following, we have employed analytical functions that mimic the variability described by the principal components as basis functions, and regression to these is performed on the deviation of an observation segment from the calculated mean annual layer signal.
Observe that o 1:T is allowed to be a sequence of vector observations, and hence each data point may e.g.contain a collection of measured chemistry data.As the assessment of the probability b(o t 1 :t 2 ) can be made based on multiple data sets, it allows for the annual layer signal in all of these to be taken into account simultaneously.
We have here considered both the line-scan data as well as its derivative as two separate, yet related, observation sequences.Although the two data sequences essentially contain the same information, the noise component on the derivative data series is generally "whiter", but has the caveat that the signal-to-noise ratio is smaller.Consequently, by combining information from both the grey-tone intensity data series and its derivative, the underlying assumption is relaxed that all variance of an annual signal not explained by the layer template can be considered white noise.Indeed, when applied to visual stratigraphy data where the noise corresponding to successive data points is correlated, much better estimates of the annual layering were obtained when utilizing also the derivative of the observed sequence of grey-tone intensities.

Inference of a most likely layer sequence and its uncertainty
Having calculated the emission probabilities (Eq. 1) for all possible layer start and ending positions, the most likely layer at any given location can be inferred.Consider the option of using brute force to examine all possible state sequences, one at a time, calculating the respective probabilities, and adding up the contributions from those that give rise to the same state j at each t.In this way, the most likely layer at t can be found, and the result will be based on the entire observation sequence in consideration.In reality such an approach is not feasible, but the same probabilities can be efficiently calculated by recursion by the Forward-Backward algorithm, which routinely is applied to HMMs, and here is used in the extended version applicable for HSMMs (Yu, 2010).Introduction

Conclusions References
Tables Figures

Back Close
Full The name "Forward-Backward algorithm" is derived from the way the algorithm makes the judgment of a most likely state sequence in a rigorous yet efficient manner by executing a double pass of the observation sequence: a forward pass, which passes on the information included in all previous data, and a backward pass, containing information from all subsequent data.The best estimate of the hidden state sequence is then found by combining the two sources of information.In this way, the entire data sequence is used for inferring the most likely layering, and also the involved uncertainties can be evaluated.These provide an estimate of the counting error very similar to that of the manually layer counted GICC05 chronology, in which uncertain layers were counted as 1/2 ± 1/2 yr (Andersen et al., 2006b;Rasmussen et al., 2006).
In the forward message pass, the forward variable α t (j , d ) is calculated.It is defined as the joint probability of ending state j with duration d at t and observing the partial observation sequence o 1:t .It can be recursively calculated by: (3) Similarly for the backward message pass, a backward variable β t is defined as the probability of observing the partial observation sequence o t+1:T when conditioned on state j ending at t.Using the shorthand notation S t] = j to indicate that layer j ends at t, without providing any information on where the layer begins, the backward variable can be calculated as: As initialization condition for the two recursive passes of the data series, the first observation is assumed known to be in state 1 , whereas the state corresponding to the last observation is considered unknown, and thus all possible states have equal probability.
Due to the flexibility of the Forward-Backward algorithm to allow for inter-annual variations in layer thickness, the algorithm is not very strongly dependent on the specifics of Introduction

Conclusions References
Tables Figures

Back Close
Full These equations can be derived from the general equations of the Forward-Backward algorithm for HSMMs as given e.g. in Yu (2010), when using the following simplifying assumptions: encountered layers are assumed to occur in sequential order without any skipping, the thickness of a layer is assumed independent on thickness and label of the previous layer, and all layers are assumed to be an outcome of the same process, implying that the layer signal (shape as well as thickness) is independent on the layer number.Previous studies (Andersen et al., 2006a;Fisher et al., 1985) point to the existence of a slightly negative correlation between the thickness of successive layers, and such correlation is not taken into account in the above simplified formulation.Also observe that the algorithm does not impose any restrictions regarding continuity of the fitted layer trajectories across layer boundaries: for each layer, the best layer trajectory is determined independently from those of its neighbours.In practise, however, none of these shortcomings represent a major limitation of the model.A formal derivation of the appropriate formulation of the forward and backward variables, and a more explicit description of the initialization conditions used, can be found in Winstrup (2011).
By multiplying the forward and backward variables (Eqs.3 and 4), one obtains the posterior probability of ending layer j with duration d at t, based on the entire observation sequence (Yu, 2010): (5) Summing over all values of d gives the probability of ending layer j at t regardless of its duration: The probability of being in layer j at t can then be calculated recursively as the probability of being in layer j at t − 1, subtracting the probability of ending layer j at t − 1, and adding the probability of starting the layer at t:

Conclusions References
Tables Figures

Back Close
Full This recursive formulation quantifies the probability of being in any layer j at any point t in the observation sequence.In the Forward-Backward algorithm, the most likely state sequence is understood as the state sequence in which each state individually has maximum posterior probability when conditioned on the entire observation sequence.
The maximum a posteriori (MAP) estimate of the state corresponding to the observation at t is therefore: This is the most likely layer at t. Furthermore, as the full probability distributions are computed, the uncertainty associated with inference of the most likely layer at a given position can be estimated by considering the width of the derived distributions.
A chronology consisting of both a best estimate ( MAP (t), the mode of the distribution) and an uncertainty estimate (here taken as distribution percentiles) can therefore be derived.
However, it must be stressed that the validity of these uncertainty estimates is contingent on the annual layer template and corresponding model parameters to be valid.
The derived uncertainty estimate does not take into account possible inaccuracies in these, and should therefore always be regarded as a lower bound estimate.
Figure 3 shows the output of the layer detection algorithm when applied to a small section of line-scan data from NGRIP.The computed values of ηt (j ) (Fig. 3a) estimate the probability of ending layer j at t, and thus peaks are locations of highly probable layer boundaries.A general decrease in peak height with distance from the beginning of the data series is caused by an increasing uncertainty as to which layer the boundary belongs.The total probability of ending any layer at a given location t can be calculated by summing the individual contributions from all layers.By subsequent conversion to γ t (j ), the probability of being in each layer at a given position can be assessed (Fig. 3b).
The further away from the beginning of the data series, the less certainly can the layer number be determined.As a result, the maximum probabilities corresponding to a given layer j slowly decrease as the spread of the distribution of possible annual layers at Introduction

Conclusions References
Tables Figures

Back Close
Full a specific location (Fig. 3d) increases.The most likely annual layering (Fig. 3c, red lines) can be determined from the mode of these distributions.
For the line-scan data in Fig. 3, several interesting features of the output of the layer detection algorithm can be noted.First observe how well the inferred most likely layer boundaries correspond to those in the manually obtained GICC05 chronology (Fig. 3c), despite this chronology being based on the annual layer signal in multiple data series.Only one extra layer boundary (depth: 2228.04 m) is reconstructed by the algorithm.Considering the data in this interval, it is seen that indeed the annual layer signal in the line-scan data here is ambiguous: the line-scan data displays a distinct peak, but the distance to the surrounding layers is relatively small.When considering in detail the inferred values for ηt (j ), peaks in this probability variable is seen to correspond to peaks in the line-scan data which may or may not indicate an annual layer.In general, only some of these peaks give rise to an inferred layer boundary, while others act to increase the uncertainty estimate of the counted number of layers.Uncertain positioning of a specific layer boundary may also occur, in which case a broad peak is observed in ηt (j ) (see e.g.around 2228.12 m).Also observe how an ambiguous layer boundary at one location (e.g. at 2228.04 m) results in ηt (j ) having multiple peaks down core for a given layer j .
The annual layering in the first part of the data series is relatively distinct, and the algorithm recovers the manually deducted annual layering with high confidence, which can be seen from the inferred very spiky γ t (j ) probability distributions (Fig. 3d, I).The equivocal interval around 2228.04 m is picked out by the algorithm as containing a possible layer boundary, which is judged to be more likely than not: comparing Fig. 3d, I  and 3d, II, it is seen that after 2228.04 m, there is about 1/3 chance of still being in layer 3, and 2/3 chance of now being in layer 4. The probability distribution is subsequently slowly broadened in a non-trivial way as the algorithm encounters other uncertain layer boundaries (Fig. 3d, III-V).Introduction

Conclusions References
Tables Figures

Back Close
Full

Layer detection in successive batches of data
It is not feasible, nor desirable, to run the Forward-Backward algorithm on several hundred meters of ice core data at once.While also increasing the computational complexity, doing so would require a homogeneous data series, in which the layer thickness distribution as well as the annual layer signal is essentially stationary.Neither of these conditions are satisfied: the mean annual layer thickness changes down the ice core as the combined result of climate-induced variations in past accumulation rates and a general thinning of layers with depth due to ice flow.Furthermore, in different climate regimes the seasonal influx of dust and other impurities to the ice sheet may differ, hereby altering the general annual layer signal in the ice core data (Andersen et al., 2006b).A better strategy is to divide the total data series into smaller batches, apply the layer detection algorithm to one of these at a time, and subsequently stitch them together by convolving the resulting layer probability distributions.Each of these batches must be sufficiently long to fully exploit the HMM's optimal estimation of layer boundaries, while being short enough that the assumption of a fixed layer thickness distribution and layer signal is reasonable.Here, we have run the layer detection algorithm on batches of data covering approximately 50 yr each.During fast climatic shifts, however, the mean annual layer thickness may display significant changes even more abruptly than this (Alley et al., 1993;Steffensen et al., 2008).
By dividing the full observation sequence into batches and running the layer detection algorithm on each of these individually, some of the information contained in the full data series is lost.Close to the edges of each batch, the lack of surrounding data will in general cause the annual layer boundaries here to be placed less accurately.To large extent, however, such knowledge can be recovered by choosing data batches in consecutive order and in an overlapping fashion.Furthermore, some of the information inferred from the layering in one batch of data may be incorporated into the next.This will be discussed in Sect.2.5.Introduction

Conclusions References
Tables Figures

Back Close
Full

Optimization of applied model parameters
The outcome of the Forward-Backward algorithm is an evaluation of the layering in a data series when given a set of model parameters (θ) to describe the annual layer signal.In addition, however, the Forward-Backward algorithm can be utilized for evaluating the likelihood of the employed set of model parameters based on the appearance of inferred layers in the observation sequence (Rabiner, 1989;Yu, 2010).This is possible because the computed posterior probabilities, Eqs. ( 5) and ( 6), are jointed with the probability of the entire observation sequence and -though not always explicitly stated in the preceding equations -conditioned on the model parameters.By normalizing the posterior probabilities, the likelihood of the applied model parameters can be determined (see e.g.Winstrup, 2011).This feature of the annual layer detection algorithm provides the opportunity to implement a learning process, which can select the model parameters best suited for modelling the observations.In other words, the model is able to improve on an initial guess of the appropriate model parameters based on how the data actually appear.By doing so, the model is able to continuously adjust itself to temporal changes in annual layer appearance as well as layer thickness probability distribution, and it gives rise to high performance of the algorithm even when only imperfect knowledge on the appropriate parameter values is available.The abruptness of some climatic events (Steffensen et al., 2008), influencing both the mean annual layer thickness and the annual layer signal recorded in the ice core records, makes this a very important issue.The opportunity of such training of the annual layer detection algorithm is a major advantage of the Hidden Markov Modelling approach.
The optimization has here been performed using the Expectation-Maximization (EM) algorithm (Baum et al., 1970;Dempster et al., 1977;Gupta and Chen, 2011).The EMalgorithm presents an efficient iterative method for obtaining a point estimate of the (possibly local) maximum likelihood set of parameters.See Appendix A for update equations of the respective model parameters.Introduction

Conclusions References
Tables Figures

Back Close
Full Due to this optimization procedure, it is in principle possible to determine the annual layering in an observation sequence without any prior information on the appearance of layers in the data.However, given the relatively high noise level in the line-scan data, it was found that these data by themselves did not contain sufficient information to produce robust statistics and adequately constrain the layer detection algorithm.Thus, in the present work some model parameters (σ d , Φ, and σ ε ) were estimated based on manual layer counts within a smaller data section and subsequently held fixed.Other parameters (µ d and ϕ) were allowed to change with depth and were determined separately for each batch of data.
Another option for constraining the algorithm is to incorporate prior knowledge on the appropriate parameter values.In this case, a modified version of the EM-algorithm can be used to retrieve the set of parameter values of (possibly local) maximum posterior probability (see e.g.Gupta and Chen, 2011).Including prior knowledge on parameter values will generally act to stabilize the performance of the layer detection algorithm by lessening its demand on data quality and volume.Prior information may consist of knowledge derived from layering in previous data batches and/or be driven by information based on other parameters such as the δ 18 O-record, which can provide the algorithm with information on the current climate regime.The high correlation between climate and accumulation rate (Dahl-Jensen et al., 1993) implies that δ 18 O data may e.g.be used for quantifying prior knowledge on changes in mean annual layer thickness with depth.By incorporating priors, the layer counting algorithm will be allowed to continually adjust itself to temporal changes in annual layer signature in a more controlled, yet very flexible, manner.
The potential advantages of incorporating prior knowledge will be explored in future studies, but has here been avoided in order not to guide the layer detection algorithm unnecessarily.The maximum likelihood solution, as employed here, represents the most transparent implementation of the algorithm, which is an important property when assessing its performance.

Conclusions References
Tables Figures

Back Close
Full

Sensitivity tests
The layer detection algorithm was tested on synthetic data containing a recurring "annual" signal simulating that in the observed data.These data series were constructed as a succession of sinusoidal layer shapes displaying varying amounts of inter-annual variability in amplitude and added Gaussian white noise.From ensembles of synthetic data series, statistics on the performance of the algorithm were assessed.The algorithm was observed to work very robustly, and no bias towards either too thick or too thin annual layers was noted.It should be no surprise that the annual layers are easiest to identify when individual layer thicknesses are fairly similar, the layer shapes relatively identical, and the signal-to-noise ratio is high.However, even with individual layers displaying a wide range of amplitudes and containing much noise, the outcome of the layer detection algorithm was observed to be very accurate.We considered e.g. an ensemble of synthetic data series, each consisting of approximately 50 layers, in which the annual layer signal displayed a moderate amount of noise and interannual variability (in Eq. 2: ϕ = 1, Φ = 0.5, σ 2 ε = 0.5).Based on the inferred distribution of error in the resulting most likely layer estimates, the maximum counting error (taken as 2σ of the distribution) was observed to be around 1.5 yr, i.e. 3 %.When increasing the inter-annual layer variability further, the maximum counting error grew, but in none of the considered scenarios did it exceed 5 %.This occurred when selecting ϕ = 1, Φ = 1, and σ 2 ε = 0.5.The synthetic data series hereby produced were so noisy that it would have been deemed problematic to reliably identify the annual layers therein by eye.The automated counting algorithm, on the contrary, was able to count these in an unbiased manner due to knowledge of the annual layer thickness probability distribution in combination with a robust statistical measure for judging what is most likely to be an annual layer.
Inherent to the Forward-Backward algorithm is the simultaneous derivation of an uncertainty estimate of the counting accuracy.In the previous section, only the inferred most likely number of layers was evaluated, but also the accuracy of the inferred

Conclusions References
Tables Figures

Back Close
Full uncertainty estimate must be assessed.This was done by counting the percentage of simulations within a given ensemble for which the true number of annual layers in the data series was respectively within the 25 % and 75 % percentiles and within the 2.5 % and 97.5 % percentiles of the inferred annual layer probability distribution.As hoped for, this was observed to be the case for respectively ∼50 % and ∼95 % of the ensemble members, regardless of the specifics of the ensemble in consideration.Indeed, mean values for all considered ensembles averaged 68 % and 98 %, hence indicating slightly conservative uncertainty estimates.For this simple case of synthetic data with known annual layer template and model parameters, the estimated uncertainty bounds were thus obtained very reliably.
We also tested the capability of the layer detection algorithm to iteratively find a best set of model parameters based on the appearance of layers in the data.For an ensemble of synthetic data series with a moderate amount of inter-annual variability and white noise (ϕ = 1, Φ = 0.5, σ 2 ε = 0.5) and a sensible choice of initial parameter values, the number of iterations required before convergence was reached was generally 5 or less.
However, the ability of the algorithm to swiftly converge towards an appropriate set of parameter values is highly dependent on the overall difficulty of recognizing the annual layers in the data as well as on the appropriateness of the initial parameter input to the algorithm.In unfavourable conditions, correct estimates of the parameters may indeed be impossible to infer.Yet, in all investigated cases, the algorithm quickly converged to the set of parameters used for constructing the data, with a probability distribution as expected when using relatively small samples containing just 50 layers.

Results
The layer detection algorithm was implemented for visual stratigraphy data from NGRIP, in which the annual layer signal may sometimes be ambiguous, but is maintained to great depths.The algorithm has been tested over three sections representative for the deeper part of the NGRIP ice core: a warm period (Greenland Interstadial 12, GI-12,

Conclusions References
Tables Figures

Back Close
Full depth 2200-2220 m, duration approx.1200 yr), the preceding cold period (Greenland Stadial 13, GS-13, depth 2225-2240 m, duration approx.840 yr), and the transition between the two.The associated time interval is approximately 45.9-48.3kyr b2k.The inferred layering is compared to the manually counted layers in the GICC05 chronology.For this depth interval, the GICC05 chronology is mainly based on the high-resolution records of electrolytic melt water conductivity, ECM (an acidity index) and the visual stratigraphy.During the interstadial, increased layer thicknesses also permitted records of water-soluble ion concentrations, particularly Na + , to be employed (Svensson et al., 2008).The manual counting uncertainty was estimated to be around 5 %.Before application of the layer detection algorithm to the line-scan data, the data was treated with the aim of increasing the similarity of individual layers.This was done by first log-transforming the data (to stabilize the peak heights of individual years) and subsequently normalizing data according to its minimum and maximum values (to remove the differences in peak heights between different climate regimes).The window length used for normalization was 10 cm and hence contained several annual layers.
An example of the transformed grey-tone intensity data is shown in Fig. 2a.As previously discussed, the transformed intensity data and its derivative were employed as two separate data series.The derivatives were calculated based on a smoothed version of the intensity data as obtained by applying a first-order Savitzky-Golay filter (Savitzky and Golay, 1964).
Climate conditions were relatively stable during both the stadial and interstadial period under investigation (Fig. 4a).Most of the parameters describing how an annual layer is expressed in the line-scan data are therefore expected to stay more or less unchanged throughout each of the two periods.For each depth interval, several of the model parameters (σ d , Φ, σ ε ) were estimated based on manual layer counts in the upper 25 % of the data section and maintained as fixed values throughout the interval.The location parameter for the annual layer thickness distribution, µ d , and the vector parameter ϕ describing the mean layer trajectory, were allowed to change with depth and were optimally chosen according to appearance of the data.The algorithm was allowed to perform five iterations for each batch of data, at which point it was assumed that convergence had been reached.The resulting set of model parameter values was subsequently used as input to the next batch.
During the cold GS-13, the algorithm performs very well.The most likely duration of the selected period is by the algorithm inferred to be 1249 yr, which is to be compared to an estimated duration of 1204 yr in the GICC05 chronology (Table 1).It corresponds to a relative discrepancy of just 3.7 %, and the obtained layer count is well within the uncertainty of the manual layer count.The detailed similarity between the two results can also be seen from the evolution of layer thicknesses with depth (Fig. 5c).These display the same decreasing trend as the manually obtained annual layer thicknesses, and the two also share most of the short-period variability.
From the mean layer thicknesses in Fig. 5c, it is seen that only in two small sections are the inferred layer counts outside the maximum counting error on the manual layer estimate.In particular a spike without manual counterpart occurring around a depth of 2233 m is evident.The existence of such spikes is at least partly due to the algorithm being negligent of prior information from previous batches and determining the most likely model parameters (describing e.g. the mean annual layer thickness) separately for each batch.In shorter sections with particularily ambiguous layering, the algorithm may therefore be led astray.This can also explain the slightly higher degree of variability in mean layer thickness between successive intervals observed in the automated layer count.
If comparing in details the layer boundaries obtained from the automated counting algorithm to those of GICC05, it becomes clear that to a large extent disparities are caused by genuine ambiguities in the annual layering of the line-scan data.Furthermore, most of the ambiguous layers do influence the inferred number of counted annual layers by broadening the annual layer probability distribution.Disagreements in inferred total number of counted layers are mainly caused by differences in how many of these ambiguous layers are being counted as years.Introduction

Conclusions References
Tables Figures

Back Close
Full The inferred uncertainty intervals on the annual layering cannot be directly compared to the Maximum Counting Error (MCE) estimate of the GICC05 chronology.The MCE is a conservative estimate of the involved uncertainties (Andersen et al., 2006b).Although it can be regarded as a 2σ-error bound, it takes into account that the manual counting procedure may be biased, and the uncertainty estimates of individual layer boundaries are simply summed up.This leads to an almost linear increase in uncertainty with depth.The layer detection algorithm developed here, on the other hand, assumes the counting to be unbiased.Within each section, the individual uncertainties are allowed to partly compensate for each other, thereby causing the resulting uncertainty estimate to grow much more slowly with depth.The variance of the annual layer distribution will grow approximately linearly with depth, and consequently the standard deviation will grow with the square root.However, the very narrow uncertainty interval (±1.3 %) resulting from the assumption of unbiased counting may be optimistic when dealing with real data.
Given that the annual layer signal in the line-scan data is most distinct during cold periods, the annual layer detection algorithm is likely to work best here.When applied to data from the interstadial GI-12, the performance of the algorithm is indeed slightly weaker.Nevertheless, the percentage-wise deviation from the number of manually counted layers within the considered data section is still just 6.6 %, and the two associated 95 % confidence intervals are overlapping (Table 1).The similarity between the manual and automated layer counts can again be observed from the general evolution in annual layer thicknesses with depth (Fig. 5a).To a large extent, the inferred layer thicknesses match those of the GICC05 chronology, although not as closely as for the cold period.Generally, the algorithm seems to be slightly too optimistic in its layer detection for this section, with most of the layers deemed uncertain in the manually counted chronology here being counted as actual layers.In future work, this issue will be addressed by careful investigation of the nature of non-annual features in the data series, and it can be dealt with by simply allowing a higher degree of white noise in the Introduction

Conclusions References
Tables Figures

Back Close
Full Finally, the layer detection algorithm was run over the data section containing the onset of interstadial GI-12.As for the previous data sections, the algorithm was run downwards the core, starting in ice deposited during the warm interstadial (with large annual layer thicknesses), and going towards the older and deeper ice deposited during the stadial (small layer thicknesses).Based on manual layer counts, the mean layer thicknesses across the transition is known to change with more than a factor two (Fig. 4b), and also the appearance of the line-scan data changes dramatically over the transition.
To some extent, the changes in appearance of the line-scan data over the onset of the interstadial was dealt with by the previously described normalization procedure applied to the grey-tone intensity profile.However, the annual signal in the visual stratigraphy changes more with climate, and hence with depth, than can be rectified by merely adjusting the peak height of the signal.We have here allowed the mean annual layer signal to change down the ice core, but all other changes in annual layer signal have not been taken into account.The generally enhanced amount of noise in the annual layer signal during warmer periods is e.g.neglected.Estimates for the non-varying model parameters were again determined based on line-scan data in the upper 25 % of the section.
Despite the abrupt change in mean layer thickness and annual layer signal, the layer detection algorithm is able to adapt to the changing environment and find an appropriate value of the mean annual layer thickness throughout the transitional zone (Fig. 5b).The resulting number of counted layers within the transitional period differs just 4.6 % from the manual layer counts, which is within the uncertainty of the manual counting (Table 1).
We also tested the performance of the layer detection algorithm when describing the annual layer signal with three instead of two principal components.In this case, the discrepancy in annual layer count during GS-12 was reduced to just 0.3 %, and resulted in Introduction Full near perfect layer identification throughout the interval.However, using three principal components led to significantly larger discrepancies in layer counts for the interstadial and the onset of the interstadial period (∼8 %), where the increased flexibility in annual layer template allowed a larger number of ambiguous layers to be identified as annual layers.

Conclusions
For paleoclimate archives with sufficient resolution to resolve sub-annual variability, annual layer counting provides a means of obtaining a very accurate chronology.Establishing such chronologies has so far predominantly been accomplished manually.
The algorithm developed here represents a first step towards a high-quality automated method of annual layer counting.Based on the statistical framework of Hidden Markov Modelling, originally developed for machine speech-recognition, it presents a mathematically rigorous yet efficient method to determine the most likely layering in a data series and its associated uncertainty.The algorithm imitates the manual procedures, while being based on objective yet flexible criteria for annual layer recognition.
The layer detection algorithm has been applied to visual stratigraphy data from the NGRIP ice core, Greenland, in which the annual signal appears to be maintained to great depths.This data series may therefore potentially be used for extending the manually counted GICC05 chronology further back in time.However, for the current purpose of demonstrating the integrity of the algorithm, it has been run over data sections where manual annual layer counting has already been carried out.
The algorithm shows high skill in reproducing the manually counted timescale within a data section from a cold climate period, during which it deviates just 3.7 % from the manually counted timescale.During warm climate regimes, the annual layer signal in the visual stratigraphy is less apparent.Consequently, the algorithm was less confident in identifying the annual layers here, but it still obtained an annual layer count within 6.6 % of the manual layer count.The algorithm was also run over a short data section Introduction

Conclusions References
Tables Figures

Back Close
Full   , d ) .
In the update equations of the remaining model parameters, the following notation is introduced: the set of maximum likelihood parameter values obtained during the kth iteration is denoted θ (k) .A proposed layer segment is denoted O j , and is a vector consisting of both the line-scan intensity data as well as its derivative.The design matrix X is correspondingly appended to contain not only the applied basis functions used for explaining the annual layer signal in the line-scan data, but also the derivative of these.The residuals of layer segment O j relative to the layer template are collected in the vector E j .The difference in variance of the white noise component in the two data series is given by the 2d × 2d diagonal matrix W, with d being the duration of a layer.The vector r j is the random component previously introduced in Eq. ( 2).An updated value of the mean annual layer signal parameter (ϕ), the random component covariance matrix (Φ), and the variance of the added white noise component (σ Introduction

Conclusions References
Tables Figures

Back Close
Full  Full Screen / Esc Printer-friendly Version Interactive Discussion Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Screen / Esc Printer-friendly Version Interactive Discussion Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Screen / Esc Printer-friendly Version Interactive Discussion Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Screen / Esc Printer-friendly Version Interactive Discussion Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | distributed.Mean values (ϕ) and covariance (Φ) of these distributions are regarded as model parameters and are included in θ, along with the variance (σ 2 ε ) of the additive Gaussian white noise component, ε j ∼ N (0, Discussion Paper | Discussion Paper | Discussion Paper | Screen / Esc Printer-friendly Version Interactive Discussion Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Screen / Esc Printer-friendly Version Interactive Discussion Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Screen / Esc Printer-friendly Version Interactive Discussion Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Screen / Esc Printer-friendly Version Interactive Discussion Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Screen / Esc Printer-friendly Version Interactive Discussion Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Screen / Esc Printer-friendly Version Interactive Discussion Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Screen / Esc Printer-friendly Version Interactive Discussion Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Screen / Esc Printer-friendly Version Interactive Discussion Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Screen / Esc Printer-friendly Version Interactive Discussion Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Screen / Esc Printer-friendly Version Interactive Discussion Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Screen / Esc Printer-friendly Version Interactive Discussion Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | individual layer signals, leading to fewer ambiguous layers being counted as annual layers.
Screen / Esc Printer-friendly Version Interactive Discussion Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Screen / Esc Printer-friendly Version Interactive Discussion Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | j , d )X T W −1 (O j − XE[r j |O j , θ (k) ,d η t (j , d )E[E T j W −1 E j |O j , θ (k) ] 2 t,j ,d η t (j , d )d ,with the expectation values in the equations above given as:E[r j r T j |O j , θ (k) ] = E[r j |O j , θ (k) ]E[r j |O j , θ (k) ] T + cov[r j |O j , θ (k) ] E[E T j W −1 E j |O j , θ (k) ] = (O j − X( φ + E[r j |O j , θ (k) ])) T W −1 (O j − X( φ + E[r j |O j , θ (k) ])) + tr(X T W −1 Xcov[r j |O j , θ (k) ]).See Winstrup (2011) for a derivation and more detailed review of the above equations.Discussion Paper | Discussion Paper | Discussion Paper | Copenhagen.It is being supported by funding agencies in Denmark (SNF), Belgium (FNRS-CFB), France (IFRTP and INSU/CNRS), Germany (AWI), Iceland (RannIs), Japan (MEXT), Sweden (SPRS), Switzerland (SNF) and the United States of America (NSF).The work is also a contribution to the Copenhagen Ice Core Dating Initiative that is supported by a grant from the Carlsberg Foundation.Discussion Paper | Discussion Paper | Discussion Paper |

Fig. 1 .
Fig.1.Illustration of the two-layer structure of the Hidden Semi-Markov Model used for annual layer detection.The states of the system are "layer numbers", which have been endowed with a layer thickness parameter d .Changes in state are described by state transition probabilities.Each state emits a segment of observations corresponding to one layer.In this case, the observations are grey-tone intensity values based on line-scan images from the NGRIP ice core (bottom).Based on the entire sequence of observations, the HMM algorithm can be used to infer the most likely state sequence (i.e. the depth-age relationship) along with its corresponding uncertainty.

Fig. 2 .
Fig. 2. (A) Section of line-scan data with manually counted annual layers indicated by alternating bright and dark grey bands.White dashed layer boundaries are regarded as uncertain.Prior to analysis, data were treated by taking the logarithm of the grey-tone intensity values and normalizing according to minimum/maximum values.(B) Mean annual layer trajectory and (C) the first three principal components of the deviation from the mean trajectory during the cold period (depth: 2225-2240 m).4th order polynomial fits to the mean and principal components are shown as dashed lines.

Table 1 .
Results from automated and manual annual layer counts within the three sections in consideration.For the automated counting, the 95 % confidence intervals (numbers in parentheses) are given based on the derived annual layer probability distributions.Note that these distributions are not necessarily symmetric around the most likely annual layer count.For the manual counting, the 95 % uncertainty estimate is based on the Maximum Counting Error (MCE) from the GICC05 chronology.