Testing the consistency between changes in simulated climate and Alpine glacier length over the past millennium

It is standard to compare climate model results covering the past millennium and reconstructions based on various archives in order to test the ability of models to reproduce the observed climate variability. Up to now, glacier length fluctuations have not been used systematically in this framework even though they offer information on multidecadal to centennial variations complementary to other records. One reason is that glacier length depends on several complex factors and so cannot be directly linked to the simulated climate. However, climate model skill can be measured by comparing the glacier length computed by a glacier model driven by simulated temperature and precipitation to observed glacier length variations. This is done here using the version 1.0 of the Open Global Glacier Model (OGGM) forced by fields derived from a range of simulations performed with global climate models over the past millennium. The glacier model is applied to a set of Alpine glaciers for which observations cover at least the 20th century. The observed glacier length fluctuations are generally well within the range of the simulations driven by the various climate model results, showing a general consistency with this ensemble of simulations. Sensitivity experiments indicate that the results are much more sensitive to the simulated climate than to OGGM parameters. This confirms that the simulations of glacier length can be used to evaluate the climate model performance, in particular the simulated summer temperatures that largely control the glacier changes in our region of interest. Simulated glacier length is strongly influenced by the internal variability in the system, putting limitations on the model–data comparison for some variables like the trends over the 20th century in the Alps. Nevertheless, comparison of glacier length fluctuations on longer timescales, for instance between the 18th century and the late 20th century, appear less influenced by the natural variability and indicate clear differences in the behaviour of the various climate models.


Introduction
As it offers a longer perspective compared to the so-called instrumental period (from roughly 1850 CE to present), the past millennium is a key period to study decadal to centennial climate variations.The syntheses of the available climate records indicate a general temperature decrease from the beginning of the second millennium to the beginning of the 19th century, followed by a large warming over the 20th century (Jones et al., 2009;Mann et al., 2009;PAGES 2k Consortium, 2013, 2017;Neukom et al., 2014).Nevertheless, the spatio-temporal structure of the temperature changes is complex, with warm and cold periods being generally not synchronous between different regions (PAGES 2k Consortium, 2013).Those conclusions are in overall agreement with the results derived from global climate models driven by estimates of natural and anthropogenic forcings, although mod-els tend to underestimate the magnitude of the changes in some regions and to simulate more homogenous changes than in the reconstructions (Goosse et al., 2005;Raibble et al., 2006;Gonzalez-Rouco et al., 2006;Jungclaus et al., 2010;Phipps et al., 2013;Fernández-Donado et al., 2013;Landrum et al., 2013;Neukom et al., 2014;Moberg et al., 2015;PAGES2k-PMIP, 2015;Otto-Bliesner et al., 2016).
The data syntheses covering the past millennium are based on many different archives such as trees, corals, glacier ice, lake sediments, pollen, speleothems and marine sediments.They generally do not include glacier length fluctuations, although the latter can be used for independent tests of reconstructed changes (Guiot et al., 2010;Luterbacher et al., 2016).Glaciers are complex recorders of past conditions.Their fluctuations depend on the surface mass balance, which is influenced by several factors, including temperature, precipitation and incoming radiation changes over the glacier, as well as by the glacier dynamics and thus local geometry (Oerlemans, 2001;Huss et al., 2008;Roe, 2011).Furthermore, because of their long response time, glaciers integrate forcing over periods ranging from a few years to several decades or even centuries (e.g.Jòhannesson et al., 1989;Leysinger Vieli and Gudmundsson, 2004).Consequently, glacier length fluctuations cannot be directly compared to records with a much faster response or simply included in multi-proxy reconstructions of past climate changes (Oerlemans, 2005;Roe, 2011;Solomina et al., 2016;Roe et al., 2017).
Despite these difficulties, it is possible to estimate the temperature and precipitation variations that were at the origin of glacier length fluctuations (Mackintosh et al., 2017).One method is to drive a glacier model with a range of climate conditions to determine the ones that are compatible with the glacier length records (Allison and Kruss, 1977;Oerlemans, 1986;Jomelli et al., 2011;Leclercq et al., 2012;Luthi, 2014;Malone et al., 2015;Sagredo et al., 2017;Zechetto et al., 2017;Doughty et al., 2017).The temperature and precipitation reconstructions deduced from glacier length fluctuations can also be compared to estimates obtained from other records and climate model results to test the compatibility between the different sources of information.At a large scale, temperature reconstructions have been obtained using simple glacier models in inverse mode (Oerlemans, 2005;Leclercq and Oerlemans, 2012), assuming that the selected glaciers are mainly influenced by temperature.However, the inversion required to obtain a temperature or a precipitation reconstruction from observations can be ill-conditioned if the record is influenced by several environmental factors, as is the case for glacier length.It thus might be very difficult to disentangle, for instance, the contribution of changes in precipitation and temperature, leading to large uncertainties or biases in the reconstructed signal (Evans et al., 2013;Leclercq and Oerlemans, 2012;Mackintosh et al., 2017).
An alternative method is to drive a glacier model directly with climate model results and compare the simulated length with the observed one.A similar approach, in which a proxy system model has been applied to simulate directly the observed quantity, has been successfully applied to a wide variety of variables such as tree ring widths, coral or speleothem composition (Evans et al., 2013;Dee et al., 2015).The advantages are that the comparison is made on exactly the same variable for models and observations and that the problems related to an inversion are avoided.Until now, comparisons of climate model results with glacier length over the past millennium and the Holocene have been rare and the few existing studies were focused on a small number of glaciers (Weber and Oerlemans, 2003;Leclerq et al., 2012).This limits the ability to assess the climate model performance from glacier length records and the analysis of the origin of observed glacier changes using climate model results.
In addition to the simulated climate, the quality of the comparison between modelled and observed glacier lengths depends on several factors that need to be addressed.First, glacier models have their own limitations (Huss and Hock, 2015;Maussion et al., 2018) and some of the disagreements between simulated results and observations might be attributed to the glacier model and its initial/boundary conditions (e.g.Farinotti et al., 2017) rather than to the climate model.An additional source of uncertainty is related to the internal variability in the climate, which can be dominant at a regional scale for the past millennium (Goosse et al., 2005(Goosse et al., , 2012a;;Jungclaus et al., 2010;Otto-Bliesner et al., 2016).As the climate fluctuations are integrated by the glaciers, this induces glaciers length changes reaching potentially several hundreds of metres (Oerlemans, 2000;Roe et al., 2009;Roe and O'Neal, 2009;Barth et al., 2017).
Our goal here is to perform a systematic evaluation of climate model behaviour by using the outputs of simulations covering the past millennium to force a global glacier model (Maussion et al., 2018).The main objective is to provide a new validation procedure for climate models complementary to the existing ones.Specifically, we will estimate the compatibility of the simulated multi-decadal to centennial-scale climate variability with glacier length records, analysing the links between glacier fluctuations and temperature changes.This implies an estimation of the sources of uncertainty associated with glacier modelling and of the contribution of internal variability to simulated changes.Additionally, the comparison will provide a test of our ability to reproduce past glacier variations using tools that are similar to the ones applied to estimate future changes in glaciers and their contribution to sea level rise (e.g.Marzeion et al., 2012Marzeion et al., , 2018;;Gregory et al., 2013;Bliss et al., 2014;Huss and Hock, 2015;Slangen et al., 2016).The initial focus here is on European glaciers and more specifically on the Alps because of the availability of records that are long enough for our analyses.
The climate model results, the glacier model and the glacier length observations are described in Sect. 2. The results of the glacier model driven by a range of climate models are compared with observations in Sect.3.This includes a discussion of the contribution of internal variability to glacier fluctuations and its impact on the conclusion of a model-data comparison.The sensitivity of the results to key parameters of the glacier model and to the experimental set-up are discussed in Sect. 4. Final conclusions are proposed in Sect. 5.

Climate model results
The climate variables used to drive the glacier model are derived from simulations following the Past Model Intercomparison Project (PMIP3) and the Coupled Model Intercomparison Project (CMIP5) protocols (Schmidt et al., 2011;Taylor et al., 2012;PAGES2k-PMIP3, 2015).1).Some of these simulations do not transition continuously in 1850 from the experiments referred to as "past1000" in CMIP/PMIP nomenclature (years 851-1850) to the "historical" (years 1851-2005) simulations.Because of this discontinuity associated with the experiment design, a jump can be present on the simulated variables in 1850, but it is relatively small for the selected experiments so they can be merged with a limited impact on the results.Those simulations are driven by natural (orbital, solar, volcanic) and anthropogenic (greenhouse gas, ozone, aerosol, land use) forcings (Schmidt et al., 2011(Schmidt et al., , 2012)).Nevertheless, the simulations performed with BCC-CSM1-1 and IPSL-CM5A-LR (for model abbre-viations, please see Table 1) do not include land-use forcing.
Additionally, the aerosol forcing is not activated in the IPSL-CM5A-LR simulation.One simulation for CCSM4, GISS-E2-R, IPSL-CM5A-LR, MPI-ESM-P and BCC-CSM1-1 and an ensemble of 10 simulations with CESM1 are used here.More details about the simulations and the forcing applied in each of them can be obtained in Klein et al. (2016) and PAGES2k-PMIP3 (2015).

The Open Global Glacier Model
The Open Global Glacier Model (OGGM; Maussion et al., 2018) is an open-source model that simulates the evolution of individual glaciers, explicitly accounting for glacier geometry, even in complex configurations involving contributory branches.The first step is to describe the glacier outlines and topography from global public databases: the RGI version 5 (RGI Consortium, 2015) and SRTM topography data version 4 (Jarvis et al., 2008).The glacier main branches, tributaries and flow lines are then defined, and the glacier ice thickness is estimated by solving the equations of ice flow and mass conservation along the flow line.
The mass balance is computed from the equation (Marzeion et al., 2012): where m i (z) is the mass balance of month i at the altitude z.P solid i (z) is the monthly solid precipitation, and T i (z) is the monthly mean temperature.The amount of solid precipitation is derived from the total amount of precipitation assuming that precipitation is entirely solid below 0 • C, entirely liquid above 2 • C and the fraction of solid precipitation varies linearly with temperature between those two values.p f is a correction factor included to take into account the larger precipitation over the glaciers than in the surrounding terrain and at lower altitudes where observations are available.Its value is constant for all the glaciers and is taken to be equal to 2.5 (e.g.Giesen and Oerlemans, 2012).Melting occurs if monthly temperature is above T melt , which is equal to −1 • C in OGGM, as melting may occur some days even though the monthly mean is below 0 • C.This value has been selected on the basis of a cross-validation procedure similar to the one conducted by Marzeion et al. (2012).µ * is the temperature sensitivity parameter, and ε a residual bias.µ * and ε are estimated first for glaciers where mass balance observations are available and then extrapolated to the other glaciers following a procedure described in Marzeion et al. (2012) and Maussion et al. (2018).
The ice dynamics is based on the shallow-ice approximation and is computed along the flow line.In the shallowice approximation, the vertical variations in ice flow are neglected and only a depth-integrated ice velocity is computed.This is a common approximation for computationally efficient ice flow models, and it is largely valid as long as the considered horizontal scales are much larger than the vertical scales (Hutter, 1981(Hutter, , 1983)).Basal sliding is also neglected here.For the prefrontal areas, the direction and path of ice flow are obtained by computing the route from the glacier tongue toward the end of the domain that is the least costly in terms of positive altitudinal change.The flow line therefore follows the valley as a river would do.Along this flow line, we estimate the shape of the bed by fitting a parabola to the intersection points between the actual topography and the normal to the flow line.A main parameter of the model is the creep parameter A. A low value of A corresponds to stiff ice, low velocities and generally a higher ice volume while a high value of A is associated with softer ice and leads to a faster flow and lower ice volumes.The standard value of A selected in OGGM is constant for all glaciers and is set equal to 2.4×10 −24 s −1 Pa −3 , while in reality A may change by a factor of 10 between glaciers due to a wide range of processes (Cuffey and Paterson, 2010).
One advantage of OGGM is that it can be applied to any glacier.It does not require any specific detailed information that would be lacking for the majority of them.Besides, it includes simplifications compared to models focused on a particular, well-observed glacier (e.g.Zekollari et al., 2014) and is therefore computationally efficient.
The climate model outputs required to drive OGGM are the monthly mean temperature and precipitation.The local temperature is obtained by assuming a constant lapse rate of 6.5 K km −1 .To take into account the biases of the climate model, a simple correction procedure is applied: the model results are adjusted to have the same climatological monthly mean values over the reference period 1900-2000 as in the Climatic Research Unit (CRU) data set (New et al., 2002;Harris et al., 2014) used in the standard version of the model (Maussion et al., 2018).The simulations cover the period 850-2005, corresponding to the past1000 and historical simulations in the PMIP3/CMIP5 protocol.However, the sensitivity tests that we have performed have shown that the first century is influenced by the choice of initial conditions for the selected glaciers.Consequently, we will only present the results after 1000 CE.

Glacier length observations
Glacier surface mass balance is the variable that is the most directly related to climate, but only a few, generally short, records are available (Zemp et al., 2009).The number and duration of glacier length observations are much greater (Oerlemans, 2005;Leclercq et al., 2014;Zemp et al., 2015;Solomina et al., 2016).The most accurate estimates are deduced from direct observations of the glacier terminus position as recorded for instance by the World Glacier Monitoring Service in the Fluctuations of Glaciers (WGMS, 2017).The modern observations can be complemented by historical sources including old maps, painting, drawing and early photographs as well as written documents (Grove, 2004;Nussbaumer and Zumbühl, 2012;Purdie et al., 2014;Zumbühl and Nussbaumer, 2018).Additional evidence is obtained by dating the position of moraines indicating the position of the glacier at specific times or from the trees that have been overridden by the advance of a glacier (Masiokas et al., 2009;Ivy-Ochs et al., 2009;Wiles et al., 2011;Schimmelpfennig et al., 2014;Le Roy et al., 2015;Moran et al., 2017).
As the comparison of model results with observational estimates is a key element of our methodology, we have applied OGGM to 71 glaciers from the European Alps that have records covering at least the 20th century in the global compilation of Leclercq et al. (2014).Twelve of these glacier length series go back to 1800 CE, seven go back to 1700 CE, and the longest record starts in 1535 CE (Unterer Grindelwald), allowing for each of them a quantitative comparison with model results at multi-decadal to centennial timescales.The complete list of glaciers is provided in the Supplement (Table S1).Longer records are also available for many glaciers, but these are discontinuous and more uncertain, as reviewed in Solomina et al. (2016).Some of these records will be used for a qualitative evaluation of our results.

Simulated glacier changes
Enhanced winter precipitation has been suggested to be an important contributor to some past changes in glacier length in the Alps (e.g.Vincent et al., 2005;Steiner et al., 2005Steiner et al., , 2008)).Nevertheless, summer temperature is generally considered as the major driver of European glacier fluctuations at centennial timescales (Oerlemans, 2001;Steiner et al., 2005;Huss et al., 2008;Steiner et al., 2008;Leclerq and Oerlemans, 2012;Zekollari et al., 2014).It is thus instructive to first compare the simulated temperatures with reconstructions before analysing the glacier themselves.Europe is probably the continent where the density of records of past temperature changes is the highest and several large-scale reconstructions are available (Luterbacher et al., 2004;Guiot et al., 2010;Pages2k Consortium, 2013;Luterbacher et al., 2016).For simplicity, we will only discuss here the most recent spatial reconstruction of summer temperature, which is highly correlated with long thermometer observations and the majority of individual records (see Luterbacher et al., 2016 for more details).
In agreement with previous studies (Raible et al., 2006;Hegerl et al., 2011;Goosse et al., 2012b;PAGES2k-PMIP3, 2015;Luterbacher et al., 2016), most models are able to reproduce the relatively warm conditions observed at a continental scale during the first centuries of the millennium, the cold conditions around 1600-1800 and the large warming of the 20th century (Fig. 1).However, they underestimate the magnitude of the changes for some (multi-)decadalscale events compared to the reconstruction of Luterbacher et al. (2016).Interestingly, some models display an industrialera warming that occurred earlier or later than observed (Abram et al., 2016), with a potentially large impact on the glacier retreat over the recent period.At a regional scale for the Alps, the conclusions are similar except that the internal climate variability becomes large enough so that simulation results cover nearly the full range provided by the reconstruction, even for the decadal-scale warm or cold events.
The comparison between OGGM results driven by the various climate models and observations leads to contrasting results for individual glaciers (Supplement Fig. S1).This was expected as we have specifically not modified or adapted the parameters in order to apply strictly the standard configuration of the model in this first set of simulations.Nevertheless, for the large majority of the glaciers, the observed length changes are well within the range simulated by the model.For some others, all the simulations overestimate or underestimate the trends over the 20th century or the variability in the pre-industrial period.This is illustrated in Fig. 2 for five well-known glaciers, but a similar behaviour is seen for many others (see Supplement Fig. 1).In these examples, the models tend to underestimate the retreat of the Unterer Grindelwald and Mer de Glace during the 19th century but some of them have a larger retreat than observed for those two glaciers over the 20th century.The agreement is better for the Hintereis, Great Aletsch and Bossons glaciers although for the latter most models overestimate the magnitude of the changes compared to observations.
A detailed comparison between simulations and observed results for each glacier is beyond the scope of the present study as differences may have their origin in the specific characteristics of the glacier such as its stiffness or the presence of debris, in the links between the local climate and large-scale changes, and in uncertainties in the calibration of the climate sensitivity parameter of OGGM, etc.However, a behaviour common to a large majority of the glaciers can be associated with a particular climate model and can the Alpine region (defined here as the area between 45 and 48 • N and between 6 and 13 • E) in the reconstruction of Luterbacher et al. (2016) and as simulated by climate models over the past millennium.The shaded area represents the mean plus and minus 1 standard deviation of the CESM1 model ensemble.A 15-year Lowess smoothing has been applied to the time series.The reference period is the years 1500-1850 CE as in Luterbacher et al. (2016).be described by simply calculating the mean changes over all the glaciers.Conclusions are qualitatively similar for the mean of absolute changes (Fig. 3a) and the mean of relative changes (Fig. 3b).For these latter diagnostics, the glacier length changes are normalized using their observed length in 1950 before calculating the average.This implies that the absolute mean is not dominated by the long glaciers with large fluctuations but reflects a general signal present in the majority of glaciers.
For some climate models (such as the IPSL model), OGGM simulates a relatively stable mean glacier length in the pre-industrial period.When driven by the other climate model outputs, the growth trend between 1000 and 1850 is larger, in particular for the GISS model, CESM and CCSM4.This is followed by a large retreat starting in the 19th century, except in CESM for which the melting begins in the 20th century for nearly all members.
Visually, the difference between simulated glacier lengths (Fig. 3) appears to be much larger than for the temperature (Fig. 1), suggesting that glacier length provides a clear constraint on climate model behaviour.However, part of it may be related to the way the figure is presented.In particular, using a reference period in the 20th century, as required because of the short duration of the glacier records, tends to amplify the differences in the pre-industrial period compared to the classical reference period chosen for temperature (Fig. 1).This is illustrated in Supplement Fig. 2 in which temperature  series have been plotted with a reference period in the 20th century.
Additionally, some of the differences between the simulated glacier lengths may be due to the integration of the internal climate variability by the glaciers and not to a systematic difference between climate models.This impact of internal variability can be quantified from the ensemble of simulations performed with CESM.We have to be careful since this estimate is derived from one model only, which displays significant differences to some of the other models for the Alps.Nevertheless, this provides a first-order estimate.The glacier retreat over the 20th century varies strongly between CESM ensemble members, with the observed changes in the lower range of the ensemble (Fig. 4a).Consequently, although the magnitude of the changes varies considerably between simulations, it is impossible to reject firmly the hypothesis that the differences between climate models and between models and observations for the Alps over this period are due to internal climate variability only.
The signal is clearer when comparing the late 20th century with the years 1700-1850 (Fig. 4b), which roughly corresponds to the maximum extent in the simulated results.All the simulations driven by CESM underestimate the observed changes between those two periods, as the simulated glacier retreat starts much later than in the observations (Fig. 3).The simulations using the standard version of OGGM driven by the other climate models are at the margin or out of the CESM ensemble range, suggesting that the difference are not only due to internal climate variability but are related to different characteristics of the simulations performed with the various climate models.Those simulated results are closer to observations, in particular the ones driven by CCSM4, IPSL and BCC (Beijing Climate Center) model results.
Computing the difference between the years 1000 and 1150 CE (Fig. 4c), when the glacier extent was close to its minimum in nearly all the simulations, and the years 1700-1850 CE confirms the differences deduced qualitatively from Fig. 3. Some models have a large positive growth trend over the pre-industrial period while some others have a much smaller one, with potentially a very large contribution of internal variability.The comparison between the late 20th century and the beginning of the millennium also reveals some clear differences between the simulations (Fig. 4d).For some of them, as the ones driven by the IPSL and MPI models, the minimum is clearly reached in the late 20th century while many glaciers were smaller during the period 1000-1150 CE in the simulations driven by CESM and GISS outputs.It is difficult to estimate from observations when glaciers were smaller than presently as the evidence may still be buried under the ice (Goehring et al., 2011;Luthi et al., 2014;Solomina et al., 2016).For the Alps, this might have occurred before 1000 CE or in the periods 1200-1280 and 1400-1550 CE, but there is currently no direct evidence that this was actually the case during the past millennium (Luthi et al., 2014).
Another instructive diagnostic is the proportion of glaciers that are advancing over a specific period (Fig. 5), since it can potentially be compared to observations (e.g.Solomina et al., 2016).However, this diagnostic is by construction noisier than the glacier length itself and is strongly influenced by internal variability, with the simulations driven by CESM covering nearly the full range between 0 % and 100 % of advancing glaciers for several periods.Estimates derived from observations also display uncertainties.The evidence for a glacier advance, as derived for instance from a moraine position, may actually correspond to a time where the glacier is close to a maximum extent rather than still advancing (Grove, 2004;Solomina et al., 2016).The absence of evidence of advance may also only be due to the lack of a preserved signal in geomorphological features, not to the glacier changes themselves.The model-data comparison can thus only be qualitative and must be interpreted with caution.
As described in the synthesis of Solomina et al. (2016) for the Alps, many glaciers display a minimum extent around the 9th-11th century.This is followed by a first advance in the 12th century, a retreat at the beginning of the 13th century and a general advance in the late 13th century (Holzhauser et al., 2005;Luthi et al., 2014;Le Roy et al., 2015).This advance after the 11th century is in general agreement with our results except that the majority of models simulates an increase in glacier length for the beginning of the 13th century too, while the 12th century is generally characterized by a  Leclercq et al. (2014).The average for observations is calculated over the available time series for each period, meaning that the number strongly decreases with time and, in particular, is very low before 1700.The reference period is 1901-1930 CE.
small number of advances.This would suggest a wrong timing of the glacier advances in models and would be consistent with the higher simulated European temperatures compared to the reconstruction of Luterbacher et al. (2016) around 1100 CE and the lower simulated values compared to the reconstructed ones around 1200 CE.Nevertheless, the variability in the simulated results is too large to obtain a clear answer from the diagnostics of glacier advances alone.Subsequently, observational evidence indicates a retreat around 1400 CE before new advances in the late 15th century and the 16th century, their timing varying between regions (Holzhauser et al., 2005;Schimmelpfennig et al., 2014;Luthi et al., 2014;Le Roy et al., 2015;Solomina et al., 2016).The early 15th century is also a period with glacier retreats in models, preceding major advances in good agreement with observations.The variability between models is larger for the years 1500-1850, when the extent was close to its maximum, and no clear common signal can be deduced from the diagnostics of glacier advances in the simulations for this period.
The early 15th century corresponds to a minimum for glacier advances in many models (Fig. 5) and a relative minimum in glacier length (Fig. 3).Although the simulated tem-peratures are generally mild during this period, they are not high and, in particular, are generally lower than at the beginning of the millennium (Fig. 1).This clearly illustrates the impact of the long response timescales of glaciers.The simulated glacier retreats in the early 15th century appear to be partly due to the temperatures at that time but also to the recovery from the large advances in the 13th and 14th century.
The warming over the 20th century has a clear impact on glacier length, inducing a simulated retreat of nearly all the glaciers in agreement with observations, except in some experiments driven by CESM members that display a weak temperature increase over the Alps (Fig. 3 and Supplement Fig. S1).Nevertheless, the contemporaneous temperature does not appear to be the only variable driving the glacier length changes when comparing two 30-year period at the beginning and the end of the 20th century (Fig. 6a).The contribution of temperature is present, but the response time of the glacier as well as the influence of precipitation variability, for instance, can still obscure the link between temperature and glacier length for those relatively short periods.
The association between summer temperature and glacier changes is more direct and linear when analysing length www.clim-past.net/14/1119/2018/Clim.Past, 14, 1119-1133, 2018 changes on longer timescales.The relative minimum in glacier length in the 12th century (Fig. 3) is clearly due to the warm simulated temperatures at that time (Fig. 1).The climate models that have the largest temperature changes over the pre-industrial period and between pre-industrial period and the 20th century are also the ones that lead to the larger changes in glacier length (Fig. 6b, c, d).This confirms the dominant role of temperature fluctuations in glacier evolution in the Alps (Oerlemans, 2001;Huss et al., 2008;Steiner et al., 2008;Leclerq and Oerlemans, 2012;Zekollari et al., 2014).Furthermore, although some simulations display smaller or larger values compared to observations for each variable, the model ensemble agrees very well with observations for the ratio between tempera-ture and glacier length changes between the pre-industrial period and the 20th century (Fig. 6b).This suggests that the glacier model has a reasonable temperature sensitivity.An alternative interpretation is to state that the link between reconstructed temperatures and glacier length observations is compatible with model results using the standard parameters of OGGM.

Sensitivity of glacier changes to model parameters
The parameter set and experimental design applied in the simulations described in Sect.and 1900-1930;(b) 1970-2000and 1700-1850;(c) 1000-1150and 1700-1850;and (d) 1970and (d) -2000and (d) and 1000and (d) -1150. For (b). For (b), the average of model results is made only for the glaciers that have observations.The crosses represent the individual CESM ensemble members, the ensemble mean being represented by a dot of the same colour.
2018).In order to estimate how our results are sensitive to this choice, a series of sensitivity experiments has been performed, addressing uncertainties in OGGM representation of the glacier dynamics, the surface mass balance and the way climate model results are processed before using them to drive the glacier model.
In the first two experiments, the creep parameter has been multiplied and divided by a factor of 2 for all the glaciers, applying a value of 4.8 × 10 −24 and 1.2 × 10 −24 s −1 Pa −3 .In the next two experiments, the climate sensitivity parameter µ * has been uniformly decreased and increased by 10 %.These experiments are not intended to correspond to a new calibration of these parameters but are used to provide a measure of the impact of a variation in their range of uncertainty (Marzeion et al., 2012;Maussion et al., 2018).
In the standard simulations, a very simple bias correction is applied to climate model results, ensuring that after the adjustment the climate models have the same mean over the reference period as the CRU data set used to calibrate OGGM climate sensitivity parameter (see Sect. 2).However, the variance and the magnitude of the response to a perturbation is likely different at the altitude of the glacier compared to the lower altitude corresponding to the land surface at the scale of the global climate model (Mountain Research Initiative EDW Working Group, 2015; Kotlarski et al., 2015).Consequently, we have scaled simulated temperatures in the final sensitivity experiment so that the variance for each month has the same value as for CRU data set.The temperatures have not been detrended before computing the variance and this thus includes a scaling of the warming over the 20th century as well as of the interannual variability, but the correction is not timescale dependent.This scaling takes into account not only the elevation dependence of the changes but also any bias in the simulated variance (Maraun and Widmann, 2018).
Those changes in parameters have a very large impact on glacier volume, in agreement with previous tests performed with OGGM (Maussion et al., 2018).The differences can reach up to a factor of 2 compared to the standard experiment.They also have a clear impact on the mean length of the glacier.However, when discarding the first 150 years of simulations (when the adjustment to the new parameters occurs), the changes in glacier length averaged over the 71 glaciers are very small.This is illustrated for CCSM4 in Fig. 7. Similar results have been obtained for the other climate models (not shown).In particular, the sensitivity to glacier model parameters and to the correction method applied to climate model results is much smaller than the contribution of internal variability (see Fig. 3), whose role as a dominant source of uncertainty in a model-data comparison is thus confirmed.This conclusion is reached for the Alps and for the selected climate models.Different results might be obtained for other regions or for other models displaying larger biases.Additionally, sensitivity experiments with larger perturbations of parameters would lead to larger differences with the standard experiment.Nevertheless, the small changes in the results of our sensitivity experiments indicate that the main conclusions obtained in Sect. 3 are not critically dependent on the choices made in the application of OGGM.

Conclusions
The simulations performed with OGGM driven by climate model results have shown that there is no inconsistency between the climate provided by the model ensemble and glacier length observations.Disagreements are found for individual glaciers, but this was expected as global models are not able to represent the small-scale processes that may rule some glacier changes.However, when analysing the 71 selected glaciers, there is no systematic bias in the timing or the amplitude of simulated glacier changes and the observed length variations are generally well within the range of simulated values.This agreement was achieved without any specific calibration of the glacier model and does not appear critically dependent on the choice of some model parameters.
This provides an additional positive evaluation of climate models and, by using a new type of data, confirms their ability to reproduce the dominant changes over the past millennium.The successful application of global climate models driving a global glacier model over the past millennium also reinforces the validity of this approach to study future changes on similar timescales.Some studies have argued that the large melting of Alpine glaciers in the 19th century might be due to a modification of the ice albedo caused by the deposition of black carbon of anthropogenic origin (Painter et al., 2013).This hypothesis has recently been challenged (e.g.Luthi, 2014), in particular because no evidence of a significant deposition at the time of the retreat was found in an ice core collected in the Alps (Sigl et al., 2018).Although the simulated changes are underestimated here for some glaciers, this additional forcing does not seem to be required systematically to reproduce past glacier changes in models.
In addition to the overall compatibility of the ensemble of simulations with observations, the comparison between simulated results and estimates of past glacier length fluctuations may help identify some specific characteristics of individual climate model simulations.This comparison is complicated because of the large contribution of internal climate variability on glacier length fluctuations.Nevertheless, some diagnostics appear robust enough to assess the overall climate model skill in the region studied.In particular, some simulations underestimate the amplitude of the glacier changes between the 18th century and the end of the 20th century.This disagreement may have several origins, such as model biases in temperature or precipitation changes.However, an independent comparison between simulated and reconstructed temperatures suggests that these models have too weak a warming over the past 2 to 3 centuries, suggesting an important contribution from this variable in the glacier model behaviour.
Another robust characteristic of many simulations is the timing of the minimum glacier extent over the past millennium.For some climate models, this occurs clearly at the end of the simulation while for some other models the minimum extent takes place at the beginning of the millennium.Unfortunately, observations do not allow the determination of which behaviour is more realistic.Although there are not enough observations in the Alps to argue in favour of a systematic lower extent than today during some periods in the past millennium, the evidence is maybe still hidden below the ice.
More generally, our experiments have demonstrated the interest of driving a global glacier model by climate model outputs in order to have a direct comparison between simulated and observed glacier length.This allows a more quantitative evaluation of the models and a more precise interpretation of the records.For instance, the beginning of the 15th century is characterized by a general glacier retreat in simulations and reconstructions but without particularly high temperatures, illustrating that even though the link between summer temperature and glacier length is strong in the Alps, it is not always straightforward because of the long response time of glaciers.Our results thus open up the application of the same approach to other regions and the integration of glacier records with others in multi-proxy assessments of past climate reconstructions.
modelling" (grant agreement PDR T.0028.18).Hugues Goosse is Research Director within the F.R.S.-FNRS.We would like to thank Olga Solomina for sharing results of the synthesis of glacier records she led and for her suggestions.We acknowledge the World Climate Research Programme's Working Group on Coupled Modelling, which is responsible for CMIP, and we thank the climate modelling groups (listed in Table 1 of  Edited by: Ed Brook Reviewed by: three anonymous referees

Figure 1 .
Figure 1.Summer temperature averaged over (a) Europe and (b)the Alpine region (defined here as the area between 45 and 48 • N and between 6 and 13 • E) in the reconstruction ofLuterbacher et  al. (2016)  and as simulated by climate models over the past millennium.The shaded area represents the mean plus and minus 1 standard deviation of the CESM1 model ensemble.A 15-year Lowess smoothing has been applied to the time series.The reference period is the years 1500-1850 CE as inLuterbacher et al. (2016).

Figure 2 .
Figure 2. Observed and simulated length for five selected glaciers in the Alps.The shaded area represents the range of the ensemble of simulations driven by CESM outputs.The reference period is 1901-1930 CE.

Figure 3 .
Figure 3. (a) Absolute and (b) relative length changes averaged over the 71 glaciers.The relative length is obtained by dividing the glacier changes by their length in 1950 in the compilation ofLeclercq et al. (2014).The average for observations is calculated over the available time series for each period, meaning that the number strongly decreases with time and, in particular, is very low before 1700.The reference period is1901 -1930 CE.CE.

Figure 4 .Figure 5 .
Figure 4. Mean (black) and median (red) of the difference in glacier length between (a) 1970-2000 and 1900-1930; (b) 1970-2000 and  1700-1850; (c) 1000-1150 and 1700-1850; and (d) 1970-2000 and 1000-1150.In panels (a) and (b), observations are given as a horizontal dashed line.No observation is available for (c) and (d).For (b), the average of model results is made only for the glaciers that have observations.For CESM, the bar gives the ensemble range.

-Figure 6 .
Figure 6.Glacier length changes as a function of summer temperature changes in the Alps for the differences between (a) 1970-2000 and 1900-1930; (b) 1970-2000 and 1700-1850; (c) 1000-1150 and 1700-1850; and (d) 1970-2000 and 1000-1150.For (b), the average of model results is made only for the glaciers that have observations.The crosses represent the individual CESM ensemble members, the ensemble mean being represented by a dot of the same colour.

Figure 7 .
Figure 7. Length changes averaged over the 71 glaciers for the standard and sensitivity experiments using CCSM4 results.
this paper) for producing and making available their model output.For CMIP, the US Department of Energy's Program for Climate Model Diagnosis and Intercomparison provides coordinating support and led the development of software infrastructure in partnership with the Global Organization for Earth System Science Portals.

Table 1 .
Climate model simulations used to drive OGGM.