the Creative Commons Attribution 3.0 License. Climate

Abstract. Important progress has been made in palaeoclimatological studies by using statistical methods. These methods are nevertheless limited in that they take the present as an absolute reference. This is particularly true for the modern analogue technique. The availability of mechanistic models able to simulate the proxies measured in sediment cores now makes it possible to relax this constraint. In particular, vegetation models provide outputs comparable to pollen data (assuming that there is a relationship between plant productivity and pollen counts). The inputs of such models include, among others, climate. The idea behind palaeoclimatological reconstruction is then to obtain the inputs, given the outputs. This procedure, called model inversion, can be achieved with appropriate algorithms in the framework of Bayesian statistical theory. We have nevertheless chosen to present it in an intuitive way, avoiding the mathematics behind it. Starting from a relatively simple application, based on the equilibrium BIOME3 model with a single proxy (pollen), the approach has evolved in two directions: (1) the use of several proxies measured on the same core (e.g. lake-level status and δ13C) when they are related to a component of the vegetation, and (2) the use of a more complex vegetation model, the dynamic vegetation model LPJ-GUESS. The examples presented (most of them already published) concern the Last Glacial Maximum in Europe and Africa, the Holocene at a site in the Swiss Jura, and an Eemian site in France. The main results are that: (1) pollen alone is not able to provide exhaustive information on precipitation, (2) assuming past CO2 equivalent to the modern one may induce biases in climate reconstruction, (3) vegetation models seem to be too strongly constrained by temperature relative to precipitation in temperate regions. This paper attempts to organise some recent ideas in the field of palaeoclimatological reconstruction and to propose perspectives for this rapidly evolving field.


Introduction
For a long time, Quaternary palaeoecologists, and in particular palynologists, have used intuitive methods to reconstruct palaeoclimates or palaeoenvironments from biological data. We focus here on the pollen bioindicator, as it has benefited, during the last two decades, from outstanding progress in vegetation models. The most common approach was to compare the present-day distribution of selected species with the corresponding distribution of the climate variables thought to be determinant for them, according to niche theory. The species are analysed separately and each is related to one climatic variable. However, species respond to a combination of climatic variables, and their distributions are controlled by different climatic factors in different parts of their ranges. Moreover, climate parameters are often interrelated. Thus, it has been necessary to develop methods taking into account the ecological complexity of species and assemblages, and of their relationships with climatic factors. A relatively early development was to work with several climatic variables and several species (Iversen, 1944; Atkinson et al., 1987).
Published by Copernicus Publications on behalf of the European Geosciences Union.
One of the strengths of pollen counts is that they give information partly related to species abundances, making it possible to develop response models in which the abundance of a species is expressed as a function of climate (Bartlein and Prentice, 1986). These statistical models are only valid for the climatic niches presently realised, and their extrapolation to the past can be problematic when the past is too different from the present (Guiot et al., 2008). Moreover, to reconstruct climate, it is necessary to invert these response models, but this cannot be achieved directly. Usually one calculates backward statistical relationships between climate and species, which palaeoclimatologists usually call transfer functions. They are based on a few assumptions:
(1) Climate is the ultimate cause of changes in the palaeobiological data.
(2) The ecological properties of the species considered have not changed between the period analysed and the present time, and the relationship between the species and the climate is thus uniform through time.
(3) The modern observations contain all the necessary information to interpret the fossil data.
The second and third assumptions originate from the uniformitarian principle that the same scientific laws and processes are constant throughout space and time (this principle was proposed by James Hutton in 1795 and popularised by Charles Lyell in 1830: "Amid all the revolutions of the globe the economy of nature has been uniform and her laws are the only things that have resisted the general movement"). Without the second assumption, the reconstruction of past environments becomes impossible. To satisfy the third assumption, it is necessary to collect a large diversity of modern samples to maximise the chance of covering all the possible situations of the period studied. However, non-climatic forcings are sometimes so different today that there are no true modern analogues. An example can easily be given for vegetation. A number of physiological and palaeoecological studies (e.g. Jolly and Haxeltine, 1997; Cowling and Sykes, 1999) have shown that plant-climate interactions are sensitive to the atmospheric CO2 concentration, and we know from ice cores (EPICA, 2004) that this concentration is presently much higher than at any time during the past 740 000 yr. Consequently, modern samples collected under a high CO2 concentration are hardly good analogues for low CO2 periods. Moreover, pollen assemblages are noisy and sometimes biased records of the climate variables, because (1) pollen productivity is not equal to vegetation productivity, (2) pollen assemblages are disturbed by pollen grain transportation, (3) a pollen taxon does not correspond unequivocally to a single species, and (4) species are not affected by a single climatic variable.
All these problems make it difficult to use statistical methods based on modern reference data. In this paper, we synthesise the progress achieved in recent years to relax these constraints. One way is the use of mechanistic vegetation models together with pollen data (a similar approach can be followed with other proxies if such models are available). Another, complementary way is the use of several proxies measured on the same samples. We will show how to combine both approaches when adequate models are available, and what the perspectives are for what is called model inversion. The purpose of this paper is not to detail the mathematics behind the methods, but to give an intuitive flavour of the concepts involved. The reader can find more details in the cited papers.

The methods
Even if the inversion of statistical response models should be the "natural" way to reconstruct past climate, the large majority of the works published in the last three decades were based on a very simple "one-step" concept. A very popular method is the modern analogue technique.

Modern analogues technique (MAT)
MAT is illustrated by the scheme of Fig. 1a. This scheme does not reflect the exact way in which the algorithm is built, but it facilitates the comparison with the other methods. The caption of the figure explains the five steps. To implement it, it is necessary to define a distance index. Usually the Euclidean distance between the square roots of the pollen frequencies (chord distance) is used (Overpeck et al., 1985). The number of analogues depends on a threshold above which the similarity is considered too poor. The reconstructed climate is given by a weighted mean of the climate of the analogues (with weights equal to the inverse of the distance index). It is accompanied by an error bar based on the climatic range of the analogues. This error bar cannot be considered a confidence interval sensu stricto, as it depends on the number of good analogues available and not directly on the tolerance of the vegetation to a climatic range nor on the noise in the data. Advantages and limits of the method are discussed in Guiot and de Vernal (2007).
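The MAT steps described above (chord distance, similarity threshold, inverse-distance weighted mean, range of the analogues) can be illustrated by a minimal Python sketch. The function names, the `k_max` cap on the number of analogues and the tiny three-sample "modern dataset" are assumptions introduced for illustration only; they are not taken from the cited papers.

```python
import numpy as np

def chord_distance(a, b):
    """Euclidean distance between square-root-transformed pollen proportions."""
    return np.sqrt(np.sum((np.sqrt(a) - np.sqrt(b)) ** 2))

def mat_reconstruct(fossil, modern_pollen, modern_climate, threshold, k_max=10):
    """Return (estimate, low, high) from the best modern analogues, or None.

    fossil         : 1-D array of pollen proportions
    modern_pollen  : 2-D array, one modern assemblage per row
    modern_climate : 1-D array, one climate value per modern sample
    threshold      : largest chord distance still accepted as an analogue
    k_max          : hypothetical cap on the number of analogues kept
    """
    d = np.array([chord_distance(fossil, m) for m in modern_pollen])
    order = np.argsort(d)[:k_max]            # closest samples first
    keep = order[d[order] <= threshold]      # discard poor analogues
    if keep.size == 0:
        return None                          # no-analogue situation
    w = 1.0 / np.maximum(d[keep], 1e-9)      # inverse-distance weights
    estimate = np.sum(w * modern_climate[keep]) / np.sum(w)
    return estimate, modern_climate[keep].min(), modern_climate[keep].max()

# Made-up data: three modern assemblages of two taxa and their temperature.
modern_pollen = np.array([[0.5, 0.5], [0.9, 0.1], [0.1, 0.9]])
modern_temp = np.array([10.0, 20.0, 0.0])
print(mat_reconstruct(np.array([0.6, 0.4]), modern_pollen, modern_temp, threshold=0.5))
```

The returned range is the climatic range of the retained analogues, which is why it depends on how many analogues pass the threshold rather than on any noise model.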

Vegetation modelling
A pollen assemblage (or spectrum) is assumed to reflect the composition and structure of the regional vegetation. It is composed of a large number of taxa which can be grouped into what are usually called plant functional types (PFTs, i.e. groups of plant species with similar characteristics and responding in a similar way to climate). This has the advantage of reducing the size of the assemblages and, above all, of being coherent with vegetation model outputs, according to the work of Prentice et al. (1996).
Fig. 1. A. MAT starts from (1) a matrix of proxy assemblages and a fossil assemblage, (2) which is compared to all the modern proxy assemblages using a distance index; the few most similar ones are identified and called the best analogues; (3) they are located on a map; (4) the corresponding climatic variables are selected from the climate database and (5) averaged to provide the reconstructed climate variables. B. IPM starts (1) from a climatic scenario (a vector of climatic variables), randomly generated (R), which is (2) introduced into the proxy model and produces (3) a simulated proxy assemblage; (4) the fossil assemblage is compared to the simulated pollen assemblages; if the match is acceptable, the climatic scenario is kept, otherwise it is rejected; (5) a new climatic scenario is randomly selected and the procedure (1 to 4) is repeated; (6) when a sufficient number of virtual climatic scenarios has been obtained, the procedure is stopped and distribution histograms of the retained scenarios are built. It is possible to change other inputs of the model, such as CO2 or insolation, and to study the sensitivity of the proxy model to that variable. For both approaches, the steps are idealised to facilitate their intercomparison. The practical algorithms generally operate in a slightly different way.
There exists a large variety of vegetation models. Some of them need a detailed knowledge of climate to estimate vegetation. They are hardly usable at the continental scale, where often only monthly climatic records are available. This explains why palaeostudies have used relatively simple biogeochemical models. The most popular model has been BIOME3 (Haxeltine and Prentice, 1996), or its modified version BIOME4 (Kaplan et al., 2003). It is a process-based terrestrial biosphere model which includes a photosynthesis scheme that simulates the acclimation of plants to changed atmospheric CO2 by optimisation of nitrogen allocation to foliage and by accounting for the effects of CO2 on net assimilation, stomatal conductance, leaf area index (LAI) and ecosystem water balance. It assumes that there is no nitrogen limitation. The inputs of the model are soil texture, atmospheric CO2 concentration, absolute minimum temperature (Tmin), monthly mean temperature (T), monthly total precipitation (P) and monthly mean sunshine (S), i.e. the ratio of the actual number of hours of sunshine to the potential number (with no clouds). From these input variables, the model computes bioclimatic variables and, from them, the maximum sustainable leaf area index and the net primary production (NPP, in kg m−2 yr−1) for the PFTs able to live in this input climate. Competition among PFTs is simulated by using the optimal NPP of each PFT as an index of competitiveness. The most important PFTs in Europe are: temperate broadleaved evergreen trees (tbe), temperate summergreen trees (ts), temperate evergreen conifer trees (tc), boreal evergreen trees (bec), boreal deciduous trees (bs), temperate grass (tg), woody desert plant type (wd), tundra shrub type (tus), cold herbaceous type (clg) and lichen/forb type (lf). The pollen PFTs are sometimes more precise, and pollen information is sufficient to recognise several varieties of the same model PFT; for example, pollen is able to separate warm and cool ts. The use of such models in the palaeoclimatological context and the simulation of the CO2 effect on ecosystems are particularly well reviewed in Prentice and Harrison (2009).
BIOME3 and BIOME4 are equilibrium models. LPJ-GUESS is a notable improvement, as the dynamics of the vegetation stands are taken into account (Smith et al., 2001). While, in equilibrium models, two runs with the same climate always give the same vegetation output, in a dynamic model random processes, such as competition between species and mortality, introduce stochasticity in the outputs.
www.clim-past.net/5/571/2009/ Clim. Past, 5, 571-583, 2009
In LPJ-GUESS, cohorts of trees of different species, age and structure compete for light and soil resources on a number of replicated patches of plants. Either PFTs (Sitch et al., 2003) or species (Hickler et al., 2004) may be simulated. Garreta et al. (2009) used the species version, which includes 18 species. LPJ-GUESS has standard inputs, i.e. monthly values of precipitation, temperature and cloudiness. For each study site, past and present, precipitation and temperature chronologies were interpolated from the CRU TS 1.2 dataset (New et al., 2002), which has a spatial resolution of 10′. For cloudiness, they fitted, for each site, a relationship between monthly cloudiness and both monthly precipitation and temperature.

Inversion modelling and Bayesian approach
As indicated by Fig. 1a, the statistical method to estimate climate starts from pollen assemblages and goes back to climate. Vegetation models start from climate and go to vegetation. The idea proposed by Guiot et al. (2000) is then to use computationally intensive algorithms to "invert" the model, starting from vegetation and going back to climate. It is not an analytical inversion, but an iterative procedure in which one converges progressively towards the climate which has produced the observed vegetation (Fig. 1b). The caption of the figure explains the steps of the method. The climatic space is randomly sampled to produce a large variety of climatic scenarios, which are introduced into the vegetation model to simulate the corresponding vegetation composition and productivity. The simulated pollen assemblages are compared to the fossil assemblage and those matching reasonably well are retained. The corresponding climatic scenarios are then considered to be compatible with the observed vegetation. They are used to build histograms, which are estimates of the probability distribution functions of a climate able to generate such a vegetation. The outputs of the model do not correspond exactly to the pollen assemblages. A transformation is necessary, and it is a major tricky point of the method. Several tested approaches are presented in the following sections. This transformation is considered part of the model box in the figure.
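The iterative procedure described above (random climatic scenario, forward simulation, comparison with the fossil assemblage, accept or reject, histogram) can be sketched in a few lines of Python. Everything specific here is an assumption made for illustration: `toy_proxy_model` is a hypothetical two-score stand-in for the vegetation model plus the pollen transformation, and the tolerance `tol` and the sampling ranges are arbitrary choices, not values taken from the cited studies.

```python
import numpy as np

def toy_proxy_model(d_temp, d_precip):
    """Hypothetical stand-in for vegetation model + pollen transformation.

    Returns two scores (tree, steppe) that sum to 1.
    """
    tree = 1.0 / (1.0 + np.exp(-(0.3 * d_temp + 0.02 * d_precip + 2.0)))
    return np.array([tree, 1.0 - tree])

def invert_by_rejection(fossil, n_iter=20000, tol=0.05, seed=0):
    """Keep the random climatic scenarios whose simulated assemblage matches."""
    rng = np.random.default_rng(seed)
    kept = []
    for _ in range(n_iter):
        d_temp = rng.uniform(-30.0, 5.0)      # temperature anomaly (deg C)
        d_precip = rng.uniform(-60.0, 60.0)   # precipitation anomaly (%)
        simulated = toy_proxy_model(d_temp, d_precip)
        if np.sqrt(np.sum((simulated - fossil) ** 2)) <= tol:
            kept.append((d_temp, d_precip))   # scenario compatible with data
    return np.array(kept).reshape(-1, 2)

# Synthetic "fossil" assemblage generated from a known scenario.
fossil = toy_proxy_model(-10.0, -20.0)
scenarios = invert_by_rejection(fossil)
hist, edges = np.histogram(scenarios[:, 0], bins=35, range=(-30.0, 5.0))
print(len(scenarios), "scenarios retained")
```

The histogram of the retained scenarios plays the role of the estimated probability distribution of the climate variable.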
The second tricky point is the number of input climatic parameters. The vegetation models used take the 36 monthly climate inputs described above, which define the scenario. One has to modify them randomly to explore the climatic space, but its dimension is too high for convergence to the true solution. We therefore decided to reduce them to a small number of representative variables (Tjan, Tjul, Pjan, Pjul), from which all the other climatic variables are deduced. A sine function is fitted to the two temperature variables and another to the two precipitation variables, enabling interpolation of the missing months. The sunshine percentage of each month is estimated by a linear regression on the temperature and precipitation of the same month. See Guiot et al. (2000) for more details. To allow comparison between sites and time periods, climate variables are expressed as "anomalies" or Δclimate, i.e. differences between the proposed climate and the modern climate at the site considered.
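The reduction from monthly inputs to January and July values can be illustrated as follows. This is a minimal sketch assuming a sinusoid with a 12-month period whose extremes fall in January and July; the exact parameterisation used by Guiot et al. (2000) may differ, and the function name is hypothetical. The sunshine regression is not shown because its coefficients are not given in the text.

```python
import numpy as np

def monthly_from_jan_jul(v_jan, v_jul):
    """Interpolate 12 monthly values with a sinusoid through January and July."""
    months = np.arange(12)                  # 0 = January, 6 = July
    mean = 0.5 * (v_jan + v_jul)            # mid-point of the annual cycle
    amplitude = 0.5 * (v_jul - v_jan)       # half the January-July contrast
    return mean - amplitude * np.cos(2.0 * np.pi * months / 12.0)

# Example: Tjan = -2 deg C and Tjul = 18 deg C give a full annual cycle.
temperature = monthly_from_jan_jul(-2.0, 18.0)
print(np.round(temperature, 1))
```

With this choice, only two numbers per variable need to be sampled, which is what keeps the climatic space small enough to explore.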
The third tricky point is then to define what is considered a match. Bayesian theory provides a framework for such a definition (Robert and Casella, 1999). It relies on the two main concepts of prior and posterior. The prior is the information, summarised in the form of a distribution, which is available prior to the data analysis. The posterior is the information that we deduce from the data and a hierarchical model. In that respect, the hierarchical model is not restricted to the vegetation model: it is the function which relates the prior to the posterior. In statistical terms, it is the probability p(Y|C) of the pollen assemblage Y given the climate C, which combines p(V|C), the vegetation model which links vegetation V to climate C, and p(Y|V), the function which links pollen to vegetation. The prior is an initial guess of the probability distribution of the climate. It can be given by the knowledge we have from other palaeoclimatic data or, if nothing is available, by the general knowledge accumulated in the field, which provides a plausible range. The distribution is then a uniform law defined on that range.
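As a compact reminder of the prior/posterior vocabulary, Bayes' theorem can be written for a climate variable C and a pollen assemblage Y as below. The bounds C_min and C_max are symbols introduced here only to denote the range of the uniform prior; this is the generic statement of the theorem, not the specific hierarchical model of the cited papers.

```latex
p(C \mid Y) \;\propto\; p(Y \mid C)\, p(C),
\qquad
p(C) =
\begin{cases}
  \dfrac{1}{C_{\max} - C_{\min}} & \text{if } C_{\min} \le C \le C_{\max}, \\[6pt]
  0 & \text{otherwise.}
\end{cases}
```

With a uniform prior, the posterior is simply the likelihood restricted to the prior range and renormalised.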
Bayesian statistics were conceptually introduced in palaeoclimatology by Korhola et al. (2002) and Haslett et al. (2006), but without any reference to a mechanistic model. They underlined that such an approach is slow despite making unreasonable compromises on the models employed. With a mechanistic model, it is even slower. The reason is that, to draw the posterior, one has to use Monte Carlo algorithms which need thousands of iterations. These algorithms, coherently with Bayesian inference, provide an integration over the climate parameter space instead of an optimisation. A popular family of such algorithms is known as Markov chain Monte Carlo (MCMC). Let us consider a multi-dimensional mathematical space where each dimension represents a climatic variable. A vector of parameters is an element of this multi-dimensional climate space. The Metropolis-Hastings algorithm is an iterative method which explores the climate space according to an acceptance-rejection rule (Metropolis et al., 1953; Hastings, 1970). The output of this algorithm is a "path" or "chain" of climate parameters describing the posterior distribution of the climate parameters. The MCMC algorithm can be considered an equilibrium inversion method, compatible with equilibrium vegetation models such as BIOME3.
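The acceptance-rejection rule can be made concrete with a short random-walk Metropolis sampler, the special case of Metropolis-Hastings with a symmetric proposal. This is a sketch under stated assumptions: `toy_forward` is a hypothetical linear stand-in for the vegetation model, the likelihood is Gaussian with a fixed `sigma`, the prior is uniform between `lower` and `upper`, and the step size is arbitrary. It is not the implementation used in the cited papers.

```python
import numpy as np

def log_posterior(theta, observed, forward, lower, upper, sigma):
    """Uniform prior on [lower, upper] plus Gaussian log-likelihood."""
    if np.any(theta < lower) or np.any(theta > upper):
        return -np.inf                       # outside the prior range
    resid = observed - forward(theta)
    return -0.5 * np.sum(resid ** 2) / sigma ** 2

def metropolis(observed, forward, lower, upper, sigma, step,
               n_iter=5000, seed=0):
    """Random-walk Metropolis: returns the chain of visited parameters."""
    rng = np.random.default_rng(seed)
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    theta = 0.5 * (lower + upper)            # start in the middle of the prior
    logp = log_posterior(theta, observed, forward, lower, upper, sigma)
    chain = np.empty((n_iter, theta.size))
    for i in range(n_iter):
        proposal = theta + rng.normal(0.0, step, size=theta.size)
        logp_new = log_posterior(proposal, observed, forward, lower, upper, sigma)
        if np.log(rng.uniform()) < logp_new - logp:   # accept or reject
            theta, logp = proposal, logp_new
        chain[i] = theta                     # a rejected move repeats theta
    return chain

def toy_forward(theta):
    """Hypothetical forward model: one proxy value from one climate anomaly."""
    return np.array([2.0 * theta[0]])

chain = metropolis(np.array([-20.0]), toy_forward,
                   lower=[-30.0], upper=[5.0], sigma=1.0, step=1.0)
print(chain[1000:, 0].mean())               # posterior mean after burn-in
```

The histogram of `chain` after a burn-in period approximates the posterior distribution; no single "best" climate is selected.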
To perform the temporal inversion of the dynamic model LPJ-GUESS, a statistical framework has been developed around a temporal hierarchical model and a Sequential Monte Carlo (SMC, or particle filter; Doucet et al., 2001) inference algorithm, because (1) the random character of the vegetation simulated by LPJ-GUESS prohibits the use of the "static" MCMC approach, and (2) dimension reduction is needed in the reconstructed climate space, whose dimension is equal to the number of samples in the reconstructed climatic time series.
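To give an intuition of how a particle filter handles a stochastic, time-dependent model, here is a minimal bootstrap particle filter in Python. It is only a sketch: `toy_dynamic_vegetation` is a hypothetical one-variable stochastic model standing in for LPJ-GUESS, the climate is assumed to follow a random walk, the observation error is assumed Gaussian, and all numerical values are arbitrary. The algorithm of Garreta et al. (2009) is more elaborate.

```python
import numpy as np

def toy_dynamic_vegetation(veg, temp, rng):
    """Hypothetical stochastic model: tree cover relaxes toward a
    temperature-dependent target, with random disturbance."""
    target = 1.0 / (1.0 + np.exp(-0.5 * temp))
    noise = rng.normal(0.0, 0.02, size=veg.shape)
    return np.clip(veg + 0.5 * (target - veg) + noise, 0.0, 1.0)

def particle_filter(observations, n_particles=2000, obs_sigma=0.05, seed=0):
    """Bootstrap particle filter: returns the filtered mean temperature."""
    rng = np.random.default_rng(seed)
    temp = rng.uniform(-10.0, 10.0, n_particles)   # uniform prior on climate
    veg = np.full(n_particles, 0.5)                # initial vegetation state
    means = []
    for y in observations:
        temp = temp + rng.normal(0.0, 1.0, n_particles)   # climate random walk
        veg = toy_dynamic_vegetation(veg, temp, rng)      # propagate each particle
        logw = -0.5 * ((y - veg) / obs_sigma) ** 2        # Gaussian likelihood
        w = np.exp(logw - logw.max())
        w /= w.sum()
        means.append(float(np.sum(w * temp)))             # weighted estimate
        idx = rng.choice(n_particles, size=n_particles, p=w)  # resampling
        temp, veg = temp[idx], veg[idx]
    return np.array(means)

tree_cover = np.full(20, 0.3)        # made-up sequence of observed tree cover
temp_path = particle_filter(tree_cover)
print(np.round(temp_path[-5:], 2))
```

Because each particle carries its own vegetation state forward in time, the randomness of the simulated vegetation is part of the inference rather than an obstacle to it.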

Applications
The method is illustrated by moving, on the one hand, from the equilibrium vegetation model BIOME3 towards the dynamic vegetation model LPJ-GUESS and, on the other hand, from a single proxy (pollen) towards a double proxy constraint. This is sketched in Fig. 2 with a double axis. The second proxy is, in one case, lake-level data and, in the other, isotopic data, each proxy giving information on different aspects of the climate. Finally, we will conclude on the directions that palaeoclimatology should follow to fully exploit the increasingly diverse and improved set of archives and proxies.
All these results concern annual temperature and precipitation. Even if these variables are not the most determinant for vegetation, they are averages of the input variables and are thus the most synthetic. Moreover, annual precipitation is the variable which concerns both lake levels and vegetation. We considered it better to present them rather than more bioclimatic variables (growing degree-days, water availability, ...), even if, for a better interpretation of the results, it is also necessary to look at the bioclimatic variables.

Application A: Europe at the Last Glacial Maximum
The first application uses BIOME3 constrained by pollen data (application A in Fig. 2) for the Last Glacial Maximum (LGM, 21 ± 2 ka BP) in Europe. The data and the method are fully described in Guiot et al. (2000). The model outputs are transformed into pollen PFT scores by an artificial neural network (ANN) calibrated on a modern dataset (Tarasov et al., 1998). Unlike a standard transfer function, the relationship is not calculated between climate and pollen, but between vegetation and pollen, the bridge between climate and vegetation being given by BIOME3. The measure of fit between the vegetation model outputs (NPP) and the observations (pollen PFT scores) is a likelihood index. It assumes a probability model for the simulation "errors", here a Gaussian model. It is then proportional to the sum of squared discrepancies between the ANN-transformed NPP and the observed pollen PFT scores. The priors are given by a uniform distribution on [−30, +5] °C for temperature anomalies and on [−60, +60] % for relative precipitation anomalies.
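Under a Gaussian error model, the link between the likelihood and the sum of squared discrepancies can be written as below. The notation is introduced here for illustration only: y_i is the observed score of pollen PFT i, ŷ_i(C) the ANN-transformed NPP simulated for climate C, n the number of PFTs and σ a common error standard deviation; the cited paper may weight the terms differently.

```latex
L(C) \;\propto\; \exp\!\left[-\frac{1}{2\sigma^{2}} \sum_{i=1}^{n} \bigl(y_i - \hat{y}_i(C)\bigr)^{2}\right],
\qquad
-2 \ln L(C) \;=\; \frac{1}{\sigma^{2}} \sum_{i=1}^{n} \bigl(y_i - \hat{y}_i(C)\bigr)^{2} + \text{const.}
```

On the logarithmic scale the index is thus proportional to the sum of squares, so a smaller sum means a more likely climatic scenario.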
A dataset of 15 LGM samples is considered. We present two experiments. The first is run with a high level of CO2 (340 ppmv), close to the atmospheric concentration at the time of the modern data sampling. The second is run with a low level of CO2 (200 ppmv), as measured in ice cores for the LGM (Petit et al., 1999). The results are presented as a function of latitude (Fig. 3). Annual temperature shows an increasing gradient from the southernmost site (about 35° N) towards the northernmost (48° N), and annual precipitation a decreasing gradient. The gradients are 0.81 ± 0.35 °C per degree of latitude for temperature and −29 ± 12 mm per degree of latitude for precipitation. This means that a larger temperature decrease is necessary in the south than in the north to transform forest into steppe and that, in the north, a stronger precipitation decrease is necessary. When the LGM CO2 level is applied, the gradients become non-significant for both variables. The CO2 lowering is thus large enough to reduce forest extent: under a high CO2 level, temperature must fall sufficiently to reduce the growing season below a certain threshold, whereas under a low CO2 level, the forest reduction is due to both temperature lowering and carbon limitation. Ignoring the true level of CO2 therefore introduces a real bias in climate reconstruction (as happens when statistical methods are used instead of mechanistic models). This bias reaches 3 °C at the southernmost sites but no more than 1 °C at the northernmost ones, meaning that CO2 becomes more limiting than temperature far away from the ice cap.

Application B: Eurasia and Africa at the LGM
Wu et al. (2007a) have improved the method. First, BIOME3 has been replaced by BIOME4 (Kaplan et al., 2003). Second, the ANN relationship between the NPP simulated for the model PFTs and the pollen PFT scores has been replaced by a correspondence matrix between the model biomes and the biome scores calculated from pollen. This matrix is an empirical result based on modern data and on the theoretical definition of the biomes (see the original paper for more information). The method has been applied to the LGM of Eurasia and Africa (application B in Fig. 2).
The estimated anomalies of the climatic parameters for the LGM are shown in Fig. 4. The left part of each graph concerns Africa. There is a large dispersion, which can mainly be explained by the large range of elevations. Wu et al. (2007a) have shown a strong altitudinal gradient of precipitation. For the modern level of CO2, no linear relationship between temperature and latitude can be fitted in Africa, whereas in Europe the relationship is negative: high-latitude sites had a temperature anomaly of about −12 °C and southern sites anomalies of −10 to −5 °C. The gradient is negative, while it was positive in Fig. 3. This is likely due to the better ability of BIOME4 to simulate the LGM vegetation, which is intermediate between cool steppe and tundra. A biome called steppe-tundra was introduced in the most recent version of the model, which then fits the data much better. Even if that biome does not exist explicitly in the pollen data, it exists cryptically when the tundra and steppe scores are of the same magnitude. The reconstructed anomalies under low CO2 concentration are not significantly different from those reconstructed under high CO2 concentration. Wu et al. (2007a) found a clear bias for winter temperature, i.e. about 10 °C colder under higher CO2, but none for summer temperature. The only bias found for annual temperature concerns Mediterranean sites, with a somewhat lower annual temperature under high CO2. In Africa, temperature was not very different from present values. Precipitation shows a much more structured profile. Under high CO2, anomalies were close to zero in South Africa and between −1000 and 0 mm/yr at the equator, depending on the elevation. Under low CO2 concentration, the reconstruction in the southern part of the continent was similar, and in the central part the dispersion was higher: −1200 to 0 mm/yr. If we focus on high-elevation sites (>1500 m), the precipitation mode, for 340 ppmv, was at about 1000 mm/yr and is replaced, with 200 ppmv, by a large double peak from 1100 to 700 mm/yr. In fact, Wu et al. (2007b) have shown that the disappearance of forest above 2000 m elevation can be explained partly by a precipitation decrease and partly by a CO2 lowering. Wu et al. (2007a) analysed the water stress variable α, which is the ratio of actual to potential evapotranspiration and is closely related to the stomatal area and the water use efficiency. They found that its maximum probability ranges within [−40, −28] % for high CO2 and within [−40, −8] % for low CO2. Using a high CO2 concentration therefore induces an overestimation of the water stress.
Fig. 3. Panels: High CO2 (340 ppmv) and Low CO2 (200 ppmv); axes: latitude and annual precipitation.
Fig. 4. Annual temperature and precipitation anomalies (i.e. deviations from present values) at the Last Glacial Maximum (21 ± 2 ka BP) as a function of latitude, under high CO2 (340 ppmv) and low CO2 (200 ppmv). The regions covered are Europe and Africa (42 sites). The grey scale indicates the probability distribution, the blue circles show the mode of the distribution and the red lines the linear relationships between these modes and the latitude of the sites, one for sites south of 10° N and one for sites north of 30° N.
Several solutions are possible for the LGM climate in regions where a mixture of steppe and tundra existed. As these biomes have no clear analogues today, a reconstruction based on statistical methods will tend to choose the least poor match, or fail to find a match (Peyron et al., 1998; Jost et al., 2005). These analogues were taken in tundra or very cold steppes, resulting in very low reconstructed temperatures. With a mechanistic model and probability distributions, the results are multi-modal and the most probable mode differs according to the CO2 concentration. All possible solutions at LGM CO2 levels can be explored. Complementary proxies are, in this case, of great help in identifying the most suitable solution.

Application C: lake levels and an equilibrium vegetation model
The third example is a single-site application, with a core covering part of the Holocene and the Younger Dryas (YD), for which pollen assemblages and lake-level data are available. This application illustrates the effect expected from the use of a second proxy to constrain climate components not optimally accessible from pollen data alone (application C in Fig. 2). The palaeo-lake Le Locle (47°03′ N, 6°43′ E) dried up during the last century. It is located at 915 m a.s.l. in the high Swiss Jura. The pollen and lake-level data used in this study are described in Magny et al. (2001). The lake-level status curve indicates that the YD was characterised by a trend toward lake-level lowering and by strong instability (Fig. 5). The early Holocene had three major phases of low levels: before 10 ka BP, between 9 and 8.5 ka BP, and after 7 ka BP. Concerning the vegetation history (represented by the deciduous tree curve and the total tree pollen curve in Fig. 5), the Younger Dryas was characterised by rather large percentages of trees (Pinus, Betula) together with about 10% of Artemisia. The early Holocene was characterised by an increase of Corylus, then Ulmus and Quercus. Nothing in the vegetation history can be related to the rise in lake level at ca. 8400-8300 cal yr BP.
First, we use the method as defined in Sect. 3.1, pollen being used alone to constrain the model and CO2 being assumed constant and equal to the pre-industrial value of 280 ppmv. The priors for January and July temperature are assumed to be uniform between −8 and +4 °C (in anomalies) and, for precipitation, between −40 and +40% of modern conditions. This is called the "pollen experiment" (ExP) (Fig. 6). The YD was characterised by a temperature lower than present by 8 °C. Annual precipitation did not seem to have any trend across the whole period studied. The second experiment (pollen-CO2 experiment, ExPC) is obtained by providing the model with the atmospheric CO2 reconstructed from the Taylor Dome ice core (Indermühle et al., 1999) (Fig. 6). CO2 has the largest effect on the temperature reconstruction (an anomaly of 5 °C instead of an anomaly of 8 °C with ExP) when its concentration is the lowest. This is highlighted by the differences ExPC-ExP between the probability distributions of ExPC and those of ExP: the modes of ExP (in blue) are systematically lower than the modes of ExPC (in red). As for the LGM, this shows that, when the true value of CO2 is not taken into account, there is a bias in the temperature reconstruction, the effect being maximum during the YD, when CO2 was the lowest. The effect on precipitation seems to be negligible (the blue and red distributions being flat and not contrasted).
The last experiment (pollen-CO2-lakes experiment, ExPCL) is obtained by constraining the model with pollen, CO2 and lake levels (Fig. 6). The integration of lake levels is not straightforward. A solution, called the constrained analogue method, was proposed by Cheddadi et al. (1996). For each iteration, the lake levels were compared to precipitation minus evapotranspiration (P−E), which is closely related to run-off. Both quantities are expressed as anomalies by subtracting their modern value at the study site. We call $\Delta L$ the lake-level anomaly and $\Delta(P-E)$ the anomaly of P−E. Even if the match between simulated and observed pollen assemblages is acceptable, the iterations satisfying any of the conditions

$$
\begin{aligned}
&(|\Delta L| \le 0.5)\ \text{and}\ (|\Delta(P-E)| > 200\ \text{mm}) \\
&(\Delta L > 0.5)\ \text{and}\ (\Delta(P-E) < -100\ \text{mm}) \\
&(\Delta L < -0.5)\ \text{and}\ (\Delta(P-E) > 100\ \text{mm})
\end{aligned}
\tag{1}
$$

are eliminated. The thresholds used in this equation are somewhat arbitrary and were obtained by trial and error. Cheddadi et al. (1996) found that the results were not too sensitive to the choice of these values.

Fig. 5. Location of Lake Le Locle in the Swiss Jura. The upper right graph shows the proportion of tree pollen and the proportion of deciduous tree pollen in the pollen diagram. The middle right graph shows the lake levels. The lower right graph shows the CO2 concentration in the ice-core bubbles of Taylor Dome (Indermühle et al., 1999). Time scales are in calibrated years BP.

Fig. 6. Results of the three experiments (panels ExP, ExPC, ExPCL) on Lake Le Locle and their differences (panels ExPC-ExP, ExPCL-ExP). Two variables are reconstructed, annual temperature and precipitation. The grey scale indicates the probability distributions. The green curves are the modal curves. ExP shows the results when pollen alone is used in the inverse modelling; ExPC, when pollen is used with variable CO2 concentration (Indermühle et al., 1999); ExPCL, when pollen is used with variable CO2 and with the lake-level constraints. The blue/red graphs represent the difference between the probability distributions of two experiments. The curves represent the modal curves.

Fig. 6 shows that the reconstructed variations of temperature do not change, but those of precipitation follow those of the lake levels much better, with a decrease of the uncertainties as well (indicated by a narrowing of the probability distribution). The probability distribution differences (ExPCL-ExP) show that the ExPCL distributions are narrower than the ExP ones (blue areas on both sides of the red area indicate broader ExP distributions). So when pollen is used alone, the precipitation reconstructions have much larger uncertainties. These experiments again show that CO2 must be taken into account, at least during periods when it is low. Another point is that precipitation, in temperate regions at least, cannot be inferred with sufficient confidence from vegetation proxies only. Vegetation uses only a part of the precipitation falling on the ecosystem while a significant part runs off; consequently, a complementary proxy is needed to infer correctly the total amount of water available within the ecosystem.
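The lake-level rejection rule of Eq. (1) can be sketched in a few lines. This is only an illustration, assuming that the three conditions are alternatives (an iteration is eliminated as soon as one of them holds); the function name and the example iterations are hypothetical and do not come from the original study.

```python
def rejected_by_lake_level(delta_l, delta_pe_mm):
    """Return True when an iteration is eliminated by the lake-level rule.

    delta_l     : lake-level anomaly (same units as the 0.5 threshold)
    delta_pe_mm : anomaly of precipitation minus evapotranspiration, in mm
    Assumption: the three conditions of Eq. (1) are combined with "or".
    """
    stable_lake_but_large_change = abs(delta_l) <= 0.5 and abs(delta_pe_mm) > 200
    high_lake_but_drier = delta_l > 0.5 and delta_pe_mm < -100
    low_lake_but_wetter = delta_l < -0.5 and delta_pe_mm > 100
    return (stable_lake_but_large_change
            or high_lake_but_drier
            or low_lake_but_wetter)


# Hypothetical iterations: (lake-level anomaly, P-E anomaly in mm).
iterations = [(0.0, 50.0), (0.0, 250.0), (1.0, -150.0),
              (1.0, 120.0), (-1.0, 150.0), (-1.0, -80.0)]

# Keep only the iterations that are coherent with the lake-level record.
kept = [it for it in iterations if not rejected_by_lake_level(*it)]
```

In this sketch the filter acts after the pollen comparison: an iteration with an acceptable pollen fit is still discarded when its simulated water balance contradicts the observed lake level.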

Application D: δ13C proxy and equilibrium vegetation model
We now present another single-site application, with a core approximately covering the Eemian warm period (128 to 100 ka BP), for which a pollen diagram and the δ13C of organic matter are available. This application illustrates the effect expected from using a second vegetation proxy to reduce the uncertainties obtained with pollen data alone. It corresponds to application D of Fig. 2. The procedure is based on the BIOME4 model (as in Sect. 3.2). The likelihood function LH assumes a Gaussian probability distribution for the errors on δ13C; in its expression, subscripts o and s correspond to target and simulated values respectively, and 1/S² is the whole-model precision, i.e. the inverse of the model error variance. It is an adjustable number which measures the quality of the fit between model outputs and data (Hatté and Guiot, 2005). When pollen data are also available, the biome assigned to the sample can be used to make an additional selection of the runs: if the simulated biome matches the biome obtained from pollen, the iteration is kept; if not, it is rejected. Hatté et al. (2009) compared the results obtained with biome alone (a pollen-only approach) and with carbon isotopes constrained by pollen biomes, and validated the method.
We reproduce here the results obtained for the La Grande Pile sequence. This site is located at 47°44′ N, 6°30′ E, 330 m a.s.l., with an annual precipitation of 1080 mm, a mean annual temperature of 9.5 °C, and a seasonal range of about 18 °C between the warmest and the coldest months. The data are presented in Rousseau et al. (2006).
For each sample of the La Grande Pile core, an input vector is defined, composed of (1) the δ13C of the sample, (2) the target biomes, taken as the two with the highest scores in the biomisation procedure (further information in Rousseau et al., 2006), (3) the atmospheric CO2 concentration from the Petit et al. (1999) record interpolated onto the La Grande Pile time scale and (4) soil type and texture. The reconstructed annual temperature and precipitation are based on the iterations with an LH value higher than −0.5, corresponding to a maximum accepted error of 0.7 ‰ for δ13C.
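The acceptance rule described above can be illustrated with a short sketch. The exact expression of LH is not reproduced here, so the code assumes LH = −(δ13C_o − δ13C_s)²/S² with S = 1 ‰. This form is chosen only because it makes the threshold LH > −0.5 equivalent to an absolute error below √0.5 ≈ 0.7 ‰, as stated in the text. The function names and biome labels are hypothetical.

```python
import math


def log_likelihood(d13c_obs, d13c_sim, s=1.0):
    """Assumed Gaussian-type log-likelihood (up to a constant), in permil units."""
    return -((d13c_obs - d13c_sim) ** 2) / (s ** 2)


def accept_iteration(d13c_obs, d13c_sim, biome_sim, target_biomes,
                     threshold=-0.5):
    """Keep an iteration only if its biome matches and LH exceeds the threshold."""
    if biome_sim not in target_biomes:
        return False  # simulated biome differs from the pollen-derived biomes
    return log_likelihood(d13c_obs, d13c_sim) > threshold


# Under the assumed form, the largest accepted error is sqrt(0.5) permil.
max_accepted_error = math.sqrt(0.5)

# Hypothetical sample: the two best-scoring biomes and an observed d13C.
targets = {"temperate deciduous forest", "cool mixed forest"}
close_fit = accept_iteration(-27.0, -26.5, "cool mixed forest", targets)
poor_fit = accept_iteration(-27.0, -26.2, "cool mixed forest", targets)
wrong_biome = accept_iteration(-27.0, -26.9, "tundra", targets)
```

With another value of S the same threshold would correspond to a different tolerated error, which is why S acts as an adjustable precision parameter.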
Mean annual temperature and annual precipitation reconstructed by inverse modelling constrained by both pollen biomes and δ13C are bracketed by the ranges obtained with the pollen biome constraint alone (Fig. 7). The added value of the double constraint is particularly clear for the precipitation reconstruction: the single constraint infers a constant value with large uncertainties ([−600, +200 mm/yr] in precipitation anomaly), whereas the double constraint decreases the uncertainty by a factor of 2 to 4. Furthermore, the reconstructed temperature ranges are often reduced by a factor of 2. This confirms the conclusion of the previous section that pollen alone cannot give a sufficiently precise reconstruction of precipitation. It also shows that using two proxies decreases the uncertainty on the reconstruction of both variables and that inverse modelling is an elegant way to integrate several proxies related to vegetation. Nevertheless, we must note that the uncertainty provided by the pollen biome is higher than that provided by the whole PFT assemblage, as used in the previous subsections.

Application E: dynamic vegetation model
This section illustrates the use of a dynamic vegetation model, LPJ-GUESS, with a single proxy, i.e. pollen assemblages. As the model is dynamic, this application deals with the temporal characteristics of the data, as already suggested by Haslett et al. (2006). Vegetation is assumed to depend not only on the contemporaneous climate but also on the previous vegetation. Autocorrelation in the time series is considered as important information. Moreover, the dynamic model is not deterministic (two runs with the same inputs do not produce exactly the same results). MCMC algorithms are therefore not applicable. Garreta et al. (2009) proposed to use a particle filter technique, better adapted to time series and stochastic processes. We do not use PFT scores here but a restricted vector of 14 arboreal pollen taxa (Abies, Alnus, Betula, Carpinus, Corylus, Fagus, Fraxinus, Picea, Pinus, evergreen Quercus, deciduous Quercus, Tilia, Ulmus and Populus) and a 15th, herbaceous, taxon summing all the herbaceous taxa. This choice gives the best coherency with the 18 species defined in LPJ-GUESS. Garreta et al. (2009) applied their method to a fossil core (Meerfelder Maar) (Litt et al., 2009), but we present here only the validation of the method with modern samples. The monthly temperature and precipitation were deduced from a 6-dimensional climate parameter vector, $C = (T_{Jan}, T_{Jul}, P_{win}, P_{spr}, P_{sum}, P_{aut})$, which is slightly different from what has been done in the previous sections. The first two variables are temperature anomalies (in °C) for January and July, for 1901-2000. The other four are seasonal (winter, spring, summer and autumn) relative precipitation anomalies (in %).
To simulate vegetation at time $t_j > t_i$, with $t_i$ and $t_j$ consecutive time periods (corresponding to the resolution of the core), the vegetation model starts with $V_{t_i}$ and runs for $t_j - t_i$ years. If $t_j - t_i$ is short, the vegetation simulated at $t_j$ is strongly forced by vegetation $V_{t_i}$ and hence, implicitly, by climate $C_{t_i}$. This constraint gives time coherence to the vegetation and hence to the reconstructed climate, and helps to produce "histories" or "dynamics" or joint distributions of vegetation and climate. This constraint can be seen as a smoother of the local bias within independent reconstructions.
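The principle of carrying the vegetation state from one time step to the next can be sketched with a generic bootstrap particle filter. This is not the algorithm of Garreta et al. (2009) and it does not call LPJ-GUESS. The one-variable climate, the toy stochastic vegetation model, the noise levels and all names are hypothetical and serve only to show the propagate, weight and resample cycle.

```python
import math
import random


def toy_vegetation_step(veg, temp, rng):
    """Hypothetical stochastic stand-in for a dynamic vegetation model.

    Vegetation (a tree-cover index) relaxes from its previous state toward
    a climate-dependent target, with random noise, so two runs differ.
    """
    target = 1.0 / (1.0 + math.exp(-temp))
    return 0.7 * veg + 0.3 * target + rng.gauss(0.0, 0.02)


def particle_filter(observations, n_particles=500, obs_sd=0.05, seed=0):
    """Return the weighted-mean temperature anomaly for each time step."""
    rng = random.Random(seed)
    # Each particle carries a climate value and the vegetation it produced.
    temps = [rng.uniform(-8.0, 4.0) for _ in range(n_particles)]  # prior
    vegs = [0.5] * n_particles
    estimates = []
    for obs in observations:
        # Propagate: climate follows a random walk; vegetation restarts
        # from its own previous state, which gives time coherence.
        temps = [t + rng.gauss(0.0, 0.5) for t in temps]
        vegs = [toy_vegetation_step(v, t, rng) for v, t in zip(vegs, temps)]
        # Weight each particle by the fit of its vegetation to the data.
        weights = [math.exp(-0.5 * ((obs - v) / obs_sd) ** 2) for v in vegs]
        total = sum(weights)
        if total == 0.0:  # guard against numerical underflow
            weights, total = [1.0] * n_particles, float(n_particles)
        weights = [w / total for w in weights]
        estimates.append(sum(w * t for w, t in zip(weights, temps)))
        # Resample particles in proportion to their weights.
        idx = rng.choices(range(n_particles), weights=weights, k=n_particles)
        temps = [temps[i] for i in idx]
        vegs = [vegs[i] for i in idx]
    return estimates


# Two hypothetical records: persistently high and persistently low tree cover.
warm = particle_filter([0.9] * 10)
cold = particle_filter([0.1] * 10)
```

Because each particle keeps its own vegetation history, a climate value is favoured only if it is coherent with both the current observation and the vegetation inherited from the previous step.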
A key element of the inversion is the relationship between simulated vegetation and pollen data. In the previous sections, this was calculated either with a non-linear statistical relationship or with a correspondence matrix. Here it is approximated by a kernel surface (or response surface) in which the pollen taxon is expressed as a function of the taxonomically closest model taxon. This kernel was calibrated on a dataset of 1209 surface samples covering Europe and North Africa. It is illustrated for Alnus (Fig. 8), where the maximum weight is found where the coherency between data and model is best (here, in the region of low pollen and NPP values and in the region of mean pollen values and NPP around 0.02). Where pollen values are high (>4), the model is unable to simulate high NPP.
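One possible reading of such a kernel surface is a two-dimensional density estimate over the modern (pollen, simulated NPP) pairs. The sketch below assumes a Gaussian product kernel with fixed bandwidths; the kernel type, the bandwidths, the sample values and the function name are hypothetical, not those of the calibration described above.

```python
import math


def kernel_weight(pollen, npp, samples, h_pollen=0.5, h_npp=0.01):
    """Gaussian product-kernel density at (pollen, npp).

    samples  : list of (transformed pollen value, simulated NPP) pairs
               from modern sites (hypothetical values below)
    h_pollen : assumed bandwidth along the pollen axis
    h_npp    : assumed bandwidth along the NPP axis
    """
    total = 0.0
    for s_pollen, s_npp in samples:
        u = (pollen - s_pollen) / h_pollen
        v = (npp - s_npp) / h_npp
        total += math.exp(-0.5 * (u * u + v * v))
    return total / (len(samples) * 2.0 * math.pi * h_pollen * h_npp)


# Hypothetical modern calibration pairs: (pollen transform, NPP).
modern = [(-1.0, 0.005), (-0.8, 0.004), (0.5, 0.020),
          (0.6, 0.021), (0.4, 0.019), (4.5, 0.010)]

# Weight of a simulated NPP given an observed pollen value.
w_coherent = kernel_weight(0.5, 0.020, modern)    # inside a dense region
w_incoherent = kernel_weight(3.0, 0.050, modern)  # far from any modern site
```

A simulation whose NPP falls in a densely populated part of the surface for the observed pollen value receives a high weight, while combinations never seen at modern sites receive a weight close to zero.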
The results of the method are shown as (smoothed) posterior distributions of each climatic variable. They are illustrated for an Andalucian site (Fig. 9). Mean discrepancies between posterior medians and expected values of the 6 reconstructed parameters are negligible by comparison with interval widths: the differences between the modes are <5 °C for temperature and close to 0% for precipitation. There is thus a bias for temperature. However, this kind of analysis is not really meaningful for a single site. To really evaluate the biases, it is necessary to repeat this validation for several sites. This has been done for 30 sites in Europe, and Garreta et al. (2009) have shown that the mean bias was <1 °C and 3% in absolute value. Thus, the method seems to be unbiased.

Fig. 8. Alnus distribution: the points are the modern sites; the x-axis is the transformed pollen percentage, $\log(P_{alnus}/P_{GrSh})$ (P for percent); the y-axis is the Alnus simulated using LPJ-GUESS (in kg carbon m−2 year−1) from the CRUTS-1.2 (New et al., 2000) climate interpolated at each site; the color scale represents the surface fitted to the density of sites (red meaning maximum density).
To provide valuable information, the posterior distributions must be narrower than the prior ones. This is the case for temperature, where the lower limit of the distribution moves from −15 °C to −5 °C in January and from −10 °C to −6 °C in July (Fig. 9). Precipitation posteriors are not narrower than their priors, which shows that improvements in both the vegetation model and the inversion scheme are still necessary. Some of them concern a better modelling of the relationship between pollen dispersion and plant productivity.
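The comparison between prior and posterior widths can be made explicit with weighted quantiles. The following sketch is only an illustration: the grid of temperature anomalies, the weights and the 95% interval are hypothetical choices, not those used for Fig. 9.

```python
import math


def weighted_quantile(values, weights, q):
    """Smallest value whose cumulative weight reaches the fraction q."""
    pairs = sorted(zip(values, weights))
    total = sum(w for _, w in pairs)
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= q * total:
            return value
    return pairs[-1][0]


def interval_width(values, weights, lower=0.025, upper=0.975):
    """Width of the central interval between two weighted quantiles."""
    return (weighted_quantile(values, weights, upper)
            - weighted_quantile(values, weights, lower))


# Hypothetical grid of temperature anomalies (degrees C).
anomalies = list(range(-15, 16))
prior_weights = [1.0] * len(anomalies)                        # uniform prior
posterior_weights = [math.exp(-0.5 * (a / 2.0) ** 2) for a in anomalies]

prior_width = interval_width(anomalies, prior_weights)
posterior_width = interval_width(anomalies, posterior_weights)

# A ratio below 1 means that the data have constrained the variable.
narrowing_ratio = posterior_width / prior_width
```

A ratio close to 1, as found here for precipitation, indicates that the proxy adds little information to the prior for that variable.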

The main results relevant to palaeoclimatology
This paper has shown the progress made in the last ten years by introducing more mechanisms into climate reconstructions. The hypothesis behind classical approaches is that situations similar to the past can be found in the modern world, so that the past at one location can be explained as a realization of a present situation somewhere else in the world. This is clearly the basis of the analogue approaches, but also of all statistical approaches based on a modern dataset used as a training dataset (regression-based methods, artificial neural networks ...). Mechanistic models, able to simulate a proxy as a function of climate, give us the chance to work around this hypothesis, on condition that similarity of data is replaced by uniformity of processes. This implies that such models are strongly based on mechanisms and are not only a set of linear or non-linear equations calibrated on modern datasets. Vegetation models belong to this category.
Pollen data benefit from the fact that vegetation models based on physiological laws were developed more than fifteen years ago (Prentice et al., 1992). Having such a model available is not the only condition. These models must also be simple enough to work with accessible inputs (climate, soil structure ...). This has been the guiding principle of most of the vegetation models developed since this pioneering work. This paper has shown how to go from a relatively simple equilibrium model (BIOME3) to a dynamic model such as LPJ-GUESS. These models make it possible to work under conditions very different from modern ones. This is clearly the case for the atmospheric CO2 concentration, which was often lower than the continuously increasing present one (200 ppmv during the glacial periods, around 280 ppmv during the interglacials and more than 370 ppmv today). Changes in seasonality are also of interest. They are induced by variations of the Earth's orbit around the Sun (see the pioneering work of Berger, 1978). This feature is implicitly and partly taken into account by the inversion procedure, through its effect on temperature and precipitation, as different priors are set for winter and summer. But solar radiation also directly influences photosynthesis, and this should also be taken into account in the future.
Our results allow us to draw several important points: 1) not taking into account the difference in CO2 between modern and past time periods may introduce a significant bias. In particular, during the glacial periods, when the difference is largest, the fall in CO2 is partly responsible for the destruction of forest in the Mediterranean area. If it is not taken into account, the results tend to attribute this destruction to too large a temperature fall. The tundra-steppe vegetation of central and southern Europe is interpreted as a tundra vegetation when statistical methods are used, while a mechanistic model such as BIOME4 interprets it as a cool steppe, less cold than the tundra, especially in summer. Some biases can also exist during less cold periods (Younger Dryas and even Holocene).
2) The use of lake levels to constrain the reconstruction from pollen data reduces the uncertainty associated with the fact that pollen in temperate zones is a temperature indicator rather than a precipitation proxy. The results using Lake Le Locle pollen data and lake-level proxies have shown that not only is the uncertainty reduced but larger variations are also reconstructed across the Holocene.
3) δ13C is another proxy strongly related to precipitation. The results for the La Grande Pile Eemian have confirmed that the joint use of pollen and carbon isotopes also reduces the uncertainties on the precipitation reconstruction.
4) The use of a dynamic model confirms the main role of temperature in the vegetation shifts in Europe. This approach is still in development and some improvements are necessary to make the method fully operational. A first result here, which may confirm points 2 and 3 above, is that the effect of precipitation seems to be underestimated in LPJ-GUESS or BIOME4.
www.clim-past.net/5/571/2009/ Clim. Past, 5, 571-583, 2009

Vegetation models are an elegant solution to integrate several proxies. They simulate quantities which may be related to pollen data. They also simulate the fractionation of δ13C in the plant, which can be compared with isotopic measurements in the bulk sediment. They also simulate the water absorbed by the plant and the water running off. The run-off, represented by precipitation minus evapotranspiration, can be directly compared to lake-level data when the core is lacustrine. As lake recharge often occurs in winter whereas the water useful for vegetation must be available in the growing season, the use of both proxies sheds light on two complementary aspects of the climate, which also makes it possible to study various seasonalities. Despite these points, this inverse vegetation modelling approach is not a panacea. First, because it is a model-based approach, it is highly dependent on the quality of the proxy model. Second, it requires a great deal of computation time, which will increasingly become a problem in adapting the technique to more sophisticated models. Third, the outputs of the model are not directly comparable with pollen data without modelling pollen dispersion. Further verification is required by adapting this approach to other vegetation models. It remains important, however, to use this approach in parallel with classical statistical approaches. The comparison of results is a major key to understanding the relationships between palaeoclimates and palaeovegetation.
Finally, much is expected from building an integrated model of pollen accumulation in the core: this model should include all the processes involved, such as vegetation development, pollen dispersion, catchment basin erosion, sediment accumulation, diagenesis, chronology uncertainties ... A lot of work is still to be done.
Dedication: André Berger has not only been a pioneer in the theory of palaeoclimates. He has also stimulated many scientists around him. He understood early the necessity of obtaining quantified information on past climates. He was the PhD supervisor of the first author of this paper and pushed him to develop such quantified approaches for terrestrial archives. The first author dedicates this article to an inspiring colleague and friend.

Fig. 2. Evolution of the inverse modelling method along two directions: increasing complexity of the proxy model and increasing number of proxies. The applications A to E, corresponding to the sub-sections labelled in the text, refer to examples of increasing complexity using vegetation models. Applications A/B are equilibrium model inversions constrained by pollen alone (low complexity); applications C and D involve an equilibrium model with two proxies (pollen with lake levels (LL), pollen with δ13C); application E involves a dynamic model constrained by pollen alone; the two icons labelled as prospective represent two cases not yet implemented: a simple equilibrium model with three proxies and a more complex dynamic model, also with three proxies.

Fig. 3. Annual temperature and precipitation anomalies (i.e. deviations from the present value) at the Last Glacial Maximum (21 ka ± 2 ka BP) as a function of latitude. The region covered is Europe. Two experiments are defined, one with a quasi-modern CO2 concentration (340 ppm) and one with the LGM CO2 concentration (200 ppm). The grey scale indicates the probability distribution, the blue circles show the mode of the distribution and the red line the linear relationship between the mode and the latitude.

Fig. 7. Temperature and precipitation reconstruction at La Grande Pile during the Eemian period. Mean annual temperature and annual precipitation reconstructed by biome(s) and δ13C inverse modelling are represented by a grey scale for the probability distribution and by its modal curve (in blue). They are bracketed (in green) by the domain that encompasses the potential climatic niches of the two most likely biomes. Modified from Hatté et al. (2009).

Fig. 9. Verification of the method on modern pollen data from an Andalucian site. Prior (blue lines) and posterior (red lines) univariate distributions of the 6 climate variables, weighted with the particle importance and smoothed: January and July temperature (in °C of anomaly), and seasonal winter, spring, summer and autumn total precipitation (in % of anomaly).