A model–data comparison of the Holocene global sea surface temperature evolution

We compare the ocean temperature evolution of the Holocene as simulated by climate models and reconstructed from marine temperature proxies. We use transient simulations from a coupled atmosphere–ocean general circulation model, as well as an ensemble of time slice simulations from the Paleoclimate Modelling Intercomparison Project. The general pattern of sea surface temperature (SST) in the models shows a high-latitude cooling and a low-latitude warming. The proxy dataset comprises a global compilation of marine alkenone- and Mg/Ca-derived SST estimates. Independently of the choice of the climate model, we observe significant mismatches between modelled and estimated SST amplitudes in the trends for the last 6000 yr. Alkenone-based SST records show a pattern similar to that of the simulated annual mean SSTs, but the simulated SST trends underestimate the alkenone-based SST trends by a factor of two to five. For Mg/Ca, no significant relationship between model simulations and proxy reconstructions can be detected. We test whether such discrepancies can be caused by overly simplistic interpretations of the proxy data. We explore whether consideration of different growing seasons and depth habitats of the planktonic organisms used for temperature reconstruction could lead to a better agreement of model results with proxy data on a regional scale. The extent to which temporal shifts in growing season or vertical shifts in depth habitat can reduce model–data misfits is determined. We find that invoking shifts in the living season and habitat depth can remove some of the model–data discrepancies in SST trends. Regardless of whether such adjustments in the environmental parameters during the Holocene are realistic, they indicate that even when modelled temperature trends are set up to allow drastic shifts in the ecological behaviour of planktonic organisms, they do not capture the full range of reconstructed SST trends.
Results indicate that modelled and reconstructed temperature trends are to a large degree only qualitatively comparable, thus providing a challenge for the interpretation of proxy data as well as the model sensitivity to orbital forcing.


Introduction
A central question about future environmental conditions is how increasing human industrialisation, with growing emissions of greenhouse gases, will affect the earth's climate. Information beyond the instrumental record covering the last 150 yr can be obtained mainly from two strategies: on the one hand by deriving it from proxies that record past climate and environmental conditions, and on the other hand by simulating climate, using comprehensive models of the climate system under appropriate external forcing. Numerical climate models are clearly unequalled in their ability to simulate a broad suite of phenomena in the climate system (Jansen et al., 2007), but their reliability on longer timescales requires additional evaluation. Only climate records derived from palaeoenvironmental proxies enable the test of these models because they provide records of climate variations that have actually occurred in the past. However, well-known uncertainties in the proxy-derived palaeoclimate records exist, e.g. age control, signal formation, or calibration issues (Bradley, 1999).

G. Lohmann et al.: A model-data comparison of the Holocene global sea surface temperature evolution
Performing model-data comparisons can help reduce uncertainties in both model simulations and reconstruction of past climate change, and thus provide a test for climate projections as derived from climate models (e.g. Schmidt, 2010). In this perspective, the climate evolution from the mid-Holocene to the pre-industrial (PI) conditions is an ideal test bed for models, as the main forcing for temperature trends (insolation) for this period is known from astronomical theory (Berger, 1978), and a relatively large number of high-resolution and well-dated proxy records are available (e.g. Leduc et al., 2010a). Such records constrain the climate response to changes in external forcing (e.g. Hansen, 2007). However, uncertainties remain regarding important variables, such as temperature responses, the amplitude and feedbacks on long timescales and on large spatial scales (Köhler et al., 2010; Rohling et al., 2012).
There have been several studies focused on model-data comparisons of the mid-Holocene climate evolution devoted to identifying and explaining model-data mismatches. For example, Masson et al. (1999) and Guiot et al. (1999) compare mid-Holocene pollen- and lake-status-based reconstructions of European climate to an ensemble of atmosphere general circulation model (AGCM) climate simulations. They find little coherency among different models in the simulations of European climate change during the Holocene, and conclude that the North Atlantic sea surface temperature (SST) evolution, which was not considered in those atmosphere-only simulations, may be crucial to adequately simulate European climate evolution. A more recent analysis by Brewer et al. (2007) compares the output of 25 atmosphere-ocean general circulation model (AOGCM) simulations of the mid-Holocene period with a set of palaeoclimate reconstructions based on over 400 fossil pollen sequences distributed across the European continent. They find better agreement between model results and proxy data, but the models still had difficulties in capturing the magnitude of climate change. Sundqvist et al. (2010) provide an overview of northern high-latitude temperature change, and find that most proxies were terrestrial and summer biased. Based on simple arithmetic averages over the available data, the reconstructions indicate that the northern high latitudes were 2 °C warmer in annual mean temperature during the mid-Holocene compared to the recent pre-industrial. This compilation and modelling studies (Zhang et al., 2010) indicate that the strongest warming in the Arctic Ocean realm is in autumn, which is closely related to a delayed sea-ice response to summer insolation.
A recent compilation of land proxy data and models (Braconnot et al., 2012) shows mean annual temperature anomalies of 2-5 K during the mid-Holocene over large parts of northern and middle Europe, parts of northern Asia, as well as southern Africa. In the Mediterranean and the subtropical regions, the data shows a cooling of 1-2 K, as seen from pollen and plant macrofossil data (Bartlein et al., 2011).
Previous data compilations based on SST reconstructions during the mid-to-late Holocene mainly focus on large-scale patterns in the North Atlantic realm (Marchal et al., 2002; Rimbu et al., 2003), Pacific-Atlantic teleconnections, linkages between high and low latitudes, and global trends (Lorenz et al., 2006; Leduc et al., 2010a). The set of Holocene SST records that we use here is derived from alkenones and Mg/Ca, two proxies that have been widely applied over the last two decades (e.g. Brassell et al., 1986; Prahl and Wakeham, 1987; Prahl et al., 1988; Rosell-Melé et al., 1995; Nürnberg et al., 1996; Schneider et al., 1996; Conte et al., 1998; Herbert et al., 1998; Müller et al., 1998; Rosenthal et al., 2004; Greaves et al., 2008). Alkenones are synthesized by a small number of Haptophyceae phytoplankton, of which the coccolithophorids Emiliania huxleyi and Gephyrocapsa oceanica are the two most common sources in the present oceans and modern sediments. Here, we consider mainly two parameters that might influence estimations of Holocene SST trends: seasonal changes in coccolithophorid production (e.g. Rosell-Melé et al., 1995; Sikes et al., 1997; Ternois et al., 1997; Davis and Brewer, 2009), and changes in their depth habitat (e.g. Ternois et al., 1997; Bentaleb et al., 1999; Ohkouchi et al., 1999). Alkenones record a temperature signal that reflects the surrounding water temperature during the algae's lifetime. This recorded signal can be influenced by species-dependent ecological preferences; hence, the reconstructed temperature signal may depend on the seasonality and depth habitat of the alkenone-producing organisms (e.g. Müller et al., 1998; Baumann et al., 2000; Andruleit et al., 2003). In a similar way, planktonic foraminifera, which produce tests from which Mg/Ca SST estimates are derived, thrive over wide ranges of seasons and water depths (e.g. Fairbanks et al., 1982; Deuser and Ross, 1989; Mohtadi et al., 2009; Regenberg et al., 2009; Fallet et al., 2010).
Here, we specifically address this issue by presenting a comparison of simulated and reconstructed ocean temperatures for the mid-to-late Holocene (6 to 0 kyr BP, before present). We compare results from an ensemble of transient simulations of the Holocene, performed with the ECHO-G model, to marine alkenone- and Mg/Ca-based temperature reconstructions. Previous studies indicated an agreement in the tendency between marine proxy reconstructions and model simulations of the temperature evolution, but a mismatch with respect to the amplitude of the temperature trends (Lorenz et al., 2006; Schneider et al., 2010). Therefore, it has been speculated that taking into account proxy specificities associated with the ecological behaviour of the planktonic organisms from which SSTs are derived can remove parts of the observed mismatches. For instance, changes in surface water stratification and in the seasonality of the planktonic organisms' living season could affect the proxy reconstructions. This may establish a diagnostic of why model-data mismatch is observed. Here, we use our extended GHOST database (Leduc et al., 2010a), which comprises marine SST proxy records based on alkenones and Mg/Ca (Table 1). We compare these data to ensemble simulations from a transient experiment as well as to a selection of climate model simulations for the mid-Holocene period. We then systematically explore whether the model-data mismatches could be reduced by invoking changes in seasonality or water depth structure within the limit of estimated ecological requirements. By quantifying marine sites where model-data mismatch may potentially be caused by a misinterpretation of the proxy record and by quantifying the potential influence of seasonality and habitat depth on the alkenone- and Mg/Ca-derived temperature, we evaluate possible reasons for the misfit of simulated Holocene SST trends with the proxies.

Data and methods
The marine alkenone-based temperature reconstructions are from the GHOST database. We use an updated version of this database (Leduc et al., 2010a), which comprises marine proxy records for SST based on alkenones and Mg/Ca. The temperature reconstructions used here cover the mid-Holocene (6 kyr BP) to the last millennium (0 to 1 kyr BP, depending on the record), and consist of 52 alkenone-based SSTs (Figs. 1-3) and 19 Mg/Ca-based SST records (Fig. 4). These are unevenly distributed over the world ocean and are mainly located in the North Atlantic Ocean and in coastal areas (Fig. 5). We only consider records that have at least 10 incorporated values. As our main interest is the pattern of SST evolution, we determine the linear temperature trends between 6 and 0 kyr BP at every core location. These temperature trends show the spatial pattern of temperature evolution since the mid-Holocene, as recorded by the marine temperature proxies. We evaluate the ocean model component of ECHO-G in simulating the seasonal cycle in SST for the core locations (Figs. S1-S4, Supplement).

[Figure caption fragment: "... Table 1. The inserts provide information about the core location. Boxed inserts indicate cores which fail the residual test for randomness of the proxy linear regressions' residuals as described in Sect. 3.2."]
Simulated temperatures are based on the ensemble mean of two transient experiments spanning 7 to 0 kyr BP, using the ECHO-G model. The model is described in Legutke and Voss (1999). It consists of the atmosphere model ECHAM4 (Roeckner et al., 1996) and the ocean general circulation model HOPE including a dynamical-thermodynamical model for sea ice (Wolff et al., 1997). Only the orbital forcing has been applied in this experiment, and other parameters (e.g. CO2) have been set to pre-industrial (PI) values. Calculation of the orbital parameters follows the orbital solution of Berger (1978) and is accelerated by a factor of ten. The same model has been applied for the Eemian and glacial inception (Felis et al., 2004; Lohmann and Lorenz, 2007).
The ocean model grid consists of 120 unequally spaced grid cells in latitudinal direction, and 128 equally spaced grid cells in longitudinal direction; the equatorial latitudes between ±10° latitude have a resolution of 0.5° in order to resolve the equatorial wave guide, and this resolution gradually decreases polewards until 30° latitude to approximately 2.7°. As for the proxy reconstruction, we calculate the linear trends of the temperature simulation from the mid-Holocene to the present (6 to 0 kyr BP).
Furthermore, we make the same analysis for the models participating in PMIP3 (Taylor et al., 2012; Braconnot et al., 2012), listed in Table 2.
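As a rough illustration of the trend estimate applied to both the proxy records and the model output, the following sketch fits an ordinary least-squares line to a single record. The function name `linear_trend`, the decision to apply the 10-value threshold inside the 6 to 0 kyr BP window, and the output unit (°C per 6 kyr) are assumptions made for illustration; the text does not specify these details.

```python
def linear_trend(ages_kyr, sst_c, min_points=10):
    """Least-squares SST trend over the 6-0 kyr BP window, in deg C per 6 kyr.

    ages_kyr: sample ages in kyr BP (larger means older).
    sst_c:    reconstructed or simulated temperatures in deg C.
    Returns None if fewer than `min_points` samples fall in the window
    (assumed reading of the "at least 10 values" criterion).
    """
    pairs = [(a, t) for a, t in zip(ages_kyr, sst_c) if 0.0 <= a <= 6.0]
    n = len(pairs)
    if n < min_points:
        return None
    mean_a = sum(a for a, _ in pairs) / n
    mean_t = sum(t for _, t in pairs) / n
    sxx = sum((a - mean_a) ** 2 for a, _ in pairs)
    if sxx == 0.0:
        return None  # all samples share one age, slope undefined
    sxy = sum((a - mean_a) * (t - mean_t) for a, t in pairs)
    slope_per_kyr_of_age = sxy / sxx
    # Time runs opposite to age, so the sign is flipped; scale to 6 kyr.
    return -slope_per_kyr_of_age * 6.0


# Hypothetical record that was 1.5 deg C warmer at 6 kyr BP than today.
ages = [0.5 * i for i in range(13)]
sst = [20.0 + 0.25 * a for a in ages]
trend = linear_trend(ages, sst)  # negative value means cooling towards present
```

The sign convention (negative for cooling towards the present) is chosen here so that a high-latitude cooling appears as a negative trend.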
In the first step of our analysis, we compare the observed proxy-based temperature trends to the simulated temperature trends at the core positions. As the habitat depth and the seasonality of the proxy recorder are not systematically known, we perform this comparison for simulated annual and seasonal mean temperatures and extract the temperature trends at each model level of the upper 100 m water depth of the ECHO-G model. In a second step, we estimate the sensitivity of the observed temperature trends to potential transient changes in the ecological behaviour of planktonic organisms: shifts in seasonality or habitat depth. For seasonality, we first extract the maximum seasonal temperature trend among the twelve months (°C day$^{-1}$) from the PI climate simulation for each core position. A lower limit of the seasonal shift that is needed to reconcile model simulation and proxy reconstruction is then calculated by dividing the residual between the simulated and reconstructed Holocene temperature trend by the maximum temperature trend described above. Such a procedure only estimates the absolute value of the seasonality shift required to reconcile models and data; its direction cannot be determined because the seasonality of the planktonic organisms is not known. For example, a time shift of 30 days means that a seasonal correlation centred on JJA is then centred on JAS or MJJ. For present conditions, time shifts in the blooming season of planktonic organisms are of the order of 15-60 days and can be affected by interannual to decadal temperature and circulation changes (e.g. Lohmann and Wiltshire, 2012). To derive a lower estimate for the shift in habitat depth, we analyse the vertical temperature gradient between the first two levels of the ocean (10 and 20 m) in the PI climate model output at the core positions. We retrieve the shift in habitat depth similarly to the procedure of the time shift calculation, by dividing the difference between the simulated and reconstructed temperature trends by the vertical temperature gradient.
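The two lower-bound estimates described above amount to dividing a trend residual by a temperature gradient. A minimal sketch follows, assuming monthly mean PI temperatures at a core site, a nominal 30-day month, and hypothetical helper names (`max_seasonal_rate`, `vertical_gradient`, `minimum_shift`); none of these names or numerical values come from the study.

```python
def max_seasonal_rate(monthly_sst, days_per_month=30.0):
    """Largest absolute month-to-month SST change in deg C per day.

    The year is treated as cyclic (December is followed by January).
    """
    n = len(monthly_sst)
    steps = [abs(monthly_sst[(i + 1) % n] - monthly_sst[i]) for i in range(n)]
    return max(steps) / days_per_month


def vertical_gradient(temp_10m, temp_20m, layer_spacing_m=10.0):
    """Absolute temperature gradient between the two top levels, deg C per m."""
    return abs(temp_10m - temp_20m) / layer_spacing_m


def minimum_shift(model_trend, proxy_trend, gradient):
    """Lower bound on the shift (days or metres) that closes the residual.

    Only the magnitude is returned; the direction of the shift is unknown.
    """
    if gradient == 0.0:
        return float("inf")  # no gradient, so no finite shift can help
    return abs(proxy_trend - model_trend) / gradient


# Hypothetical core site (all values are illustrative only).
pi_monthly = [10.0, 11.0, 13.0, 16.0, 19.0, 21.0,
              22.0, 21.5, 19.0, 16.0, 13.0, 11.0]
days = minimum_shift(-0.5, -2.0, max_seasonal_rate(pi_monthly))
metres = minimum_shift(-0.5, -2.0, vertical_gradient(18.0, 17.5))
```

Because the steepest gradient is used in the denominator, the result is the smallest shift that could in principle explain the residual, which is why the text treats it as a lower limit.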

Holocene trends: data and model
We compare the annual mean SST trends from the mid-Holocene to the present as simulated by the ECHO-G model and as estimated from alkenone and Mg/Ca temperature proxies for the same time period (Fig. 5). We find that the general temperature pattern recorded by the alkenones is a warming in the tropics and the North Pacific Ocean. Cooling predominates in mid- and high latitudes of the North Atlantic Ocean and in the Southern Hemisphere midlatitudes. In many regions, such as the North Atlantic Ocean, the Mediterranean Sea, the northern Indian Ocean, and the western North Pacific Ocean, there is a good agreement between the model and alkenone data with respect to the spatial pattern of the temperature trend (Fig. 5). Globally, the alkenone and simulated SST trends are significantly correlated (R = 0.49, p < 0.05). Yet, the amplitudes of recorded and simulated temperature trends often differ, with proxies generally showing larger SST changes during the Holocene. A scatter plot of the modelled SST trends as simulated by the ECHO-G model versus alkenone-based SST trends (Fig. 6a) shows that the alkenone reconstructions and the model simulations yield comparable temperature trends at only a few locations. The correlation between the modelled and Mg/Ca-based SST trends is negative and not significant (R = −0.31, p > 0.05).
To analyse the potential influence of the seasonality on the model-data comparison, seasonal variations of the simulated monthly temperature trends are shown as vertical bars in Fig. 6. Of the 52 alkenone records, only 22 (∼ 42 % of the total number of records) are in agreement with the model trend at some time during the year (Fig. 6a). Out of the other 30 (∼ 58 %) data markers, 9 (∼ 17 % of the total number of records) show a difference of more than 2 °C.
A similar analysis for Mg/Ca-based SSTs indicates that approximately 53 % of the cores agree with the model simulation at some time during the year (Fig. 6b). Of the 9 (∼ 47 %) data markers that do not match the model simulation, 2 (∼ 11 %) differ by more than 2 °C (Fig. 6b). Temperature trends are larger in the alkenone (−4 to 2 °C) than in the Mg/Ca (−2 to 2 °C) reconstructions. This might also be caused by the different core positions of the two proxies. In our dataset, alkenone records are more abundant at high latitudes while Mg/Ca records are more abundant at low latitudes. The records that differ by more than 2 °C (Fig. 6a) are mainly at high latitudes. We expect that the alkenone method has its limitations in these areas (e.g. Conte et al., 1998, 2001; Calvo et al., 2002). The magnitude of the modelled SST trends at core locations is, however, limited to the range from −1 to 1 °C, which means that the model underestimates the trends as compared to both the alkenone and Mg/Ca reconstructions.
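The counts above can be thought of as a simple range check per core. The sketch below assumes that "agreement at some time during the year" means the proxy trend falls inside the span of the twelve simulated monthly trends, and that the 2 °C criterion is measured from the nearest end of that span; both readings are assumptions, proxy uncertainties are ignored, and the function name is hypothetical.

```python
def misfit_to_seasonal_range(proxy_trend, monthly_model_trends):
    """Distance (deg C) from the proxy trend to the simulated monthly range.

    Returns 0.0 when the proxy trend lies inside the range, i.e. when it
    matches the model trend for at least one time of the year.
    """
    low = min(monthly_model_trends)
    high = max(monthly_model_trends)
    if proxy_trend < low:
        return low - proxy_trend
    if proxy_trend > high:
        return proxy_trend - high
    return 0.0


# Hypothetical monthly model trends at one core site (deg C per 6 kyr).
model_monthly = [-0.8, -0.7, -0.5, -0.2, 0.0, 0.1,
                 0.2, 0.1, -0.1, -0.4, -0.6, -0.8]
agrees = misfit_to_seasonal_range(-0.5, model_monthly) == 0.0
large_misfit = misfit_to_seasonal_range(-3.5, model_monthly) > 2.0
```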

Residuals and error bars
We test whether a linear model is appropriate to describe the shape of the Holocene trends. The insolation changes are not linear in time, and non-linear reactions of the climate system might additionally cause deviations from a linear evolution with time. In the sediment records, some cores, for example alkenone record BS79-38, show deviations from linearity. This is especially true for alkenone records and less pronounced for the Mg/Ca records. One cannot exclude that this occurs by chance as the alkenone residuals are autocorrelated in time, a point we will discuss in more detail later.
We analyse the residual plots (standardized residuals of the fit relative to the fitted values) of all sediment records for the alkenones and Mg/Ca, as well as the corresponding AOGCM time series (not shown). Across the cores, no clear common pattern in the deviations from linearity is visible that would call for a non-parametric analysis. We further tested whether other parametric models, such as polynomial models, are more appropriate than our linear model: whereas a polynomial model, as expected, results in a higher explained variance, no relation between the deviations from linearity was found in the GCM and the proxies. In the case of fitting a second-order polynomial, the non-linear terms between model and data are uncorrelated.
We therefore continue to favour the linear model. While we acknowledge that this is not a perfect description of the climate response, the linear models provide a good metric to summarize the main behaviour.
In order to further assess the randomness of the proxy linear regressions' residuals, we conduct a formal test on each proxy record. The runs test is applied to the residuals of each linear regression and gives a p value for each regression. We consider 0.05 as our level of significance: a p value < 0.05 rejects the null hypothesis that the residuals are random, indicating that they are dependent, while a p value > 0.05 gives no evidence against randomness (the residuals are treated as independent).
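For readers unfamiliar with the runs test, the sketch below implements one common variant: the Wald-Wolfowitz test on the signs of the residuals with a normal approximation. The text does not state which variant or software was used, so this is an assumed stand-in, and the function name `runs_test` is hypothetical.

```python
import math


def runs_test(residuals):
    """Two-sided Wald-Wolfowitz runs test on the signs of the residuals.

    Returns (z, p). Zero residuals are dropped. NaN is returned when all
    residuals share one sign or the variance term vanishes.
    """
    signs = [r > 0 for r in residuals if r != 0]
    n_pos = sum(signs)
    n_neg = len(signs) - n_pos
    if n_pos == 0 or n_neg == 0:
        return float("nan"), float("nan")
    # A run is a maximal block of consecutive residuals with the same sign.
    runs = 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)
    n = n_pos + n_neg
    expected = 2.0 * n_pos * n_neg / n + 1.0
    variance = (2.0 * n_pos * n_neg * (2.0 * n_pos * n_neg - n)
                / (n ** 2 * (n - 1.0)))
    if variance <= 0.0:
        return float("nan"), float("nan")
    z = (runs - expected) / math.sqrt(variance)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p value
    return z, p


# Residuals that stay positive, then negative, form only two runs.
z_clustered, p_clustered = runs_test([1.0] * 10 + [-1.0] * 10)
```

Too few runs (long stretches of same-signed residuals) point to a systematic departure from the fitted line, which is the situation flagged as "dependent" in the text.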
The test applied to the residuals of the alkenone linear regressions shows that for 40 of the 52 records the residuals are random; the other 12 are dependent at the 0.05 significance level. For Mg/Ca, 4 of the 19 records are dependent. For reference, we mark them in all figures containing the proxy data by crosses or by boxing the name of the records. We furthermore calculated the correlations without the records that did not pass the test: Table S1 in the Supplement gives similar values as Table 3. No spatial pattern of the dependent data records is observed.
As we mentioned before, these analyses also show that the residuals are not independent in time for the alkenone records. This is expected as elements of the climate system, especially the oceans, provide some memory. Further, the recording process, such as mixing of the sediment by bioturbation, might further increase the autocorrelation. We therefore account for serial correlation by estimating the effective degrees of freedom (von Storch and Zwiers, 1999; Mudelsee, 2010). The linear models and correlation coefficients are calculated on the raw data without interpolation. To estimate the bias of the uncertainty estimates caused by serial correlation . We furthermore find that the uncertainty in the simulated SST trends is very small. We include the uncertainty in the trend analysis by adding error bars in Figs. 1-4.
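One widely used way to account for serial correlation is to shrink the sample size according to the lag-1 autocorrelation of the residuals, n_eff = n (1 − r1) / (1 + r1). The sketch below illustrates that textbook formula only. It assumes evenly spaced samples, which the raw proxy records are not, so it should be read as an illustration of the idea rather than as the procedure applied here; the function names are hypothetical.

```python
def lag1_autocorrelation(values):
    """Sample lag-1 autocorrelation of an evenly spaced series."""
    n = len(values)
    mean = sum(values) / n
    denominator = sum((v - mean) ** 2 for v in values)
    if denominator == 0.0:
        return 0.0  # constant series carries no autocorrelation estimate
    numerator = sum((values[i] - mean) * (values[i + 1] - mean)
                    for i in range(n - 1))
    return numerator / denominator


def effective_sample_size(residuals):
    """Effective number of independent values for an AR(1)-like series.

    Negative autocorrelation is clipped to zero so that the effective
    sample size never exceeds the actual one (a conservative choice).
    """
    r1 = max(0.0, lag1_autocorrelation(residuals))
    return len(residuals) * (1.0 - r1) / (1.0 + r1)


# Residuals with persistent sign (blocks of four) are strongly autocorrelated.
blocky = [1.0, 1.0, 1.0, 1.0, -1.0, -1.0, -1.0, -1.0] * 3
n_eff = effective_sample_size(blocky)  # well below len(blocky) == 24
```

A smaller effective sample size widens the confidence interval of the fitted slope, so trends from strongly autocorrelated records are less tightly constrained than their raw sample count suggests.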

PMIP simulations: comparison with proxy-derived SST trends
To test whether the above-described relation between proxy-derived and modelled SSTs is model-dependent, we analyse simulations from the PMIP2 and PMIP3 multi-model experiments (see Sect. 2). To this end, we compare the difference between the mid-Holocene and PI simulated SST fields to the alkenone- and Mg/Ca-based SST trends (Fig. 7 for PMIP2; Fig. 8 for PMIP3), as described above for the transient ECHO-G simulations.
In general, the Holocene trends simulated by the models participating in PMIP2, PMIP3, and the ECHO-G transient runs are comparable. We perform this comparison on a global scale for modelled vs. alkenone-derived SSTs (Figs. 7a and 8a) and modelled vs. Mg/Ca-based SSTs (Figs. 7b and 8b) separately. Only a few data markers are close to the unity slope line. The agreement between the models and the SST reconstructions is similar to the case of the ECHO-G model (Fig. 6). Because of space limitations, we do not show all individual model anomalies and their (dis)agreement with the alkenone-derived SST trends. Instead, the median (Fig. 9a and b) is used to display the common signal. For example, for our list of PMIP2 models (Fig. 9a), it is defined as the value of the 12th ensemble member out of 24 members that are ordered according to ranked values. This reduces features that vary amongst the members and are therefore likely to be regarded as model specific and less reliable. Indeed, the model-data agreement is largest for the ensemble median (Fig. 9a and b) as compared to each individual member. However, all of the considered models underestimate the temperature trends when compared to the SST trends as recorded by the alkenones by more than a factor of two (Figs. 7a and 8a). Mg/Ca again shows no relationship to the simulated SST anomalies (Figs. 7b and 8b). Since the results of the PMIP runs and the ECHO-G simulation are similar, we continue with the analysis of habitat depth and seasonality in ECHO-G.
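The ensemble median as defined above (the 12th of 24 ranked members) is the lower of the two middle values rather than their average. A minimal sketch of that definition for a single location follows; the function name and the sample values are hypothetical, and applying it grid point by grid point is an assumed usage.

```python
def ranked_median(member_values):
    """Lower median of the ranked ensemble members at one location.

    For an even number of members this picks the lower of the two middle
    values (the 12th of 24), matching the definition given in the text;
    for an odd number it is the ordinary median.
    """
    ranked = sorted(member_values)
    return ranked[(len(ranked) - 1) // 2]


# Hypothetical mid-Holocene minus PI SST anomalies from 24 members (deg C).
anomalies = [0.1 * k for k in range(-12, 12)]
common_signal = ranked_median(anomalies)
```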

Seasonality in ECHO-G
A comparison of SST trends of each proxy with local summer, local winter and annual mean SST trends, as simulated by ECHO-G, indicates which season shows the best agreement between model and proxy reconstruction (Fig. 5). The local summer and winter are defined by the warmest and coldest month, respectively. In the North Atlantic Ocean, the best agreement is obtained for local summer (Fig. 5). In other areas, there is no clear evidence for a preferred season. Some cores in close proximity to each other show the best agreement for different seasons in the model. This suggests that the best agreement with a specific season might not always be caused by the seasonality in the recording process. Figure 10 compares the temperature trends derived from alkenone and Mg/Ca records to those calculated from the evolution of the warmest (local summer) and coldest (local winter) month of each year from the mid-Holocene to the present. The correlation between the alkenone proxy record and the climate simulation is higher for local summer (R = 0.44, p < 0.05) than for winter (R = 0.14, p > 0.05), but lower than for the annual mean (R = 0.49, p < 0.05). In the North Atlantic Ocean, the agreement between the reconstructed and the simulated SST trends is still stronger for the local summer than for the annual mean, because the simulated cooling trend is much more pronounced for summer than for the annual mean (Fig. 10b). For Mg/Ca, there is a positive but insignificant correlation for the winter mean (R = 0.17, p > 0.05), a negative and significant correlation for the summer mean (R = −0.56, p < 0.05), and a negative but insignificant correlation for the annual mean (R = −0.31, p > 0.05). To consider the uncertainties in the SST trends in the model-data comparison, we calculate the weighted correlation between model and proxy data trends. The Pearson correlation coefficients were calculated for annual, local winter, and local summer trends. The weights were calculated as $1/\mathrm{sd}^2$, where sd is the standard error of the slope in the proxy SSTs. The weighting of the trends with their uncertainty increases the positive correlations between simulated and observed trend patterns in all cases except for local summer, where correlation is nearly unaffected by the weighting (Table 3).
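The weighted correlation can be illustrated with a standard weighted Pearson coefficient in which each core receives the weight 1/sd². The sketch below is a generic implementation of that formula under the stated weighting; the function name `weighted_pearson` and the sample numbers are hypothetical and not taken from Table 3.

```python
import math


def weighted_pearson(model_trends, proxy_trends, proxy_slope_se):
    """Pearson correlation with weights 1/sd^2 from the proxy slope errors."""
    weights = [1.0 / se ** 2 for se in proxy_slope_se]
    total = sum(weights)
    mean_x = sum(w * x for w, x in zip(weights, model_trends)) / total
    mean_y = sum(w * y for w, y in zip(weights, proxy_trends)) / total
    cov = sum(w * (x - mean_x) * (y - mean_y)
              for w, x, y in zip(weights, model_trends, proxy_trends))
    var_x = sum(w * (x - mean_x) ** 2 for w, x in zip(weights, model_trends))
    var_y = sum(w * (y - mean_y) ** 2 for w, y in zip(weights, proxy_trends))
    return cov / math.sqrt(var_x * var_y)


# Hypothetical trends: the last core disagrees but is poorly constrained.
model = [0.0, 1.0, 2.0, 3.0]
proxy = [0.0, 1.0, 2.0, -5.0]
slope_se = [0.1, 0.1, 0.1, 10.0]
r_weighted = weighted_pearson(model, proxy, slope_se)
r_unweighted = weighted_pearson(model, proxy, [1.0] * 4)
```

In this toy case the uncertain outlier dominates the unweighted coefficient but contributes almost nothing once the weights are applied, which is the kind of effect the weighting is meant to capture.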

Habitat depth in ECHO-G
The analyses described so far focused on the model-data comparison at the sea surface. Planktonic organisms are, however, known to be able to move in the water column (e.g. Conte et al., 2006). In order to determine whether deeper model layers would be in better agreement with the temperature reconstruction, the proxy records were compared to the model for different layers of the upper 100 m of the ocean (10, 20, 52, 75, and 100 m). Layers below these depths can be ignored, since alkenone-producing organisms require sunlight for photosynthesis and are therefore restricted to the euphotic zone. For the Mg/Ca ratio, we also consider only the same upper layers, since the species that are represented in the proxy database used in our study (Globigerinoides ruber, Globigerina bulloides, and Neogloboquadrina pachyderma) are considered to be surface-dwelling foraminifera (Ostermann et al., 2001; Schiebel et al., 1997; Wang et al., 1995). Figure 11a shows the depths of best fit between the modelled SST trends and the alkenone- and Mg/Ca-based SST trends. For alkenones, about a third of the records best agree with the upper level of the ocean, with the highest correlations being present in the upper 10 m (∼ 33 %). The other ∼ 67 % of the records best agree with deeper layers; of those 38 % are located between 10 and 75 m. In general, the number of cores that agree with modelled temperature trends decreases with depth (Table 4). For Mg/Ca ratios, ∼ 32 % of the records fit best to the modelled temperature trends at 10 m depth. The remaining 68 % fit best to deeper layers, of which 32 % show best agreement with layers between 10 and 75 m. Figure 11b shows a common pattern of the preferred depth for alkenones, which might be linked to the depth where annual average nitrate concentrations reach high levels according to a modern nutrient climatology (Conkright and Boyer, 2002). This suggests that the nutrient supply from deeper waters is an important influence on the alkenone production, as hypothesized by Ohkouchi et al. (1999). Although Fig. 11b and c do not account for zonal oceanic heterogeneities, they capture to first order the ecological preference of the algae for niches found close to the surface at low and high latitudes (due to the influences of upwelling and of penetration of sunlight into the subsurface layers, respectively), while the midlatitude regions where the nutricline is found deeper show an increase in the depth where model and data agree best (Fig. 11b). We note that not all red and blue points are pairwise in the panel because of missing data and the restriction to the upper 300 m in the panels.
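The depth of best fit for a single core can be sketched as the model level whose simulated trend is closest to the proxy trend. The criterion used here (smallest absolute difference) is an assumption, since the text does not spell out how "best fit" was measured, and the function name and sample values are hypothetical.

```python
MODEL_LEVELS_M = [10, 20, 52, 75, 100]  # upper-ocean levels named in the text


def best_fit_depth(proxy_trend, model_trends_by_level, levels=MODEL_LEVELS_M):
    """Depth (m) of the level whose trend is closest to the proxy trend.

    model_trends_by_level must be ordered like `levels`. Ties are resolved
    in favour of the shallowest level.
    """
    index = min(range(len(levels)),
                key=lambda k: abs(model_trends_by_level[k] - proxy_trend))
    return levels[index]


# Hypothetical simulated trends at one core site (deg C per 6 kyr).
depth = best_fit_depth(-1.2, [-0.3, -0.5, -1.0, -0.8, -0.6])
```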

Changes in the recording season and habitat depth in ECHO-G
While choosing specific depths and seasons in the model simulations decreases the mismatch between reconstructed and simulated trends, the amplitude of the simulated trends is still smaller than that of the reconstructed trends (Figs. 5 and 6). We therefore evaluate two potential parameters that might be able to partly explain the misfits found in the model-data comparison: a time shift in the recording season and a vertical shift in the habitat depth. The model-alkenone data disagreement would vanish for up to 37 % of the records by considering a potential vertical shift of the habitat depth of the proxy-producing organism in the water column by less than 20 m. For up to 52 % of the alkenone records, a time shift in the blooming season of less than 14 days could explain the model-data mismatch (for the seasonality of the modelled SSTs, see Figs. S1-S4 in the Supplement). In total, up to 62 % of the records can be explained by at least one of the two shifted parameters, whereas the remaining 38 % of the cores cannot be explained by any of these potential parameters. We further note that for 38 % of the records, the ambient temperature lies outside the calibration range of 6-25 °C for which alkenones are most sensitive to SST (e.g. Conte et al., 2006). In tropical warm pools and polar regions, the ambient water temperature induces only small changes in the $U^{K}_{37}$ index, reducing the sensitivity of alkenone palaeothermometry for these regions (Mix et al., 2000; Conte et al., 2006); therefore the recorded temperature proxy at those locations might be problematic. However, as the sensitivity of $U^{K}_{37}$ to temperature changes seems to be reduced in these tails of the calibration, we do not expect that this mechanism leads to an overestimation of the trends. The same analysis performed for the Mg/Ca records (Fig. 13) shows that up to ∼ 26 % of the records could potentially be reconciled with the model simulation if we consider a shift in habitat depth of up to 20 m. For up to 21 % of the records, a shift in recording season of less than 14 days could explain the disagreement between model simulation and data reconstruction. The remaining 58 % of the Mg/Ca records cannot be explained by any of these two parameters. For an overview on the agreement between model and proxy data, we refer to Table 5.
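The percentages above follow from applying the two thresholds (less than 20 m, less than 14 days) to the per-core minimum shifts and counting. A minimal sketch of that bookkeeping is given below; the function name and input values are hypothetical, and the per-core shifts are assumed to have been computed beforehand.

```python
def reconciled_fractions(depth_shifts_m, time_shifts_days,
                         max_depth_m=20.0, max_days=14.0):
    """Fractions of records explained by a depth shift, a time shift,
    at least one of the two, or neither."""
    n = len(depth_shifts_m)
    by_depth = [d < max_depth_m for d in depth_shifts_m]
    by_time = [t < max_days for t in time_shifts_days]
    by_either = [a or b for a, b in zip(by_depth, by_time)]
    return {
        "depth": sum(by_depth) / n,
        "season": sum(by_time) / n,
        "either": sum(by_either) / n,
        "neither": 1.0 - sum(by_either) / n,
    }


# Hypothetical minimum shifts for four cores.
summary = reconciled_fractions([5.0, 30.0, 50.0, 10.0],
                               [20.0, 10.0, 40.0, 3.0])
```

Because a record can be explained by both parameters at once, the "either" fraction is generally smaller than the sum of the two individual fractions, as is the case for the alkenone numbers quoted above (37 % and 52 % versus 62 %).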

Discussion
Our analyses show that, in general, the model and alkenone-based Holocene SST trends show a similar pattern, but the amplitude of the modelled temperature trends is weaker than that of the proxy records. The Mg/Ca-reconstructed temperature trends do not show a positive relation to the simulated trend pattern. The observed mismatch between the proxy records and the model simulations might be caused by model deficiencies as well as by biased and/or misinterpreted proxy records. In the following, we will discuss several hypotheses.

Table 5. Overview on the agreement between model and proxy data. We list cores that agree with the model simulation at some time during the year and summarize the number of cores that could be reconciled with the model simulation by assumed shifts of < 20 m of the habitat depth and < 14 days of the blooming season. We also list cores that show a difference of more than 2 °C to the model simulation, and note cores that might be biased by calibration uncertainties.

Recorder system: potential seasonal biases
The deviation between climate simulations and proxy records could be at least partly attributed to the way by which proxies record the temperature signal, and how this information is interpreted. Systematic changes in the living season over the course of the Holocene might cause a biased temperature reconstruction. Since, in our study, the alkenone- and Mg/Ca-based SST reconstructions cannot be reconciled with an annual mean temperature signal as simulated by the climate models, we further consider potential seasonal biases of the proxies. When considering local summer and local winter (Fig. 10a and b), we find significant correlations between the alkenone-derived temperature trends and the climate simulation for the summer (R = 0.44, p < 0.05) and the annual mean (R = 0.49, p < 0.05). In the case of Mg/Ca, we observe negative correlations, which are significant for the summer (R = −0.56, p < 0.05) but not for the annual mean (R = −0.31, p > 0.05). Considering local seasons does not decrease the disagreement between model temperature trends and alkenone SST trends. For Mg/Ca we find an improvement for the winter mean, but the correlation is weak and not significant (R = 0.17, p > 0.05). This is likely caused by a regional dependency of the seasonal bias. However, even allowing a different seasonality for each core leaves a mismatch to the simulated trends for more than 50 % of the alkenone cores and about 50 % of the Mg/Ca cores (Fig. 6). The degree of seasonal bias might be spatially dependent since the biogeographical properties of the ocean differ from one location to another (Prahl et al., 2010, and references therein). Lorenz et al. (2006) summarized previous studies, which suggested that in high latitudes the maximum production of coccolithophorids occurs in summer (Baumann et al., 1997, 2000), which supports the idea that alkenones record summer temperatures (Sikes et al., 1997; Leduc et al., 2010a; Prahl et al., 2010). 
Satellite data further supports the idea of summer-biased alkenone records (Iglesias-Rodriguez et al., 2002).
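A correlation coefficient only measures whether two sets of trends share a pattern; it says nothing about their relative amplitude. The sketch below illustrates this with invented numbers (the trend values are hypothetical and are not taken from the study): the Pearson R is high, while the regression slope of the modelled on the proxy trends is far below one. The significance test (p value) is omitted here; it would typically be derived from R and the number of records.

```python
import math


def _centered_sums(x, y):
    """Return the sums of products and squares about the means of x and y."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy, sxx, syy


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    sxy, sxx, syy = _centered_sums(x, y)
    return sxy / math.sqrt(sxx * syy)


def regression_slope(x, y):
    """Least-squares slope of y regressed on x."""
    sxy, sxx, _ = _centered_sums(x, y)
    return sxy / sxx


# Hypothetical SST trends (degC per 6 kyr) at five core sites.
proxy_trends = [-1.0, -0.5, 0.2, 0.8, 1.5]
model_trends = [-0.3, -0.1, 0.0, 0.2, 0.4]

r = pearson_r(proxy_trends, model_trends)
slope = regression_slope(proxy_trends, model_trends)
```

In this toy case the pattern agrees almost perfectly, yet the model reproduces only about a quarter of the proxy amplitude, which is the kind of situation described above for the alkenone records.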

Recorder system: regional seasonal biases
It has been argued that high-latitude alkenone production may be light limited (Leduc et al., 2010a; Schneider et al., 2010) and therefore records the summer season, which may explain why alkenone-derived SST trends in the North Atlantic Ocean follow the Northern Hemisphere summer insolation (Lorenz et al., 2006). Indeed, we find a good agreement between modelled summer temperatures and the proxy reconstruction for the North Atlantic Ocean (Fig. 10b), but we still observe a disagreement between the amplitudes of the trends. There is also a clear mismatch between modelled and reconstructed SST trends in the eastern and western Pacific Ocean (Fig. 10b). Alkenone SSTs in the southern high latitudes were proposed to be skewed toward summer as well (Sikes et al., 2002). Alkenone SSTs located in the southern mid- to high latitudes indicate a Holocene SST cooling which is reproduced by the modelled summer SST evolution, even though the magnitudes of SST changes are still larger in the proxy records (Fig. 10b).
Seasonality in phytoplankton production is generally less pronounced in tropical and subtropical regions (Jickells et al., 1996), and alkenone-derived SSTs from low-latitude sites are therefore more likely to be representative of temperatures close to the annual mean values (Müller and Fischer, 2001; Kienast et al., 2012). It also has been argued that at low latitudes alkenones might record a boreal winter signal when a decrease in the surface ocean stratification reduces SST and enhances primary productivity (Bijma et al., 2001; Leduc et al., 2010a). In our study, we find best agreement between reconstructed and annual mean temperatures in low latitudes (Fig. 5). We still have a considerable number of records (∼ 52 %) at low latitudes that agree best with either local summer (∼ 35 %) or local winter (17 %), but the spatial patterns of these matches are featureless. In the eastern South Atlantic Ocean, only one record fits best to the mean annual SST while the nearby cores show a local summer signal. For Mg/Ca in tropical regions, the large heterogeneity in Mg/Ca and modelled SST trends does not allow us to draw any firm conclusion on where and how model and data disagree (Figs. 5-8).
Schneider et al. (2010) employed marine temperature proxies for a model-data SST comparison, using results from an AOGCM for three time slices (9.5, 6 kyr BP and PI). In their study, they made several assumptions on how proxy records might be seasonally biased by defining four different filters. They estimate which seasonal bias might be represented in a certain proxy record, and identify regions where proxy records are biased towards a specific season, by applying these filters to the simulated SST trend. Schneider et al. (2010) defined a seasonal index weighting that relies on the modern relationship between net primary production (NPP) and SSTs. We do not assume a constant PI relationship between NPP and SST and refrain from considering such a filter. 
Similar to our study, Schneider et al. (2010) found that the North Atlantic Ocean is mostly influenced by a local summer bias in alkenone-based SSTs, while records in the western Pacific Ocean preferably represent a winter signal. The correlation values between proxy records and filtered model simulations that Schneider et al. (2010) estimate are higher than those that we find, which might be attributed to the preselection of proxy records that Schneider et al. (2010) applied prior to the correlation estimation.

Recorder system: habitat depth
In the same way, a change in the habitat depth of the SST signal carriers over the Holocene could create deviations between proxy records and model simulations. Such changes in habitat depth and recording season could have been caused by changes in insolation over the Holocene or by related changes in the ocean temperature and nutrient distribution that the alkenone-producing organisms and the foraminifera are exposed to.
Comparing the reconstructed Holocene temperature trends with simulated trends at model levels in the upper 100 m does not remove the discrepancy between models and proxies. For Mg/Ca ratios, the greatest number of records fit best with model trends at 10 m water depth (∼ 32 % of the records), and smaller proportions of Mg/Ca records fit best with trends for depths greater than 10 m (Table 4). For alkenones, we find best agreement in the upper 10 m (∼ 34 %), while the other ∼ 66 % of the records best agree in deeper layers (Table 4). Thus, we find that the single layer with the highest agreement between proxy-recorded and simulated temperature trends is the uppermost ocean layer. This is in agreement with results from Rosell-Melé et al. (1995), who compared core-top alkenone-derived SSTs from the surface sediments to SSTs from overlying waters for different depths and found best agreement of alkenone SSTs with temperature at the ocean surface.
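The depth assignment described above amounts to picking, for every record, the model level whose simulated trend is closest to the reconstructed one, and then counting how often each level wins. A minimal sketch follows; the depth levels and all trend values are hypothetical and serve only to illustrate the selection step.

```python
from collections import Counter


def best_fit_depth(proxy_trend, model_trend_by_depth):
    """Return the depth (m) whose modelled trend is closest to the proxy trend."""
    return min(
        model_trend_by_depth,
        key=lambda depth: abs(model_trend_by_depth[depth] - proxy_trend),
    )


# Hypothetical records: reconstructed trend plus modelled trends per level (degC per 6 kyr).
records = [
    {"proxy": -1.2, "model": {10: -0.6, 30: -0.4, 50: -0.3, 100: -0.1}},
    {"proxy": 0.1, "model": {10: 0.5, 30: 0.3, 50: 0.15, 100: 0.0}},
    {"proxy": 0.9, "model": {10: 0.4, 30: 0.3, 50: 0.2, 100: 0.1}},
]

best_depths = [best_fit_depth(r["proxy"], r["model"]) for r in records]
counts = Counter(best_depths)
share_at_10m = counts[10] / len(records)
```

Note that a best-fitting level is always returned, even when the closest modelled trend is still far from the reconstruction, so a high share at one depth does not by itself mean that the amplitudes agree.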
Our calculations of seasonality and upper ocean stratification are based on model output. They do not provide any diagnostic on the real ecological behaviour of planktonic organisms. However, they do provide a mapping of oceanic regions where even small changes in the planktonic organisms' ecology can have large consequences on the reconstructed local SST trends. It reinforces the idea that alkenones and Mg/Ca may be affected by ecological specificities (Leduc et al., 2010a).

Recorder system: shifts in seasonal preferences and habitat depth
Since neither seasonal nor habitat depth preferences of the proxy recorder can resolve the model-data mismatch, we explore whether shifts in either depth habitat or growing season from the mid-Holocene to the present might cause the model-data disagreement. To explain the model-data mismatch by those mechanisms, summer-sensitive proxy-recording species in the northern high latitudes would have to record summer temperatures in the mid-Holocene, and temperatures that are biased toward spring or autumn in the present-day climate. If the organisms changed their recording behaviour over the Holocene in such a way, this would increase a corrected proxy-based SST trend. Consequently, a corrected proxy-based SST trend would be in better agreement with the model simulations of the Holocene. On the one hand, it is questionable whether proxy-recording species really behave in such a way, as the organisms would likely try to keep their preferred ecological conditions by shifting their living seasons in a way that mitigates the changes in the climate (Mix, 1987). Fraile (2008) and Fraile et al. (2009) analysed the seasonality of foraminifera species using a planktonic foraminifera model and showed that the organisms record a weaker temperature signal if a change in global temperature is applied. They performed a model sensitivity study by decreasing the global temperature by 2 and 6 °C, and found a shift in the maximum planktonic foraminifera abundance towards warmer seasons, which would decrease the temperature trend captured in Mg/Ca records (Fraile et al., 2009).
On the other hand, planktonic organisms are subject to several limiting factors, e.g. temperature as well as light and nutrient availability. If those factors change in opposite directions, the organisms might change their living season without bypassing their basic ecological requirements. For example, food or nutrient availability might shift towards spring or autumn so that the living season might shift accordingly. To be able to explain such shifts, more studies using complex ecosystem models of the planktonic organisms need to be done, such as ecophysiological models reproducing the growth of planktonic foraminifera.
It is not obvious which amplitude a seasonal shift realistically might have had during the Holocene. Our results show that indeed 48 % of the alkenone records cannot be reconciled with the model simulation when considering a shift of less than 14 days over the last 6 kyr. For nearly all the records (47 out of 52), the mismatch between model and data can be removed by allowing a longer time shift of the recording season of up to 60 days (Fig. 12b). In the case of Mg/Ca, up to ∼ 21 % of the records could be reconciled with the model simulation if we consider a potential shift in the recording season of less than 14 days (Fig. 13b and c), but only 1 out of 19 records would require a shift of more than 60 days.
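To give a feeling for how a shift of the recording season translates into an apparent temperature trend, the sketch below uses a strongly idealised annual cycle. It is an illustration under stated assumptions and not the method of this study, which relies on simulated SSTs: the cycle is taken to be a pure cosine with a hypothetical amplitude, and the recorder is assumed to start exactly at the seasonal maximum.

```python
import math

DAYS_PER_YEAR = 365.0


def recorded_change(amplitude_c, shift_days, start_offset_days=0.0):
    """Temperature change seen by a recorder whose season moves by shift_days.

    The annual cycle is idealised as amplitude_c * cos(2*pi*day/365),
    with its maximum at day 0 (an assumption made for illustration).
    """
    def cycle(day):
        return amplitude_c * math.cos(2.0 * math.pi * day / DAYS_PER_YEAR)

    return cycle(start_offset_days + shift_days) - cycle(start_offset_days)


def min_shift_to_explain(residual_c, amplitude_c, max_days=60):
    """Smallest whole-day shift from the seasonal peak that produces a change
    at least as large as the residual trend, or None if none exists."""
    for shift in range(0, max_days + 1):
        if abs(recorded_change(amplitude_c, shift)) >= abs(residual_c):
            return shift
    return None


# Hypothetical seasonal amplitude of 5 degC.
shift_small = min_shift_to_explain(residual_c=0.1, amplitude_c=5.0)
shift_large = min_shift_to_explain(residual_c=3.0, amplitude_c=5.0)
```

Near the seasonal maximum the cycle is flat, so a two-week shift changes the recorded temperature only slightly, whereas a shift of one to two months can account for a much larger residual. A recorder that lives on the flank of the seasonal cycle (non-zero `start_offset_days`) would be more sensitive to the same shift.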
We also find a latitude-dependent depth profile of alkenones which might be linked to the nitrate concentration (Fig. 11b). This suggests that the best fitting model depth indeed might depend on the location, which might reflect an influence of stratification on surface ocean biogeochemistry. For Mg/Ca we do not detect a relationship (Fig. 11c), although we did not take into account species-specific ecological behaviour in our study. While the low-latitude Mg/Ca records derived from the symbiont-bearing foraminifer G. ruber must originate from the euphotic zone, the mid- to high-latitude records derived from G. bulloides and N. pachyderma may integrate to some extent a subsurface signal.
If such a preference for a certain habitat depth changed with time, this would allow for another mechanism that might explain the mismatch between model simulation and proxy reconstruction, linked to planktonic organisms' migrations in the water column. Such a shift in living depth is supported by the indication that the detachment of coccoliths on coccolithophores plays a role in the regulation of buoyancy (Fritz and Balch, 1996). The non-detachment of the coccoliths would allow the alkenone-producing organisms to migrate by as much as 100 m in the euphotic zone in about 75 days (Fritz and Balch, 1996), facilitating access to the subsurface nutricline (Munk and Riley, 1952).
The vertical shift that might eliminate the disagreement between proxy recorder and model simulation can reconcile up to 37 % of the records with the modelled SST trends if a vertical shift of less than 20 m is allowed. About 46 % of the cores would require a shift of more than 50 m to be in agreement with modelled SSTs. Considering the annual cycle of the maximum alkenone concentration reported by Ternois et al. (1997), the possibility of a vertical shift of about 20 m seems to be a reasonable assumption. Therefore, a significant proportion of the proxy records may be reconciled with the model simulation by assuming a vertical shift of the alkenone-producing organisms' habitat depth. Yet, in our study shifts in seasonality seem to have greater potential to explain the model-proxy disagreement. It is possible that biases in the palaeothermometers may add further degrees of freedom to reconcile models and data. We however do not believe that those biases would be systematic enough to be responsible for the observation we made that models seem to underestimate Holocene SST trends as alkenone and Mg/Ca records suggest.

Climate models: coarse resolution
Climate models are limited by their spatial resolution (computational constraints) and by necessary approximations. Therefore, they cannot represent the full complexity of the Earth system. The proxy records used in this study (and many others) are mostly located in coastal areas. These regions are typically not well represented by climate models due to their low resolution. Coastal areas may be especially sensitive to external forcing, since their thermal inertia may be lower than that of the open ocean due to a shallower thermocline and land-ocean interactions. Furthermore, the representation of mixed layer dynamics is probably important to improve climate simulations and their agreement with palaeoceanographic reconstructions.
In order to see the potential source of uncertainty in the models, we evaluate the temperature and mixed layer depth in the models by comparing them with the temperature and mixed layer depth of the ocean reanalysis data SODA covering 1958-2001 (Carton and Giese, 2008; Carton et al., 2005). The mixed layer depth of the model (HOPE-G) is calculated following the temperature criterion as described in Levitus (1982), which defines the mixed layer as the depth at which the temperature change from the surface temperature is 0.5 °C. The same method is used for calculating the mixed layer depths from the ocean reanalysis data. Before the calculation of the mixed layer depth, the observational data were vertically interpolated to the HOPE-G model depths.
Figure 14. Annual mean mixed layer depth in the ocean (m) following the temperature criterion as described in Levitus (1982). (a) HOPE-G, and (b) for the SODA dataset (Carton and Giese, 2008; Carton et al., 2005).
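The temperature criterion can be sketched for a single water column as follows. This is a minimal illustration, not the code used for the model or SODA fields: the depth levels and temperatures are hypothetical, and the linear interpolation between the two levels that bracket the threshold is an assumption made here, since the text does not state how the crossing depth is located between levels.

```python
def mixed_layer_depth(depths_m, temps_c, delta_t=0.5):
    """Depth at which temperature first differs from the top level by delta_t.

    depths_m must increase downwards and temps_c[0] is taken as the surface
    temperature. The crossing depth is linearly interpolated between levels
    (an assumption). If the threshold is never reached, the deepest level is
    returned.
    """
    surface = temps_c[0]
    for i in range(1, len(depths_m)):
        diff_below = abs(temps_c[i] - surface)
        if diff_below >= delta_t:
            diff_above = abs(temps_c[i - 1] - surface)
            # diff_above < delta_t <= diff_below, so the denominator is positive.
            fraction = (delta_t - diff_above) / (diff_below - diff_above)
            return depths_m[i - 1] + fraction * (depths_m[i] - depths_m[i - 1])
    return depths_m[-1]


# Hypothetical profile on five levels.
depths = [10.0, 30.0, 50.0, 75.0, 100.0]
temps = [20.0, 19.9, 19.7, 19.2, 18.0]
mld = mixed_layer_depth(depths, temps)
```

In this example the 0.5 °C difference is reached between the 50 m and 75 m levels, and the interpolated mixed layer depth is about 60 m. Applying the same function to model and reanalysis profiles on identical levels keeps the comparison consistent, which is the reason for interpolating the observational data to the model depths first.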
For the ECHO-G model, we used the mean of the last 50 yr representing the latest Holocene. We see a similar large-scale pattern in the model and SODA (Fig. 14), but also deviations, especially in the Southern Ocean where boundary and ice-shelf processes are not resolved. Finally, we calculate the SST bias of the ECHO-G, PMIP2, and PMIP3 models for their pre-industrial climate relative to SODA (similar results are obtained when using other SST data, not shown). Figure 15 indicates a strong bias at high northern latitudes towards colder conditions, and a warm bias in the Southern Ocean and in major upwelling regions for the models (ECHO-G, PMIP2, and PMIP3 in the panels, respectively). We furthermore detect a strong systematic bias in the Gulf Stream fronts (Fig. 15). The temperature changes are clearly outside the range of decadal-to-centennial variability (Wei and Lohmann, 2012). It is beyond the scope of the present paper to discuss the reasons for the systematic model deficiencies, but one can mention the uncertainty related to subgrid-scale oceanic mixing and the difficulty in resolving the frontal systems.
Other local feedbacks operating in upwelling systems might also complicate the SST model-data comparison, since local cooling can take place within regions where in general widespread warming is observed (Leduc et al., 2010b). In a similar way, mismatches can be due to difficulties in capturing changes in oceanic fronts in the models.
The similarity of the results when using the transient ECHO-G simulation and the ensemble of PMIP simulations shows that the deviation between proxy data and model simulations does not seem to be a problem of specific climate models, but seems to be a robust feature of Holocene climate simulations with global coupled climate models. One testable hypothesis is that proxy records can therefore correctly record local temperature trends that cannot be simulated by the models. A possible way to examine this effect can be through a new ocean model which has high resolution of up to 7 km in deep water formation areas and in coastal areas where a higher sensitivity to external forcing is expected (Scholz et al., 2013). A logical next step is the application of this model to the Holocene.

Spatial representativeness of the data
Palaeoclimate information gathered from model-data comparisons is difficult to put into a context that goes beyond a description of observed model-data discrepancies, as both climate models and proxy reconstructions are imperfect and have very different characteristics. Proxy reconstructions are sparse and patchy, and can be affected by local processes and/or proxy specificities, which are not always considered in palaeoclimate reconstructions. Usually, palaeoclimatologists tend to obtain data in regions where sedimentation allows it and where the signal is clear. Therefore, it could be that the SST signals are overestimated due to the selection of the sites. Climate models have coarse spatial and temporal resolutions, but can resolve changes in climatic features with a global perspective and thus help in identifying the mechanisms of climate variations. Here, we discuss the large-scale pattern of the temperature evolution only. Spatially heterogeneous patterns and regional dynamics provide an additional uncertainty for our data-model comparison. Furthermore, we cannot exclude that part of the signal is due to differential degradation of alkenones under contrasting bottom water oxygen conditions (Hoefs et al., 1998; Gong and Hollander, 1999). We will follow this hypothesis in a further study to examine if and how redox conditions during early organic matter diagenesis can also be determined. Additional biases may also complicate the interpretation of Mg/Ca and alkenone palaeothermometers. In particular, alkenones can be transported over long distances along with fine-grained particles (Ohkouchi et al., 2002), while Mg/Ca may be impacted by dissolution. Alkenone advection over long distances can invalidate those records as local SST indicators (Sicre et al., 2005). However, we expect that advection would in general tend to reduce the signal when propagating through different water masses, smoothing the signal. 
It is therefore not likely that advection plays the dominant role for the large-scale reconstructed temperature signal based on alkenones.
As we cannot monitor the two above-mentioned processes for our database, we treat them as not having affected the Holocene SST records we analyse here. We do, however, expect those biases to have had an impact on the reconstructed SST trend in specific regions only and not as a whole. We note that in frontal systems and dynamically active regions, they can strongly affect the interpretation (e.g. Rühlemann and Butzin, 2006).

Calibration of the proxy data
In regional and global core-top calibrations, U^K_37 correlates best to the annual mean SST (Rosell-Melé et al., 1995; Herbert et al., 1998; Müller et al., 1998), but this finding could be limited to the spatial relationship and does not imply that individual U^K_37 records capture annual mean surface temperatures. Indeed, high-latitude core-top studies suggest that the alkenones are skewed toward summer temperatures (Sikes et al., 1997; Prahl et al., 2010). At low latitudes, it is also unclear whether alkenone-based SST estimates reflect mean annual SST or are skewed toward seasons during which temperatures are below the mean annual SST. A study of regional sediment traps has shown that low-latitude alkenones most likely record annual mean temperatures (Müller and Fischer, 2001) despite the fact that alkenone-producing coccolithophorids mostly thrive during winter to spring (Müller and Fischer, 2001; Cortés et al., 2001; Bijma et al., 2001), or more generally when nutrients are abundant (Baumann et al., 2000). A recent core-top study from the eastern equatorial Pacific Ocean seems to confirm this hypothesis (Kienast et al., 2012). Such observations may explain the mismatch between the SST calibration curves which are established from alkenone SST derived from sediment trap material (that best fit with ambient temperature through a non-linear calibration curve) and those which are established from core tops (that best fit with the mean annual SST overlying core tops through a linear calibration curve) (Conte et al., 2006). Conte et al. (2006) argued that such a mismatch can at least partly be explained by seasonality and water depth of coccolithophorids, suggesting that ecological effects may somewhat be embedded in modern sedimentary material.
The uncertainties embedded in seasonal signals of Mg/Ca-based SST data may be more readily identified since available SST records and calibrations are species specific (e.g. Anand et al., 2003). This means that refining the Mg/Ca interpretation in light of the foraminiferal seasonal preferences may theoretically be undertaken by field studies. Yet, seasonal preferences for a given species can also vary from site to site. For example, fluxes of the surface-dwelling planktonic foraminifer G. ruber, the species most represented in the updated version of the GHOST database, were found to be maximum during summer in the Panama Basin when surface waters are well stratified (Thunell et al., 1983), but during winter south of Java when upwelling occurs. G. bulloides is, on the other hand, usually associated with upwelling events and tends to flourish whenever primary productivity increases since it needs abundant food to develop (e.g. Lombard et al., 2011).

Model-data comparison: ecological requirements
Model-data comparisons of Holocene temperature evolution induced by insolation changes have independently proposed that seasonality of coccolithophorid blooms can explain part of the reconstructed temperature signal at low latitudes (Lorenz et al., 2006; Schneider et al., 2010). However, beyond the firm limits of basic ecological requirements of planktonic organisms, there is still a lack of a conceptual model for explaining the season and water depth embedded in SST signal carriers that can globally explain how and where ecological optima are reached for a given foraminifera or coccolithophorid species. Our study goes beyond the work of Schneider et al. (2010) in that we quantify the amplitude of the biases that the proxy records might include. In addition to shifts of the recording season, we also include and quantify shifts in the habitat depth.
The dependence of the temperature record on the habitat depth has not been studied as much as seasonality. For the Mediterranean Sea, Bentaleb et al. (1999) suggested that alkenones are essentially synthesized at levels of highest primary production, and therefore may record a signal which integrates subsurface temperature where a chlorophyll maximum can develop seasonally. Another study from the Arabian Sea indeed demonstrated that alkenone-synthesizing coccolithophorids are several orders of magnitude more abundant at subsurface as compared to the surface (Andruleit et al., 2003). This vertical displacement is of course strictly restricted to the euphotic zone, which sets a lower firm limit on where alkenones are being synthesized. As for coccolithophorids, foraminifera are subject to changes in the depth habitat (see e.g. Fairbanks et al., 1982). Even though G. ruber and G. bulloides are both considered as surface ocean dwellers, only G. ruber must thrive in the euphotic zone to allow the photosynthesis of its symbiont-bearing organisms. Recent studies indeed suggest that the G. bulloides life cycle associated with gametogenesis involves calcification of its test within subsurface and may significantly affect its resulting Mg/Ca value (Marr et al., 2011). In summary, it appears that potential changes in seasonality and upper water column structure, likely accompanying Holocene changes in ocean dynamics, provide two degrees of freedom which have the potential to explain the model-data mismatch and warrant further investigations.

Forcing and internal variability
Besides the insolation forcing, changes in greenhouse gases may play a role. However, the radiation effect due to CO2 is rather small and has a negligible influence on our results (not shown). Internal variability is expected to have a minor effect on the overall hemispheric temperature trends. However, it was concluded that part of the regional Holocene SST trend can be attributed to a pattern which resembles the Arctic Oscillation/North Atlantic Oscillation (Rimbu et al., 2003) and modulations of the Icelandic Low (Lohmann et al., 2005) showing opposite SST trends at one latitude. Such features are more difficult to assess in data and models because of their spatial heterogeneity and atmospheric dynamics.
One particular example of how complex the temperature trends in the North Pacific and Atlantic oceans are can be seen in Fig. 5. The opposite long-term SST trends between the northeastern Pacific and the northeastern Atlantic oceans during the Holocene have been attributed to inter-oceanic teleconnections related to a Pacific-North Atlantic mode of variability. Such features have not been sufficiently tested in climate models on long timescales.
In the reconstructed SSTs, the millennial variability seems to be a robust feature through the Holocene. Wirtz et al. (2010) furthermore report a change in climate variability in the early to mid-Holocene which might influence regional temperature. However, Moros et al. (2004) report that, in the northern North Atlantic Ocean, planktonic δ18O shows a "rather flat early- to mid-Holocene and a marked increase in amplitude and decrease in mean values from about 4 kyr BP", which could be linked with pronounced variations in the recorder system associated with different seasons and water depths. Large millennial variability in the data can mask the temperature trend, and in principle we cannot exclude other factors affecting the climate (Bond et al., 2001; Sundqvist et al., 2010). However, since we have taken the temperature trends only, we eliminated to a large extent the effect of millennial climate variability in our analysis (Figs. 1-4). Our analysis also shows that the estimates of the errors in the trends are larger when fewer data are available (Figs. 1-4). Interestingly, the long-term variability as documented in the data (Wirtz et al., 2010) is highly underestimated in multi-centennial to multi-millennial model experiments. This variability, however, is beyond the scope of the present paper.

Climate models: sensitivities to long-term changes
Climate sensitivity is defined in the sense of Charney (1979), in which fast feedback processes are allowed to operate, but long-lived atmospheric gases, ice sheet area, land area and vegetation cover are considered as fixed forcings. Fast feedbacks include changes of water vapour, clouds, climate-driven aerosols, sea ice, and snow cover. Our inference that models do not capture the Holocene trends with respect to the amplitude found in the records could raise doubt about the correct representation of climate sensitivity in the climate models on long timescales. Laepple and Lohmann (2009) calculated empirically the Holocene temperature evolution based on the analogy with the temperature response to the seasonal cycle. It turned out that the climate patterns resemble the large-scale features of the modelled Holocene trend, but the amplitude and regional changes associated with circulation changes are not well captured. By construction, long-term feedbacks are missing in such an approach. It is conceivable that present climate models neglect in a similar way long-term feedbacks amplifying the orbital forcing. The obliquity forcing provides a pattern of high-latitude cooling and low-latitude warming of annual mean temperature, while the precession response is only due to non-linearities (e.g. Laepple and Lohmann, 2009). Future sensitivity studies should identify potential missing positive feedbacks in the system. Indeed, experiments indicate a potential positive feedback amplifying external forcing which is related to details in the representation of vegetation and albedo in the models (O'ishi and Abe-Ouchi, 2011). 
Warming in their models is due to direct amplification of warming over high-latitude land through increases in vegetation and reduced albedo during the summer and indirect amplification through sea-ice feedback in autumn and winter and snow albedo feedback in spring. Further model studies are necessary to examine whether the long-term climate sensitivity to orbital forcing has been underestimated.
We have to identify the model-data discrepancy in order to have a reliable estimate of simulated temperature trends of the past, their error bars, as well as an estimate of climate sensitivity on long timescales. It could be that current climate simulations underestimate the full range of climate warming on centennial to millennial timescales that might arise as a result of anthropogenic greenhouse gas emissions. The concept of climate sensitivity relies on the responses of slow feedback processes to forcing and subsequent involved feedback mechanisms (Hansen et al., 2007). It is therefore likely that the climate sensitivity (to greenhouse gas forcing and orbital forcing) is much greater than that due to fast feedbacks.

Conclusions
Our study shows that model simulations of the Holocene temperature evolution predict a large-scale high-latitude cooling and low-latitude warming, which is consistent with the temperature trends expected from insolation changes due to orbital forcing. The reconstructed SST trends by alkenones show a similar sign to the models, but about 75 % of the trend variance in the sediment records remains unexplained.
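The unexplained share of the trend variance can be related to a correlation coefficient through 1 − R². As a worked check, and assuming that the figure quoted above is based on the annual mean alkenone correlation of R = 0.49 reported in the Discussion (the text does not state this link explicitly):

```python
# Correlation between alkenone-based and simulated annual mean SST trends,
# as reported in the Discussion.
r_annual_mean = 0.49

# Share of variance explained by a linear relationship, and the remainder.
explained = r_annual_mean ** 2
unexplained = 1.0 - explained
```

With R = 0.49 the explained share is about 0.24, leaving roughly 0.76 of the variance unexplained, which matches the quoted value of about 75 %.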
The amplitudes of the simulated trends are significantly smaller than the reconstructed temperature trends by alkenones. This deviation persists for all considered models, even if we take into account seasonality and different water depths at which the recording organisms may have lived. This raises important questions as to whether climate models have fundamental deficiencies, and (or) whether our understanding of the proxy records still needs to be refined. We find best agreement between reconstruction and annual mean temperatures in low latitudes. The large spatial heterogeneity in Mg/Ca and modelled SST trends does not allow us to draw any firm conclusion on where and how model and data disagree. Non-temperature effects upon the incorporation of Mg/Ca in the foraminiferal shell (Arbuszewski et al., 2010), resuspension and redeposition of U^K_37 markers (Ohkouchi et al., 2002), and other possible post-depositional effects on Mg/Ca (Regenberg et al., 2006) or on U^K_37 (Hoefs et al., 1998; Gong and Hollander, 1999) could cause discrepancies.
We evaluate several mechanisms that can be responsible for the observed mismatch between the reconstructed and the modelled magnitude of the Holocene SST trends. These are systematic changes in the living season over the course of the Holocene and a varied habitat depth of the SST signal carriers. In many cases (up to 62 % for alkenones and 42 % for Mg/Ca in our study), the mismatch between proxy and simulation may be removed if these mechanisms are considered. The amount of vertical shift of the recorder depth, or of the shift of the living season, that is needed to remove the model-proxy mismatch is within ranges that we consider realistic for climatic changes over the last 6 kyr.
We consider the model-data mismatch of Holocene temperature trends as being indicative of either model deficiencies or data particularities with respect to the planktonic organisms' ecology. When interpreting the proxy records, two assumptions regarding the stationarity of seasonality and habitat depth can be made. First, one could assume that the seasonality and the habitat depth of planktonic organisms did not change under varying climate conditions during the Holocene. Such an assumption is generally applied when interpreting palaeoreconstructions because it is difficult to assess how hydrographic changes occurred over contrasting seasons and habitat depths in the past. In such a case, a good knowledge of the modern seasonality and living depth of coccolithophorids and foraminifera would be sufficient for the interpretation of the temperature record. Second, one could assume that the living season and the habitat depth may have changed over time. In this likely situation, the interpretation of the proxy record becomes more difficult. However, considering ecological limits of seasonality and habitat depth, the model simulations can be used to extract the range of possible proxy trends consistent with the simulated climate. Ecophysiological models accounting for planktonic foraminifera ecology capture most of the first-order seasonal and depth habitat preferences of the species most commonly used for Mg/Ca-based reconstructions (Fraile et al., 2009; Lombard et al., 2011). These models have further pointed out that any past climate change affecting surface ocean characteristics may alter foraminifera-derived SST signals by modulating the environmental characteristics for which planktonic foraminifera have optimal living conditions (Fraile et al., 2009; Bassinot et al., 2011). It is also conceivable that oceanic vertical mixing caused by atmospheric circulation and synoptic storms can affect the coccolith bloom period (Moros et al., 2004).
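The idea of extracting the range of proxy trends consistent with a simulated climate can be illustrated with a minimal sketch. It is not the procedure used in this study: the temperature field is synthetic, the function names (`fixed_recorder_trends`, `shifted_recorder_range`) and the proxy trend value are hypothetical, and ecological limits on season and depth are ignored, so the shifted range is only an upper bound on what recorder shifts could explain.

```python
import numpy as np

# Synthetic stand-in for model output at one site (all values are assumptions):
# temperature as a function of time (kyr, -6 = 6 kyr ago), calendar month and depth level.
time_kyr = np.arange(-6.0, 1.0)             # 7 time slices from -6 to 0 kyr
months = np.arange(12)
depths = np.arange(3)                       # 3 hypothetical depth levels
seasonal = 3.0 * np.cos(2 * np.pi * (months - 7) / 12)
monthly_trend = 0.1 * np.cos(2 * np.pi * months / 12)   # degC per kyr, varies by month
temp = (20.0
        + seasonal[None, :, None]
        - 0.5 * depths[None, None, :]
        + monthly_trend[None, :, None] * time_kyr[:, None, None])

def fixed_recorder_trends(time, field):
    """Linear trend (degC/kyr) for every fixed (month, depth) recorder choice."""
    n_time, n_month, n_depth = field.shape
    # np.polyfit fits all columns of a 2-D array at once; row 0 holds the slopes.
    slopes = np.polyfit(time, field.reshape(n_time, -1), 1)[0]
    return slopes.reshape(n_month, n_depth)

def shifted_recorder_range(time, field):
    """Range of mean trends if the recorder may move freely in month and depth
    between the first and last time slice (no ecological limits applied)."""
    span = time[-1] - time[0]
    start, end = field[0], field[-1]
    return (end.min() - start.max()) / span, (end.max() - start.min()) / span

trends = fixed_recorder_trends(time_kyr, temp)
fixed_lo, fixed_hi = trends.min(), trends.max()
shift_lo, shift_hi = shifted_recorder_range(time_kyr, temp)

proxy_trend = -0.3   # hypothetical reconstructed trend, degC/kyr
explained_fixed = fixed_lo <= proxy_trend <= fixed_hi
explained_shift = shift_lo <= proxy_trend <= shift_hi
print(f"fixed recorder range:   {fixed_lo:.2f} to {fixed_hi:.2f} degC/kyr")
print(f"shifted recorder range: {shift_lo:.2f} to {shift_hi:.2f} degC/kyr")
print("proxy trend explained (fixed, shifted):", explained_fixed, explained_shift)
```

In this toy setting a reconstructed trend that lies outside the range for any fixed season and depth falls inside the range once the recorder is allowed to shift. A realistic application would restrict the admissible months and depths to ecologically plausible values.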
The underestimation of the Holocene SST trends by the models and/or the data overestimation is indeed a global and persistent feature that might be weakened, but not completely removed, if we consider proxies through their ecological prism. We show that differences in the magnitude of Holocene SST trends between model simulations on the one hand, and a global dataset of alkenone- and Mg/Ca-derived palaeotemperatures on the other hand, can be reconciled to some degree by considering shifts in seasonality and habitat depth, two parameters known to be relevant for understanding alkenone and Mg/Ca palaeothermometry in the modern ocean.
This suggests that the discrepancy between our proxy database and the considered climate models is not caused only by a problem specific to the marine records used in this study, but reflects a general problem that also occurs in other model-data comparisons (Brewer et al., 2007; Zhang et al., 2010; O'ishi and Abe-Ouchi, 2011; Braconnot et al., 2012). At northern high latitudes, Sundqvist et al. (2010) report an annual mean cooling of 2 °C from the mid-Holocene to pre-industrial values, again larger than the model trends. A similar systematic deviation is also found for δ18O-derived temperature trends from Greenland and Antarctic ice cores (Masson-Delmotte et al., 2006). They found that most models capture the correct sign of the reconstructed temperature change on Greenland, but underestimate its amplitude by a significant factor (e.g. Vinther et al., 2009). Recently, Braconnot et al. (2012) emphasized that the large-scale pattern in the Last Glacial Maximum and mid-Holocene simulations captures large parts of the temperature and precipitation changes over land, but tends to underestimate the magnitude of regional changes.
It is therefore conceivable that the observed mismatch between modelled and reconstructed Holocene climate evolution is related to an inadequate representation of long-term temperature trends in climate models. The models may not be sensitive enough to insolation changes, may not fully capture the natural range of climate variability, or might have regional biases linked to the fact that the proxy records used in this study are located in coastal areas, which are challenging to simulate with global climate models. Further studies are required to examine possible feedback mechanisms affecting the long-term climate sensitivity.