A critical look at solar-climate relationships from long temperature series

A key issue of climate change is to identify the forcings and their relative contributions. The solar-climate relationship is currently the matter of a fierce debate. We address here the need for high quality observations and an adequate statistical approach. A recent work by Le Mouël et al. (2010) and its companion paper by Kossobokov et al. (2010) show spectacular correlations between solar activity and temperature series from three European weather stations over the last two centuries. We question both the data and the method used in these works. We stress (1) that correlation with solar forcing alone is meaningless unless other forcings are properly accounted for and that sunspot counting is a poor indicator of solar irradiance, (2) that long temperature series require homogenization to remove historical artefacts that affect long term variability, (3) that incorrect application of statistical tests leads one to interpret as significant a signal which arises from pure random fluctuations. As a consequence, we reject the results and the conclusions of Le Mouël et al. (2010) and Kossobokov et al. (2010). We believe that our contribution bears some general interest in removing confusion from the scientific debate. © Author(s) 2010.


Introduction
Exploring relations between the solar decadal variations and climate has been a matter of interest over several decades (e.g. Siscoe, 1978; North and Stevens, 1998; Stott, 2003; Gray et al., 2005; Cahalan et al., 2010; Gray et al., 2010). One of the main motivations of such studies is to assess the role of solar variations in the observed climate variations, compared to the internal variability and the changes induced by anthropogenic effects. As shown by satellite data since the late 70s, the variations of total solar irradiance are rather small: 0.1% for the amplitude of the 11-yr cycle and even less for the long-term baseline over the past three cycles (Fröhlich, 2009). These changes induce a climate forcing which is an order of magnitude smaller than the present increase of radiative forcing due to greenhouse gases. In relative terms, the modulation is larger for the UV range of the radiative spectrum (Lean, 2010; Gray et al., 2010). The stratosphere is affected, with consequences for the troposphere (Labitzke, 2001; Gray et al., 2005, 2010). However, the presence of a UV trend over the last decades is still debated (Fröhlich, 2009; Harder et al., 2009).
Correspondence to: B. Legras (legras@lmd.ens.fr)
Another approach is to hypothesize that the impact of solar variations should leave a signature in the data and to analyse empirically the link between solar forcing and climate records (White, 2006; Camp and Tung, 2007; Lean and Rind, 2008). Correlations are used as a practical basis of knowledge in disciplines, like medicine or sociology, where the theory is qualitative and mathematical tools cannot be derived from basic principles. In such fields, the application of rigorous procedures is mandatory to establish that correlations are not spurious. This depends on the ability to formulate and test a null hypothesis against an alternative. Unfortunately, such a recipe is not always applied and a number of studies interpret as significant correlations which are in fact obtained by chance, owing to the lack of tests or the application of inappropriate statistical tests as discussed in, e.g., White (2000). Another important source of spurious results is data artefacts that have not been corrected. This case is encountered for long historical series collected using methods, instruments and protocols which varied with time and for which information about such changes is not necessarily available. Long solar series where various proxies are used to estimate the solar radiation are also prone to bias. Last, when multiple factors have to be taken into account, as in climate studies, it is necessary to assess the importance of all of these factors together.
Published by Copernicus Publications on behalf of the European Geosciences Union.
A recent series of three articles (Le Mouël et al., 2009, 2010; Kossobokov et al., 2010) published by the same group of authors has studied long series of ground temperatures collected by the European Climate Assessment and Dataset project (Klein Tank et al., 2002) (hereafter ECA&D) and found a number of correlations with several indexes of the solar activity.
We focus here on the last two papers of this series, which are based on very long temperature records, starting in the late 18th century. Part of our study, particularly regarding homogeneity, applies also to Le Mouël et al. (2009), which has already been commented on by Yiou et al. (2010), who concluded that the displayed solar-climate correlations are not significant.
Le Mouël et al. (2010) (hereafter LMKC) and Kossobokov et al. (2010) (hereafter KLMC) use the number of sunspots to separate the years belonging to the interval 1775-2005 into two ensembles of High versus Low solar activity and produce daily composites of the temperature difference between these two ensembles for three European stations, Praha, Bologna and Uccle, which have recorded temperatures over the last two centuries. They claim to find highly significant results that demonstrate the prominent role of solar variations in local climate.
Here we demonstrate that the approach of LMKC and KLMC is undermined by the combined effect of data series artefacts and inadequate error analysis, which impairs their results. In Sect. 2, we discuss the multiple forcing of climate variability and the appropriateness of sunspot counting as a solar proxy. In Sects. 3 and 4 we discuss issues related to inhomogeneities in temperature series, particularly in the public dataset used by LMKC and KLMC. In Sects. 5 and 6, we show that an inadequate account of the number of degrees of freedom leads to a large underestimate of the confidence interval found in LMKC, and that the remaining significance is only due to the coincidence of high solar activity with anthropogenic forcing over the last 50 years. The calculations related to these two sections are also provided as a Mathematica notebook along with source data in the Supplement. Section 7 discusses the statistical tests provided by KLMC. Section 8 offers a summary and further discussion. Although this study is focused on the discussion of a single work, we believe that it is useful in shedding light on a fairly large field of current research on solar-climate relations.

Solar variations and other climate forcings
Both papers (LMKC and KLMC) are based on the comparison of three time series of surface temperature with a record of sunspot numbers, taken as a proxy of solar variations. We point out that a proper attribution study should also take into account other sources of natural variability, such as major volcanic eruptions or internal oscillations of the climate system (e.g. the El Niño-Southern Oscillation; ENSO), and of the anthropogenic forcing over the last century (greenhouse gases (GHG), aerosols). It has been demonstrated that a correlation analysis which takes only one cause into account can lead to a spurious attribution (e.g., Scafetta and West, 2006a, criticized by Benestad and Schmidt, 2009).
As an obvious example, it is useful to consider the global temperature record since 1950. Interannual changes of the order of 0.1-0.2 °C are superimposed on the long-term trend of 0.2 °C/decade usually attributed to anthropogenic forcing (Meehl et al., 2004; Stott et al., 2006; Huntingford et al., 2006). These fluctuations in the global temperature record are partly linked to the 11-yr sunspot cycle, whose irradiance influence has been evaluated at ≈0.1 °C (Hansen et al., 2005; Hegerl et al., 2007b). However, a statistical attribution should also take into account the last three major eruptions, Agung (1963), El Chichón (1982) and Pinatubo (1991): their influences lasted a couple of years after the events, falling roughly during the descending phases of solar cycles #19, #21 and #22, respectively. Similarly, the ENSO variability is directly responsible for some apparent correlations with the 11-yr sunspot cycle over the century, the most recent La Niña phase being a relevant example as it occurred during the descending phase of solar cycle #23.
Multiple causes should also be considered when studying other individual forcings. Indeed, several authors showed that a correct evaluation of the climatic impact of the 1991 Pinatubo eruption should account for the global temperature modulation by ENSO (Soden et al., 2002;Robock, 2003;Hansen et al., 2005).
Besides the global temperature record, it is also possible to further constrain the attribution by considering the spatial and vertical patterns, which are distinct for volcanism, ENSO, GHG and solar forcing (e.g., Lean and Rind, 2008, who calculated spatial correlations to identify these signatures over the last century and Lean (2010) who presented vertical patterns over the past decades). The solar influence is a uniform radiative forcing and it is still unclear what precise mechanisms are responsible for the observed solar patterns that are much less contrasted than those of ENSO, for example. Climate modelling is another way to study and understand the regional expression of the solar imprint on climate. For example, Shindell et al. (2001) suggested that the solar forcing should perturb the North Atlantic Oscillation (NAO). Woollings et al. (2010) proposed that solar signatures resembling the NAO are indeed present in Eurasian winter climate series.
The studies of LMKC and KLMC focus on the past two centuries. Over this period, the sunspot record is characterized by a long-term modulation of the 11-yr sunspot cycle. This is expressed as two prolonged solar minima, broadly equivalent to the famous Maunder Minimum between 1645 and 1715 (Eddy, 1976). These minima occurred during the intervals 1795-1830 (Dalton Minimum) and 1880-1920 (Modern Minimum) as evidenced with various solar indicators: sunspots (Hoyt and Schatten, 1998), aurorae (Silverman, 1992), aa geomagnetic index, cosmogenic nuclides (Delaygue and . However, these two time periods also include some of the largest volcanic eruptions ever recorded in history. The first period comprises the cold decade linked to the Tambora (1815) and the 1809 stratospheric eruption (Cole-Dai et al., 2009), whereas the second phase includes a series of major eruptions starting with the Krakatoa in 1883 and ending with Mt Katmai in 1912 (Robock, 2000). Climate modelling makes it possible to quantify the collective impact of these forcings in order to explain the temperature historical record of the past few centuries (e.g., see the model-data compilation in IPCC AR4 Sect. 6.6.3.4 with Fig. 6.13 and 6.14 in Solomon et al. (2007), (http://www.ipcc.ch/publications_and_data/ar4/wg1/en/figure-6-14.html), or the more recent paper by Gao et al. (2008) and the study by Wagner and Zorita, 2005). The Northern Hemisphere temperature drops corresponding to the Dalton (0.2-0.3 °C) and Modern solar minima (0.1-0.2 °C) are partly linked to an enhanced volcanic forcing (see Hegerl et al., 2007b, and references therein). This implies that the attempt by LMKC and KLMC at studying the Sun-climate relationship cannot be performed with a simple approach that omits the influence of volcanic eruptions.
Clim. Past, 6, 745-758, 2010 www.clim-past.net/6/745/2010/
A further oversimplified aspect of this approach is the use of the raw sunspot record to distinguish two types of periods referred to as High and Low phases. Indeed, recent studies on solar parameters indicate that the sunspot number is not linearly coupled to solar forcing (Wang et al., 2005). This is illustrated by the last two solar cycles #22 and #23 for which the sunspot maxima yield very different values, whereas the total solar irradiance (TSI) values are indistinguishable. By contrast, the last two sunspot minima are similar in spot numbers, but the TSI record shows a decreasing trend (Fröhlich, 2009). This complexity led several authors to reconstruct the TSI by using empirical models taking into account different types of solar features such as sunspots and faculae (from the seminal paper by Foukal and Lean (1990) to the recent review by Lean (2010), who wrote "terrestrial studies are no longer relegated to using geophysically meaningless sunspot numbers a proxy for solar irradiance"). The variety of the TSI reconstructions is illustrated by Fig. 7 of Gray et al. (2010) compiling 8 different reconstructions for the past 3 to 4 centuries. These TSI curves significantly differ in their long-term trends and structures linked to the 11-yr and longer cycles. Using these published curves would obviously have an impact on the statistical analysis and comparison with temperatures.

Reliability of climate series
To study the evolution of temperatures since the 19th century, many long instrumental climate records are available and can provide useful information in climate research. These datasets are essential to describe the recent past climate, the detection and the attribution of climate change at a regional scale, and the validation of climate models.
Homogeneity of these long instrumental data series (up to 300 years in some cases) has been studied because of the interest in describing long-term variations in climate. A homogeneous climate time series is defined as one where variations are caused only by variations in weather and climate (Conrad and Pollack, 1950). But in most cases, these series are altered by changes in the measurement conditions, such as evolution of the instrumentation, relocation of the measurement site, modification of the surroundings, instrumental inaccuracies, poor installation, and changes in observational or calculation rules. In many cases, such changes are not recorded in the archives, which are often incomplete. These modifications, hereafter called inhomogeneities, manifest themselves as a shift in the mean that can be sudden (break point or change point), or gradual. Moreover, spurious observations are frequent. As the artificial shifts often have the same magnitude as the climate signal, such as long-term variations, trends or cycles, a direct analysis of the raw data series might lead to wrong conclusions about climate evolution. Therefore, it is important to remove the inhomogeneities or at least to determine the error they may cause, as clearly stated in Aguilar et al. (2003).
These problems are not anecdotal. During the construction of the HISTALP precipitation dataset (Auer et al., 2005), one break could be detected on average every 23rd year in a series of 136 years. A total of 192 precipitation series were processed, and none of them could be considered free of inhomogeneities. For other elements, e.g. sunshine duration, the average homogeneous subinterval is even shorter (Auer et al., 2007). Della-Marta et al. (2004) showed that each of the 99 annual temperature records in the Australian high-quality dataset required five to six adjustments throughout the 100-year record. Caussinus and Mestre (2004) found no reliable series within a set of 70 maximum and minimum long French temperature series covering the 20th century, each series being affected on average by four to five significant changes. As a result, uncorrected series were strongly contaminated by inhomogeneities, and exhibited trends ranging from −3 to +3 °C per century. Thus the detection and correction of these inhomogeneities are absolutely necessary before any reliable climate study can be based on the instrumental series.

ECA&D dataset
The ECA&D dataset and metadata are freely available through the ECA&D web interface (http://eca.knmi.nl). The temperatures used by LMKC are three daily series of maximum (TX) and minimum (TN) temperatures collected in Praha since 1775, Bologna since 1814 and Uccle since 1833. Owing to policy changes at the Belgium Met. Office (KMI), the Uccle data were no longer available when our study started. Since the density of available series is poor, the ECA&D team has chosen to test the quality of the series through "absolute" testing described by Wijngaard et al. (2003), without using the relative homogeneity principle described below. Although this procedure leads to poorer detection capabilities, more than 94% of the stations are flagged as "doubtful" or "suspect" over the period 1900-1999 (Wijngaard et al., 2003). This is not surprising, given the generally observed frequency of inhomogeneities in climate series.
As an example, see the simple plot of the difference between two nearby temperature series in the Netherlands, from Maastricht and DeBilt, 145 km apart (Fig. 1). Praha, Uccle and Bologna are among the "suspect" stations (see Table 1, extracted from the ECA&D website) and this contradicts LMKC, who mention those series as having the highest quality code in ECA&D, for both TN and TX temperatures. Notice that the test is based on "blended" series where gaps are filled with synoptic observations or data interpolated from nearby stations, but this does not affect Praha, Bologna and Uccle, which exhibit complete series over the 20th century. The lack of homogeneity over the 20th century is, of course, a serious warning about the quality of data over the 18th and 19th centuries. The Bologna temperature series exhibits a clear artefact, larger than 2 °C between 1865 and 1880, as shown in Fig. 2. This strange feature is acknowledged in LMKC: "On the other hand, the two TN and TX curves at the other two stations differ significantly, for instance from 1865 to 1880 in Bologna, when a large positive anomaly of 2.1 °C lasting 15 years is seen in TX and not in TN; we have no evidence of human-induced changes that would lead us to consider this feature as an artefact."
After checking the Bologna metadata, we found that in 1867 the "Grindel" thermometer, in Réaumur scale, read four times a day at 9 a.m., 12 p.m., 3 p.m. and 9 p.m., was changed to a "Milano" min-max thermometer in Celsius scale. In 1881, the thermometers were relocated to a different place (Michele Brunetti, CNR-ISAC, personal communication, quoting Capra, 1939). The 1867 change is listed in the ECA&D metadata (http://eca.knmi.nl/utils/stationdetail.php?stationid=169). LMKC state that: "It is a general observation that one must trust the way ancient observers did the maximum they thought possible to obtain the best data." Of course, the observers did the best they could, but this does not ensure that the data are reliable.
To check homogeneity of Bologna series, we use the relative homogeneity principle (Conrad and Pollack, 1950): since the climate signal is mostly undetermined and nonstationary, it has, as far as possible, to be removed to reveal outliers or changes in measurement conditions. Bologna series is compared with neighbouring series by calculating annual differences, evidencing artificial changes, since difference series are weakly affected by climate variations. It is often assumed that noise within those differences is normal, independent, and that most of the artificial changes are described by step-like functions which typically alter only the average value (Caussinus and Mestre, 2004). These differences are then tested for discontinuities, using a dynamic programming algorithm (Hawkins, 2001) and an adapted penalized likelihood criterion (Caussinus and Lyazrhi, 1997). If a detected change-point is preserved throughout the set of comparisons of a candidate station with its neighbours, it can be attributed to this candidate station and the corresponding series can be corrected, estimating break amplitude by standard least-squares techniques.
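As a rough illustration of the relative homogeneity principle, the following Python sketch locates a single step change in a difference series by least squares. It is a deliberately simplified stand-in and not the procedure used in this study: it handles only one break and omits both the dynamic programming search for multiple breaks and the penalized likelihood criterion. The function name `best_single_break` and the synthetic series are illustrative assumptions.

```python
def best_single_break(diff):
    """Least-squares fit of one step change to a difference series.

    diff: annual differences (candidate minus neighbour), as a list of floats.
    Returns (k, step, sse): the break lies between indices k-1 and k,
    step is the estimated shift in the mean, sse the residual sum of squares.
    Hypothetical helper for illustration only; no significance test is applied.
    """
    n = len(diff)
    if n < 2:
        raise ValueError("need at least two values")
    best = None
    for k in range(1, n):
        left, right = diff[:k], diff[k:]
        mean_left = sum(left) / len(left)
        mean_right = sum(right) / len(right)
        sse = (sum((x - mean_left) ** 2 for x in left)
               + sum((x - mean_right) ** 2 for x in right))
        if best is None or sse < best[2]:
            best = (k, mean_right - mean_left, sse)
    return best


# Synthetic difference series with an artificial 1 degree shift after 10 years.
example = [0.0] * 10 + [1.0] * 10
k, step, sse = best_single_break(example)
print(k, step, sse)
```

With real data the noise of the difference series matters: a break of about 1 °C is easier to locate when the standard deviation of the differences is small, which is why a sparse station network degrades detection.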
When the Bologna maximum temperature series is compared to its ECA&D neighbours (see Fig. 3), artefacts clearly occur around 1919, 1996-1997 and 2001 (around 1 °C in amplitude), and maybe at other dates, but, due to insufficient station density, the noise is high (standard deviation around 0.35 °C), resulting in poorer detection.
According to Michele Brunetti, quoting Osservatorio della Regia Università di Bologna (1915), the 1915 bulletin mentions a change in thermometers position (also mentioned by ECA&D). For the most recent part (from 1979 to about 2000) the data come from the former National Hydrographic Service. This service has been dismantled after 2000 and the network has been scattered among the different regional environmental agencies. Many stations were relocated, explaining the break in 2001. The change-point around 1997 is not supported by metadata, but it is large enough to be considered as an artefact.
A homogenized version of Bologna monthly series of mean temperature is available. It is an update of that described in Brunetti et al. (2006). This series is used in Sect. 6.
Praha-Klementinum is another historical station located on the top of the Czech National Library in the centre of Praha, for which there are no metadata available on the ECA&D site. There are also not enough nearby stations in the ECA&D datasets, hence the signal to noise ratio is low when applying relative homogeneity procedures and this cannot be taken as a test of quality; the Praha temperature series are flagged "suspect" anyway. To our knowledge, no homogenized series is presently available for Praha, and establishing one is beyond the scope of this study. Uccle data could not be tested due to their removal from public access on the ECA&D site.
In conclusion, the results of Le Mouël et al. (2008, 2009) and LMKC are all based on raw inhomogeneous data, contrary to their claims. This is quite striking, since information about data quality is easily available from the ECA&D website.

Praha temperature
Here we use the raw Praha series of minimum and maximum temperature like LMKC, i.e. without regard to homogenization, and we focus purely on the statistical analysis and the significance of the results of LMKC. The Praha series of TX and TN, shown in Fig. 4, was the longest available in the ECA&D dataset until recently. Let us first briefly recall the method used by LMKC. LMKC classify the 21 solar cycles between 1775 and 2005 in two ensembles, of High versus Low activity according to the number of spots relative to the median. The High activity ensemble (H) includes the periods 1775-1798, 1834-1856, 1868-1878 and 1945-2005, and the Low activity ensemble (L) includes the periods 1799-1833, 1857-1867 and 1879-1944.
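The two ensembles can be written down directly from the periods listed above. The short Python sketch below does so and counts the years in each; the variable and function names are illustrative assumptions, while the period boundaries are the ones quoted in the text.

```python
# Inclusive (start, end) periods as listed in the text.
HIGH_PERIODS = [(1775, 1798), (1834, 1856), (1868, 1878), (1945, 2005)]
LOW_PERIODS = [(1799, 1833), (1857, 1867), (1879, 1944)]


def years_in(periods):
    """Expand inclusive (start, end) periods into a list of years."""
    return [year for start, end in periods for year in range(start, end + 1)]


high_years = years_in(HIGH_PERIODS)
low_years = years_in(LOW_PERIODS)

# Number of years in each ensemble, and share of the H ensemble after 1944.
recent_share = sum(year >= 1945 for year in high_years) / len(high_years)
print(len(high_years), len(low_years), recent_share)
```

Counting the years this way gives 119 years in H and 112 in L, with 61 of the 119 H years belonging to 1945-2005, that is about half of the H ensemble.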
It is important to notice that the last 50 years of the dataset are entirely contained within the H ensemble. Over this period an indisputable forcing by the increase of GHG has become prominent. The fact that this forcing is of anthropogenic nature is not even important in the attribution study. The crucial point is that it must be taken into account in any attempt to extract the solar component.
The first step in LMKC is to calculate a 21-day moving average of the temperatures over the whole dataset, denoted as $T_{i,j}$, where $i$ is the calendar day of the year and $j$ is the year. Based on the clustering of $N_H$ and $N_L$ years, respectively, into the H and L ensembles, LMKC then calculate the daily difference $T^S_i$ between the average low-pass filtered temperatures over the H and L ensembles, which we denote as the solar shift in the sequel:
$$T^S_i = \frac{1}{N_H}\sum_{j \in H} T_{i,j} - \frac{1}{N_L}\sum_{j \in L} T_{i,j} \qquad (1)$$
The two panels of Fig. 5 show the solar shift for the TX and TN series of Praha. The two curves are identical, up to irrelevant details, to the two curves shown in Fig. 4a of LMKC. Since the 21-day average commutes with the composite operation of the solar shift, the average could be performed with identical result on the solar shift calculated for unfiltered daily data.
The unbiased estimate $(\sigma^H_i)^2$ of the variance of 21-day averages over ensemble H is given by:
$$(\sigma^H_i)^2 = \frac{1}{N_H - 1}\sum_{j \in H}\left(T_{i,j} - \overline{T}^H_i\right)^2 \qquad (2)$$
where $\overline{T}^H_i$ is the average of $T_{i,j}$ over the H ensemble. A similar expression holds for the L ensemble. Since the successive years can be considered as independent realisations for 21-day averaged temperatures, the unbiased estimate of the variance of the solar shift is provided by the "pooled variance" formula (Weatherburn, 1961, Sect. 88):
$$(\sigma^S_i)^2 = \frac{(N_H - 1)(\sigma^H_i)^2 + (N_L - 1)(\sigma^L_i)^2}{N_H + N_L - 2}\left(\frac{1}{N_H} + \frac{1}{N_L}\right) \qquad (3)$$
Equation (3) is obtained under the sole hypothesis that the two ensembles H and L have equal true variance $\sigma_i^2$ but possibly different means for the 21-day averaged temperatures. Then the true variance of the solar shift is $\sigma_i^2 (1/N_H + 1/N_L)$ and the first factor on the r.h.s. of (3) is an unbiased estimate of $\sigma_i^2$. Under the null hypothesis that the true mean value of the solar shift is zero, the variable $t = T^S_i/\sigma^S_i$ obeys a Student law $A(t,\nu)$ with $\nu = N_H + N_L - 2$ degrees of freedom (Weatherburn, 1961, Sect. 88). The two-sided 90% confidence interval is delimited by $[-a\sigma^S_i, a\sigma^S_i]$ where $a$ is the 0.95 quantile of the Student law, that is $a = 1.65\ldots$. This interval is plotted in Fig. 5 for TX and TN. With the number of degrees of freedom used in this study, the Student law is hardly distinguishable from the limit normal law.
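The solar shift and its pooled-variance confidence interval described above can be sketched in a few lines of Python for a single calendar day. This is a minimal illustration under stated assumptions, not the Mathematica notebook of the Supplement: the function names are hypothetical, the data are synthetic, and the Student quantile is replaced by its normal limit, which the text notes is hardly distinguishable at this number of degrees of freedom.

```python
import math
import random
from statistics import NormalDist, mean


def solar_shift(high, low):
    """Mean over the H years minus mean over the L years, for one calendar day."""
    return mean(high) - mean(low)


def pooled_std_of_shift(high, low):
    """Standard deviation of the solar shift from the pooled-variance formula,
    assuming that both ensembles share the same true variance."""
    n_h, n_l = len(high), len(low)
    m_h, m_l = mean(high), mean(low)
    ss_h = sum((x - m_h) ** 2 for x in high)  # (N_H - 1) times the unbiased variance
    ss_l = sum((x - m_l) ** 2 for x in low)   # (N_L - 1) times the unbiased variance
    pooled_variance = (ss_h + ss_l) / (n_h + n_l - 2)
    return math.sqrt(pooled_variance * (1.0 / n_h + 1.0 / n_l))


def half_width_90(high, low):
    """Half-width of the two-sided 90% interval, using the normal limit of the
    Student law (an approximation, close for about 200 degrees of freedom)."""
    a = NormalDist().inv_cdf(0.95)
    return a * pooled_std_of_shift(high, low)


# Synthetic stand-in for one calendar day: both ensembles drawn from the same law.
rng = random.Random(0)
high = [rng.gauss(10.0, 3.0) for _ in range(119)]
low = [rng.gauss(10.0, 3.0) for _ in range(112)]
print(solar_shift(high, low), half_width_90(high, low))
```

Each input list holds one 21-day averaged temperature per year, so the years, not the individual days, are the independent samples entering the variance.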
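The definition of the confidence interval shown by LMKC in their Fig. 4a was not given. After trials and an exchange with the leading author of LMKC we deduced that the error estimate is based on a biased estimate of the variance of the daily fluctuations among all the days contributing to an average value $\overline{T}^H_i$, that is, for the ensemble H:
$$(\tilde{\sigma}^H_i)^2 = \frac{1}{21 N_H}\sum_{j \in H}\sum_{l=i-10}^{i+10}\left(\tilde{T}_{l,j} - \overline{T}^H_i\right)^2$$
where $\tilde{T}_{l,j}$ is the daily temperature at day $l$ and year $j$. This expression can also be written as
$$(\tilde{\sigma}^H_i)^2 = \frac{1}{21 N_H}\sum_{j \in H}\sum_{l=i-10}^{i+10}\left(\tilde{T}_{l,j} - T_{i,j}\right)^2 + \frac{1}{N_H}\sum_{j \in H}\left(T_{i,j} - \overline{T}^H_i\right)^2$$
that is as the sum of the daily squared fluctuations within the 21-day intervals and, up to statistical bias, the variance of the average. The statistical error is then calculated in LMKC as:
$$\sigma^{LKMC}_i = \sqrt{\frac{(\tilde{\sigma}^H_i)^2}{21 N_H} + \frac{(\tilde{\sigma}^L_i)^2}{21 N_L}}$$
The region enclosed by $\pm\sigma^{LKMC}_i$ is shown in gray in Fig. 5 and is visually identical to the region bounded by thin lines in Fig. 4a of LMKC. It is obviously much smaller than our estimate of the confidence interval.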
There are two main reasons for this discrepancy.
1. The first one is that LMKC assume that the daily temperature fluctuations are independent. Were this the case, the mean variance of the 21-day averages would be much smaller and the two estimates $(\sigma^{LKMC}_i)^2$ and $(\sigma^S_i)^2$ would coincide, up to statistical bias. In fact, we would have $\sigma^H_i \approx \tilde{\sigma}^H_i/\sqrt{21}$, with $\tilde{\sigma}^H_i$ the standard deviation of the daily temperatures. This assumption, however, is incorrect as it is well known that daily temperatures are correlated over several days. In Le Mouël et al. (2009), the daily fluctuations are represented by an AR(1) process with a correlation of the order of 0.85 over two successive days. Our estimate of the integral scale of the auto-correlation of the TN or TX daily temperature in Praha, after removal of the mean annual cycle, is about 9 days as shown in the Supplement. Hence the number of effective degrees of freedom is about 9 times smaller than estimated by LMKC and consequently the estimated standard deviation of the ensemble average is about three times larger. See below for a more accurate and independent estimate.

2. The ±σ interval shown in LMKC is a 68% confidence interval, which means that under a Gaussian condition, 32% of the data can be outside this interval without being statistically significant. The standard width for a two-sided confidence interval is 90%, which leaves two sides of 5% each and which is about 1.65 times larger.
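The size of the first effect can be checked with a standard result for a stationary AR(1) process: the variance of the mean of $n$ consecutive values exceeds the independent-sample value by the factor $1 + 2\sum_{k=1}^{n-1}(1 - k/n)\rho^k$. The Python sketch below evaluates this factor for a 21-day window and the lag-one correlation of 0.85 quoted above. It is an illustration under the AR(1) assumption and is not the integral-scale estimate of the Supplement; the function name is hypothetical.

```python
import math


def ar1_variance_inflation(n, rho):
    """Ratio Var(mean of n AR(1) values) / Var(mean of n independent values).

    For a stationary AR(1) process with lag-one correlation rho, this is
    1 + 2 * sum_{k=1}^{n-1} (1 - k/n) * rho**k.
    """
    return 1.0 + 2.0 * sum((1.0 - k / n) * rho ** k for k in range(1, n))


inflation = ar1_variance_inflation(21, 0.85)
std_ratio = math.sqrt(inflation)   # widening due to day-to-day correlation
quantile_ratio = 1.65              # 90% two-sided interval versus one standard deviation
print(std_ratio, std_ratio * quantile_ratio)
```

Under these assumptions the standard deviation ratio comes out close to three, and multiplying it by 1.65 gives a combined factor of about five, in line with the two reasons listed above.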
Hence, considering these two factors together, LMKC underestimates the confidence interval by about a factor of 5. This explains the discrepancy between LMKC and our estimate in Fig. 5. In order to check further the correctness of our result, we also calculate the confidence interval by non-parametric random permutation tests (Good, 2005), a totally independent method. More precisely, we test the significance of the solar shift by performing random permutations of full years within the 21-day filtered temperature series. Each permutation generates a new temperature series for which we calculate the solar shift, for the same ensembles H and L which now contain a random set of years. In this way we can estimate the distribution of the solar shift under the null hypothesis that all years are statistically indistinguishable. After doing this over 10 000 drawings, the distribution of $T^S_i$ is ordered for each day and the 5% and 95% quantiles of this distribution are shown in Fig. 5. It is visible that this estimate of the two-sided confidence interval falls almost exactly over our previous estimate of the confidence interval based on Eq. (3).
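The permutation test described above can be sketched as follows in Python. Permuting full years within the series while keeping the H and L year labels fixed is equivalent to randomly reassigning years to two ensembles of unchanged size, which is what this sketch does. The function names, the dictionary layout of the series and the synthetic data are illustrative assumptions; the number of drawings is reduced for speed.

```python
import random


def solar_shift_by_day(series, high_years, low_years):
    """Solar shift for each calendar day.

    series maps a year to a list of filtered temperatures, one per calendar day.
    """
    n_days = len(series[high_years[0]])
    shift = []
    for i in range(n_days):
        mean_h = sum(series[y][i] for y in high_years) / len(high_years)
        mean_l = sum(series[y][i] for y in low_years) / len(low_years)
        shift.append(mean_h - mean_l)
    return shift


def permutation_bounds(series, high_years, low_years, n_draws=1000, seed=0):
    """Empirical 5% and 95% quantiles of the solar shift when full years are
    randomly reassigned to the two ensembles (ensemble sizes are preserved)."""
    rng = random.Random(seed)
    years = list(high_years) + list(low_years)
    n_h = len(high_years)
    n_days = len(series[years[0]])
    draws = [[] for _ in range(n_days)]
    for _ in range(n_draws):
        rng.shuffle(years)
        shift = solar_shift_by_day(series, years[:n_h], years[n_h:])
        for i, value in enumerate(shift):
            draws[i].append(value)
    lower, upper = [], []
    for values in draws:
        values.sort()
        lower.append(values[n_draws * 5 // 100])
        upper.append(values[n_draws * 95 // 100 - 1])
    return lower, upper


# Synthetic example: 40 years, 5 calendar days, no real signal.
data_rng = random.Random(1)
series = {year: [data_rng.gauss(0.0, 1.0) for _ in range(5)]
          for year in range(1900, 1940)}
lower, upper = permutation_bounds(series, list(range(1900, 1920)),
                                  list(range(1920, 1940)), n_draws=500)
print(lower, upper)
```

Because whole years are moved together, the day-to-day correlation within each year is preserved in every drawing, which is the property that makes this test a fair check on Eq. (3).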
The last step of our demonstration is to show directly the effect of temporal correlation on the 21-day averages. We perform an independent random permutation of the years for each day of the unfiltered temperature series within each ensemble H and L. In this way, we build a decorrelated series which has the same solar shift as the true temperature series but has lost any daily temperature correlation. We then proceed to calculate 21-day filtered data and the variance of the solar shift using Eq. (3). The new standard deviation, which is on the average 2.7 times smaller than the one obtained from the true series (for both TN and TX), is plotted as dashed lines in Fig. 5. This result shows that the variance of 21-day filtered data is much smaller for the decorrelated series than for the true temperature series and fully corroborates the above discussion. It validates our hypothesis that the 21-day averages can be considered as independent variables over successive years and that the oversampling of daily fluctuations by LMKC leads to an underestimate of the solar shift standard deviation by a factor 2.7. The factor $1/\sqrt{21}$ used by LMKC in Eq. (4) is correct for the decorrelated series but not for the true temperature series. In other words, the estimate of the solar shift variance by LMKC would be valid on a hypothetical planet with temporally uncorrelated daily temperature series but not on the Earth.
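The decorrelation experiment can be mimicked on synthetic data. The Python sketch below generates AR(1) daily anomalies with a lag-one correlation of 0.85, shuffles the years independently for each day, and compares the spread of the 21-day means before and after. It is only an illustration under these assumptions: it uses a single ensemble rather than separate H and L ensembles, synthetic rather than Praha data, and hypothetical function names.

```python
import math
import random
from statistics import mean, stdev


def ar1_year(n_days, rho, rng):
    """One synthetic 'year' of daily anomalies from a stationary AR(1) process."""
    x = rng.gauss(0.0, 1.0)
    values = [x]
    for _ in range(n_days - 1):
        x = rho * x + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        values.append(x)
    return values


def std_of_window_means(years):
    """Standard deviation across years of the mean over the daily window."""
    return stdev([mean(year) for year in years])


def decorrelate(years, rng):
    """Shuffle the years independently for each day: every daily mean is kept,
    but the day-to-day correlation within a year is destroyed."""
    n_years, n_days = len(years), len(years[0])
    columns = []
    for d in range(n_days):
        column = [years[j][d] for j in range(n_years)]
        rng.shuffle(column)
        columns.append(column)
    return [[columns[d][j] for d in range(n_days)] for j in range(n_years)]


rng = random.Random(0)
years = [ar1_year(21, 0.85, rng) for _ in range(500)]
shuffled = decorrelate(years, rng)
ratio = std_of_window_means(years) / std_of_window_means(shuffled)
print(ratio)
```

The printed ratio plays the role of the factor 2.7 found for the Praha series; its exact value depends on the assumed correlation and on the random draw.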
Comparing now the TN and TX solar shift with the confidence interval, we see that the high level of significance claimed by LMKC is not supported by the data. It appears that the TN curve is almost entirely contained within the confidence interval and hence that the null hypothesis of zero solar shift is not rejected for the minimum temperature. The TX curve is mainly contained within the boundaries but is also above the upper 95% boundary much more than 5% of the time. Hence we can infer that it rejects the null hypothesis and that the solar shift of maximum temperature is significantly positive, at least over some part of the year.
It is necessary here to recall that the last period of the H ensemble, which accounts for about half of this ensemble, coincides with a period associated with anthropogenic forcing and that the spatial response of surface temperature to solar forcing resembles the response due to anthropogenic greenhouse gas forcing (Hegerl et al., 2007b, Sect. 9.2.3). It is thus expected that the anthropogenic forcing contributes to the positive signal of the solar shift and cannot be separated. The pure effect of solar variation can only be estimated by removing this period to eliminate the alternative hypothesis that the solar shift is only due to the anthropogenic forcing. LMKC recognize this problem and define several truncated datasets for this purpose, but they fail again to draw a conclusion due to the underestimation of the confidence interval. Here, we will consider only the P-IV dataset in LMKC terminology, in which the last five solar cycles (i.e. the period after 1954) are removed. The remaining dataset preserves six cycles of high solar activity in the H ensemble and the ten cycles of low solar activity are left unchanged in the L ensemble. The upper row of Fig. 6 shows the solar shift for the TN and TX temperatures and the 90% two-sided confidence interval calculated by random permutation test as in Fig. 5. Both curves are mostly located within the confidence interval and the proportion of points outside the interval is hardly larger than 10%, which is not significant. The significance of the solar shift can be further estimated by calculating the p-value of the Student t-test for the solar shift under the null hypothesis that it does not statistically differ from zero. The p-value is the probability of obtaining a test statistic at least as extreme as the one observed, assuming the null hypothesis. The lower panel of Fig. 6 shows the p-value for the TN and TX series. Both curves exhibit mostly high values; only a few points are below p = 0.05 and none below p = 0.01.
We conclude that the solar shift does not differ significantly from the difference between two random samples in both cases.
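The random perturbation test used above can be illustrated by a minimal Python sketch. It assumes that the resampling unit is a whole solar cycle and that each ensemble is summarized, for a given calendar day, by one mean temperature per cycle. The function names and the numerical values are hypothetical and purely synthetic; they are not taken from the Praha series.

```python
import random
from statistics import mean


def solar_shift(high, low):
    """Mean over the high-activity cycles minus mean over the low-activity cycles."""
    return mean(high) - mean(low)


def permutation_interval(high, low, n_perm=2000, level=0.90, seed=0):
    """Two-sided interval of the shift when cycles are randomly reassigned to H and L."""
    rng = random.Random(seed)
    pooled = list(high) + list(low)
    n_high = len(high)
    shifts = []
    for _ in range(n_perm):
        rng.shuffle(pooled)
        shifts.append(solar_shift(pooled[:n_high], pooled[n_high:]))
    shifts.sort()
    tail = (1.0 - level) / 2.0
    lower = shifts[max(int(tail * n_perm), 0)]
    upper = shifts[min(int((1.0 - tail) * n_perm), n_perm) - 1]
    return lower, upper


# Synthetic per-cycle mean temperatures (deg C) for one calendar day (assumed values).
high = [9.8, 10.4, 10.1, 9.6, 10.7, 10.2]                      # six H cycles
low = [9.9, 10.0, 9.5, 10.3, 9.7, 10.1, 9.4, 10.6, 10.0, 9.8]  # ten L cycles

observed = solar_shift(high, low)
lower, upper = permutation_interval(high, low)
print(f"shift = {observed:.2f}, 90% interval under the null = [{lower:.2f}, {upper:.2f}]")
```

An observed shift lying inside the interval is compatible with the difference between two random samples, which is the sense in which the null hypothesis is not rejected.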
In the Supplement, we have performed similar calculations on daily data without filtering and replacing the 21-day filter by 11-day and 41-day filters. We have also applied an independent Kolmogorov-Smirnov test. All these calculations reach the same conclusion that the solar shift is not statistically different from zero at any time of the year.

Bologna temperature
The temperature series for Bologna are shown in Fig. 7. The large bump seen in Fig. 2 is only visible in the TX series. It is intriguing that, although the daily fluctuations (not shown) of the TX and TN series are well correlated, the decadal variations are poorly correlated over the record, unlike Praha (see Fig. 4). This is a fairly strong indication of inhomogeneities.
Clim. Past, 6, 745-758, 2010 www.clim-past.net/6/745/2010/
The red curve in Fig. 7 is the average of TX and TN temperature and the black curve is the homogenized series of mean temperature after Brunetti et al. (2006). The analysis of the Bologna series can be conducted in the same way as for Praha and is fully described in the Supplement. We summarize here the main results. The confidence interval is again much larger than the error interval shown in Fig. 4b of LMKC. The difference between TN and TX solar shifts is more pronounced than for Praha. When the whole dataset is used, the solar shift for TX remains above the 90% confidence interval for most of the year while the curves for TN or the average temperature lie almost entirely within the confidence interval. When the reduced P-IV dataset is used, removing all the years after 1954, the TX solar shift still exceeds the confidence interval, but only for half of the year, while the TN solar shift still stays mainly within the confidence interval. This TX correlation is, however, highly questionable because of the spurious features in the Bologna series. In particular, the positive bump of about two degrees between 1867 and 1881, which occurs only in the TX series, coincides with an isolated high solar cycle and contributes strongly to the solar shift. Removing cycle #11 reduces the portion of the TX solar shift that exceeds the confidence interval to 11%, which is not significant. However, only three solar cycles are preserved in the H ensemble in that case.
Since a homogenized series of mean monthly temperatures is available for Bologna, we analyse this series here. The procedure remains essentially the same except that the daily 21-day moving averages are replaced by 12 monthly averages. The composite calculations of the solar shift are performed for each monthly mean in the same way as previously for the daily data.
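As an illustration, the monthly composite can be sketched in Python as follows. The sketch assumes that the series is available as (year, month, temperature) records and that membership in the H ensemble is given as a set of years; the function name and the input layout are hypothetical, and every calendar month is assumed to be populated in both ensembles.

```python
from statistics import mean


def monthly_solar_shift(records, high_years):
    """Per-month difference between H and L composites.

    records: iterable of (year, month, temperature) tuples (assumed layout).
    high_years: set of years belonging to high-activity solar cycles.
    """
    high = {month: [] for month in range(1, 13)}
    low = {month: [] for month in range(1, 13)}
    for year, month, temperature in records:
        target = high if year in high_years else low
        target[month].append(temperature)
    # One shift value per calendar month, replacing the 365 daily values.
    return {month: mean(high[month]) - mean(low[month]) for month in range(1, 13)}


# Synthetic records: four invented years, the last two labelled as H and 1 deg C warmer.
h_years = {2000, 2001}
records = [
    (year, month, float(month) + (1.0 if year in h_years else 0.0))
    for year in range(1998, 2002)
    for month in range(1, 13)
]
print(monthly_solar_shift(records, h_years))
```

Each of the 12 monthly shifts can then be compared with a confidence interval in the same way as the daily values.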
The results differ very clearly according to whether the anthropogenic forcing period is taken into account or not. Figure 8 shows that the solar shift is always above the confidence interval for the homogenized series when all the years are used and the p-value of the Student t-test is under 0.02 for 7 months, thus demonstrating a very significant signal. However, Fig. 8 also shows that this feature fully disappears when the last five solar cycles are discarded and the P-IV dataset is used. The solar shift is then entirely within the confidence interval and the p-value stays above 0.05 for the whole year except in June.

Fig. 8. Upper panel: Bologna solar shift for homogenized mean monthly temperature for the whole dataset (solid red) and P-IV dataset (solid blue). Boundary of the 90% confidence interval from the estimated variance, for the whole dataset (dashed red) and the P-IV dataset (dashed blue). Confidence interval according to random perturbation test for the whole dataset (red area) and the P-IV dataset (blue area). Lower panel: p-value of the Student t-test for the whole dataset (red) and the P-IV dataset (blue).

Statistical tests by KLMC
KLMC discuss other tests of the results of LMKC. They essentially perform random perturbation tests using the Kolmogorov-Smirnov distance and find that the significance is high in most cases. This work could be discussed in more detail, but it can safely be said that it does not contradict our results, for the following reasons. The first reason is that the tests are only applied to the whole dataset, except for a single instance in the supplement of KLMC where the P-IV dataset is considered. It is therefore not important to discuss whether the tests are technically valid or not, since they do not distinguish the anthropogenic forcing from the solar forcing and are irrelevant in any case. We have shown in Sects. 5 and 6 that the solar shift may be statistically significant over the whole dataset in some cases, but it is never significant when the last 50 years of the series and the interference with the anthropogenic forcing are removed. When the P-IV dataset is considered, KLMC is in qualitative agreement with our results, namely weak or no significance for Praha temperatures and significance for the Bologna non-homogenized temperatures. The level of significance reached in this latter case is very high, of the order of 99.9% according to KLMC.
Our claim is that the high significance values found in KLMC, in particular for the P-IV datasets, are due to a flawed usage of statistical tests. The basic procedure of KLMC is based on the Kolmogorov distance $\lambda_X\{H,L\}$ between the H and L ensembles, calculated for each variable $X$ among a list that includes TN and TX temperatures, the difference TX-TN and the temporal derivatives of these quantities. The significance of $\lambda_X\{H,L\}$ against the null hypothesis that the H and L ensembles are samples of the same distribution is then tested by random perturbation tests. The p-value for $X$ is calculated as the proportion of random perturbations $P$ for which $\lambda_X\{P\} > \lambda_X\{H,L\}$. However, KLMC incorrectly extend this procedure to the multivariate case by calculating the proportion of random perturbations for which the above inequality is satisfied for all variables $X$ within a set of three or six variables. This is equivalent to performing a univariate test three or six times to reject the null hypothesis and interpreting the rate of joint success, without correction, in the same way as for a univariate test. This multiple testing procedure produces a considerable overestimation of the significance. For instance, if the univariate p-value for all six variables is 0.4, which is hardly significant, the multivariate p-value of the combined test is 0.004 (that is, $0.4^6 \approx 0.004$ if the six variables behave independently). Such values are indeed found in table SM3 of KLMC, and are obviously meaningless.
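The inflation of significance produced by this joint criterion can be checked numerically. The following Python sketch assumes, for illustration only, that the variables are mutually independent under the null hypothesis, so that each one exceeds its observed distance with the same univariate probability; the function name is hypothetical and the code does not reproduce the KLMC calculation itself.

```python
import random


def joint_exceedance_rate(univariate_p, n_vars, n_trials=200_000, seed=0):
    """Fraction of random trials in which all n_vars variables exceed their threshold.

    Each variable exceeds its threshold with probability univariate_p,
    independently of the others (an assumption made for this sketch).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        if all(rng.random() < univariate_p for _ in range(n_vars)):
            hits += 1
    return hits / n_trials


p_single = 0.4                      # univariate p-value, hardly significant
p_product = p_single ** 6           # product rule for six independent variables
p_joint = joint_exceedance_rate(p_single, 6)
print(f"product rule: {p_product:.4f}, simulated joint rate: {p_joint:.4f}")
```

Both numbers are close to 0.004, although none of the six individual tests comes near to rejecting the null hypothesis. The joint rate therefore cannot be read as a univariate p-value.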
As mentioned above, a plain Kolmogorov-Smirnov test has been performed for each calendar day, in addition to the Student t-test (see the Supplement). The p-values of the Student t-test and of the Kolmogorov-Smirnov test agree very well, even if the latter are noisier than the former owing to the limited size of the dataset. The Student t-test is actually the standard test for statistical significance of the difference between two averages; using the Kolmogorov-Smirnov test only introduces unnecessary complication in the matter.
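For reference, the two statistics compared here can be written in a few lines of Python. This is a minimal sketch using the textbook definitions of the two-sample Kolmogorov-Smirnov distance and of the pooled-variance Student t statistic. It computes the statistics only, not their p-values, and the function names are hypothetical.

```python
import math
from bisect import bisect_right
from statistics import mean, variance


def ks_distance(a, b):
    """Largest gap between the empirical distribution functions of samples a and b."""
    sorted_a, sorted_b = sorted(a), sorted(b)
    return max(
        abs(bisect_right(sorted_a, x) / len(sorted_a)
            - bisect_right(sorted_b, x) / len(sorted_b))
        for x in sorted_a + sorted_b
    )


def student_t(a, b):
    """Pooled-variance Student t statistic for the difference between two means."""
    n_a, n_b = len(a), len(b)
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / (n_a + n_b - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var * (1.0 / n_a + 1.0 / n_b))


# Synthetic samples (assumed values), shifted by one unit.
sample_h = [2.0, 3.0, 4.0]
sample_l = [1.0, 2.0, 3.0]
print(ks_distance(sample_h, sample_l), student_t(sample_h, sample_l))
```

The t statistic depends on the samples only through their means and variances, whereas the Kolmogorov-Smirnov distance depends on the full empirical distributions, which is one reason why it is noisier for small samples.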

Conclusions
We have shown that the studies by LMKC and KLMC can be criticized on several important points.
1. Solar forcing cannot be considered alone without paying attention to other natural or anthropogenic forcings that may interfere in the climate system. In particular, the coincidence between high solar activity and anthropogenic forcing during the last 50 years of the 20th century invalidates any empirical proof of multi-decadal solar influence that does not take this overlap into account. It has also been observed that sunspot count alone is a poor indicator of solar activity.
2. The long temperature series available from a number of weather stations should not be treated as homogeneous and calibrated datasets. Ignoring this fact may lead to spurious results and interpretations. A number of methods have been developed to circumvent this difficulty and generate homogenized datasets. Their application is often a lengthy and cumbersome task but this is a necessary step in data mining. LMKC and KLMC have used raw datasets that they present as the "highest quality data" without taking notice of the homogenization checks posted on the ECA&D site.
3. The daily temperatures cannot be treated as a series of independent drawings of some random variable. By using a 68% confidence interval and by neglecting the autocorrelation of daily temperatures, LMKC underestimate the 90% confidence interval of the solar shift by a factor of about 5. When this error is corrected, and when the last 50 years of the 20th century are discarded, the temperature difference between active and non-active solar periods (as defined from sunspot number) is never statistically different from zero. The high levels of significance found in KLMC are due to a combination of the overlap between solar and anthropogenic forcing and a spurious overestimate of significance by multiple testing.
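The two effects named in point 3 can be illustrated separately with a short Python sketch. It assumes, purely for illustration, that the daily anomalies follow a first-order autoregressive process; the coefficient phi = 0.8 is an arbitrary value, not an estimate from the Praha or Bologna series, and the function names are hypothetical. The sketch is not the calculation used to obtain the factor of about 5 quoted above.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def ar1_series(n, phi, rng):
    """Zero-mean AR(1) series with unit innovation variance, started in its stationary law."""
    x = rng.gauss(0.0, 1.0 / math.sqrt(1.0 - phi * phi))
    series = []
    for _ in range(n):
        series.append(x)
        x = phi * x + rng.gauss(0.0, 1.0)
    return series


def standard_error_ratio(n=365, phi=0.8, n_series=400, seed=0):
    """Actual spread of the sample mean divided by the naive estimate sigma / sqrt(n)."""
    rng = random.Random(seed)
    sample_means, naive_errors = [], []
    for _ in range(n_series):
        series = ar1_series(n, phi, rng)
        sample_means.append(mean(series))
        naive_errors.append(stdev(series) / math.sqrt(n))
    return stdev(sample_means) / mean(naive_errors)


def interval_width_ratio(wide=0.90, narrow=0.68):
    """Width of a two-sided normal interval at level `wide` relative to level `narrow`."""
    normal = NormalDist()
    return normal.inv_cdf(0.5 + wide / 2.0) / normal.inv_cdf(0.5 + narrow / 2.0)


print(f"autocorrelation factor: {standard_error_ratio():.2f}")
print(f"90% versus 68% interval factor: {interval_width_ratio():.2f}")
```

For large samples the first factor tends to sqrt((1 + phi) / (1 - phi)), which equals 3 for the value of phi assumed here. Treating correlated daily values as independent therefore understates the uncertainty of a mean, and quoting a 68% interval in place of a 90% one understates it further.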
Our unequivocal conclusion is that the results of LMKC and KLMC, claiming a strong signature of solar influence on local temperature records, with amplitude up to 1 °C, are invalid.
This result does not preclude a solar-climate influence at a larger spatial scale. In the Supplement, we have calculated the solar shift for the northern hemispheric mean of the HadCRUT3 dataset (Brohan et al., 2006). We find a positive significant signal of the order of 0.1 °C during summer which is consistent with previous results. However, this result is weak evidence since only three cycles remain in the H ensemble after removal of the second half of the 20th century and all other forcings (e.g. volcanism) that may interfere are neglected.
Many efforts have been devoted recently to attributing climate variations to the various forcings acting on the climate system, based on studying data and model simulations (Shindell et al., 1999; Stott, 2003; Stott et al., 2006; Huntingford et al., 2006; White, 2006; Hegerl et al., 2007a,b; Lean and Rind, 2008; Meehl et al., 2009; Lean, 2010; Cahalan et al., 2010). All concur in finding that the response to solar decadal variations accounts for variations of the order of 0.10 ± 0.05 °C of the mean surface temperature, with a complex but so far poorly characterized regional signature. These variations have modulated the anthropogenically induced global warming, along with volcanic eruptions and internal modes like ENSO, and will certainly continue to do so in the future (Lean and Rind, 2009). There are many uncertainties on how the various parts of the solar spectrum are modulated (Harder et al., 2009) and how to reconstruct the past history of the solar irradiance (Fröhlich, 2009; Lean, 2010). There is also a need to improve our modelling ability to reproduce solar-induced processes, e.g. in the multiband approximation of the short wave spectrum and in the representation of stratospheric processes and stratosphere-troposphere coupling.
Other processes, such as the action of cosmic rays, have been proposed to establish a strong link between solar variations and climate through ion-induced particle production (Bondo et al., 2010) or conduction currents (Tinsley et al., 2000). Although this suggestion should not be discarded at first, and is worth further study, the underlying physics is still poorly understood and the observed empirical evidence (Svensmark et al., 2009) is seriously challenged by other studies (Laken et al., 2009; Calogovic et al., 2010; Kulmala et al., 2010), which conclude that there is no relation between cosmic rays, aerosols and clouds. In any case, this mechanism does not offer a possibility to compensate the anthropogenic forcing since no trend has been observed for cosmic rays over the last decades (Bard and Delaygue, 2008). For a more detailed discussion, see Gray et al. (2010).
Progress in deciphering the relationship between solar variations and climate will arise from confronting the best available data with the best models of the climate system that represent our state of understanding. Careful data mining and processing is required to shed light on this matter.