Early ship-based upper-air data and comparison with the Twentieth Century Reanalysis

Extension of 3-D atmospheric data products back into the past is desirable for a wide range of applications. Historical upper-air data are important in this endeavour, particularly in the maritime regions of the tropics and the southern hemisphere, where observations are extremely sparse. Here we present newly digitized and re-evaluated early ship-based upper-air data from two cruises: (1) kite and registering balloon profiles from onboard the ship SMS Planet on a cruise from Europe around South Africa and across the Indian Ocean to the western Pacific in 1906/1907, and (2) ship-based radiosonde data from onboard the MS Schwabenland on a cruise from Europe across the Atlantic to Antarctica and back in 1938/1939. We describe the data and provide error estimates. We compare the data with a recent reanalysis (the Twentieth Century Reanalysis Project, 20CR, Compo et al., 2011) that provides global 3-D data back to the 19th century based on an assimilation of surface pressure data only (plus monthly mean sea-surface temperatures). In cruise (1), the agreement is generally good, but large temperature differences appear during a period with a strong inversion. In cruise (2), after a subset of the data is corrected, close agreement between observations and 20CR is found for geopotential height (GPH) and temperature, notwithstanding a likely cold bias of 20CR at the tropopause level. Results are considerably worse for relative humidity, which was reportedly measured inaccurately. Note that comparing 20CR, which has limited skill in the tropical regions, with measurements from ships in remote regions made under sometimes difficult conditions can be considered a worst-case assessment. In view of that fact, the anomaly correlations for temperature of 0.3-0.6 in the lower troposphere in cruise (1) and of 0.5-0.7 for tropospheric temperature and GPH in cruise (2) are considered promising results. Moreover, they are consistent with the error estimates. The results suggest room for further improvement of data products in remote regions.

Correspondence to: S. Brönnimann (stefan.broennimann@giub.unibe.ch)


Introduction
Reanalysis data sets of the 3-dimensional global atmosphere have become the most widely used data sets in geosciences. They serve numerous scientific communities, such as those working in impact modeling, risk management, and basic research in atmospheric and climate science. While data for the past 60 years are available from the popular reanalysis products ERA-40 (Uppala et al., 2005) and NCEP/NCAR (Kalnay et al., 1996), longer data sets are desirable for analyses of extreme weather events, for long-term impact studies, or generally for studies of variability on long time scales (reaching back to the preindustrial era) that require high-resolution data.
In order to extend conventional reanalysis projects further back into the past, more historical observations are needed, in particular upper-air observations. Recently, a compilation of historical upper-air data has been published, the Comprehensive Historical Upper-Air Network (CHUAN, Stickler et al., 2010). While CHUAN comprises a large number of historical profiles, the spatio-temporal coverage is very uneven. In particular, CHUAN contains data from only 5 North Atlantic ocean weather ships, and hence coverage over the oceans is poor. In this paper we present early ship-based upper-air data that can be used in future reanalysis projects.
Published by Copernicus Publications on behalf of the European Geosciences Union.
There is another way to obtain 3-dimensional global atmospheric data besides conventional reanalysis. A long reanalysis data set has recently been produced by assimilating only subdaily surface and sea-level pressure (SLP) observations, with monthly sea-surface temperatures (SSTs) prescribed as boundary conditions, using an Ensemble Kalman Filter technique. The feasibility of such a reanalysis was previously demonstrated (e.g., Compo et al., 2006; Whitaker et al., 2004). The "Twentieth Century Reanalysis" or 20CR (Compo et al., 2011) provides global, 6-hourly, 3-dimensional atmospheric data. We use version 2 in the following, which reaches back to 1871. In a validation against independent historical upper-air data from land-based stations, close agreement was found for geopotential height (GPH) in the troposphere over the northern midlatitudes (Compo et al., 2011). The agreement is worse over the tropics. However, few upper-air data are available for independent validation in the tropics and the southern hemisphere, especially in the maritime regions prior to the 1950s.
Here we present upper-air data from two ship cruises over the tropical and southern oceans in the 1900s and 1930s, respectively, which we have digitized as an extension of the CHUAN data set, for comparison with 20CR, and for use in future reanalysis projects. The data comprise kite and registering balloon profiles from onboard the ship SMS Planet on a cruise from Europe around South Africa and across the Indian Ocean to the western Pacific in 1906/1907, and ship-based radiosonde data from onboard the MS Schwabenland on a cruise from Europe across the Atlantic to Antarctica and back in 1938/1939. The paper is organized as follows. In Sect. 2, the data sources, digitizing, and processing steps are described and the comparison strategy is outlined. Results are shown in Sect. 3, and conclusions are drawn in Sect. 4.

SMS Planet
The first cruise analysed in this paper is that of the SMS (Seiner Majestät Schiff, i.e., His Majesty's Ship) "Planet" from Kiel (Germany) to Hong Kong in 1906/1907 (Fig. 1a shows the ship positions). The destination of this cruise was the Bismarck Archipelago, Papua New Guinea (at that time a German colony), where geodetic work was performed. On the cruise, oceanographic and atmospheric measurements were made. Leading aerologists were involved in planning, leading, and evaluating this cruise; the mission was well equipped, and the results are well documented (Reichs-Marine-Amt, 1909). Three platforms were used on the ship: pilot balloons, kites, and four registering balloons (i.e., free-flying balloons with graphical registering devices, which need to be recovered after the balloon bursts).
Although upper-air soundings were still in their infancy, this mission seems to have been at the forefront of research, and new concepts were developed onboard (e.g., innovative attempts at controlled separation of the registering devices from the balloon). The data comprise mostly temperature and wind, but the latter is partly qualitative. Relative humidity is reported quantitatively at the surface and only qualitatively at higher levels. Pressure, which was used for calculating the height, is not given in the upper-air section of the report. Approximately one successful ascent per week was performed.
Ascents and descents were analysed by the ship scientists, but descents were only reported graphically because the calibration curves for the aneroid barometers refer to decreasing pressure, so that altitudes are less accurate for the descent. The ascent data are given as significant levels, indicating key features of the atmospheric profile.
The published report (Reichs-Marine-Amt, 1909) describes in detail the instruments and measurement procedure and also states the uncertainties. For pressure and temperature, the error is given as <2 mm Hg and <2 °C at all heights (1 mm Hg = 1.33 hPa). The random error of the instrument reading is specified as <1 mm Hg and <0.2 °C, respectively. Concerning systematic errors, there is no mention of how radiation and lag errors were treated. All reduction procedures for balloons used on the ship assumed ascent (and descent) velocities of 5 m s⁻¹.
In this study we use only the data from Kiel to Bird Island, which comprise 42 soundings performed over a period of 257 days. Four further profiles are available but were omitted in this study because they are separated from the other profiles by a gap of four months due to geodetic work. We only used the data from kites plus the four ascents with registering balloons.
[…] (above mean sea level). In some cases, the top level was extrapolated to the next level, mostly by less than 100 m (the maximum was 284 m). Interpolated and extrapolated values were flagged. No corrections were applied to the kite data. Since the kite flights were often not continuous and sometimes lengthy, it would have been difficult to assume an ascent velocity. Additionally, since (relative) wind is required for kites to fly, we assume that the ventilation was sufficient so that no correction is needed. Note, however, that both radiation and lag errors would tend to lead to a warm bias.
In the four registering balloon profiles, we corrected lag and radiation errors in the same way as described in Brönnimann (2003), assuming a lag coefficient of 15 s and an ascent velocity of 5 m s⁻¹. Note that we have no source to verify this assumption; it is based solely on our previous experience, which showed these values to be reasonable when no other information is available.
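The exact formulation of Brönnimann (2003) is not reproduced here. As a hedged illustration of what a 15 s lag coefficient and a 5 m s⁻¹ ascent velocity imply, the sketch below applies a generic first-order (Newtonian) sensor-lag model, T_true ≈ T_meas + τ·w·dT_meas/dz, to a hypothetical profile. The function name and the test profile are ours, and the radiation correction is omitted.

```python
def lag_correct(z_m, t_meas_c, tau_s=15.0, w_ms=5.0):
    """Generic first-order sensor-lag correction for an ascending instrument.

    Assumed model: dT_meas/dt = (T_true - T_meas) / tau, hence
    T_true = T_meas + tau * dT_meas/dt. During a steady ascent
    dT_meas/dt = w * dT_meas/dz. The vertical gradient is estimated with
    centred differences (one-sided at the profile ends).
    z_m: strictly increasing altitudes in metres (at least two levels).
    """
    n = len(z_m)
    if n < 2 or n != len(t_meas_c):
        raise ValueError("need matching altitude and temperature lists (>= 2 levels)")
    corrected = []
    for i in range(n):
        lo, hi = max(i - 1, 0), min(i + 1, n - 1)
        dtdz = (t_meas_c[hi] - t_meas_c[lo]) / (z_m[hi] - z_m[lo])
        corrected.append(t_meas_c[i] + tau_s * w_ms * dtdz)
    return corrected


# Hypothetical profile with a constant lapse rate of 6.5 K/km.
z = [0.0, 1000.0, 2000.0, 3000.0]
t = [15.0, 8.5, 2.0, -4.5]
print(lag_correct(z, t))  # every level about 0.49 degC colder than measured
```

With a lapse rate of 6.5 K km⁻¹ the correction is 15 s × 5 m s⁻¹ × 6.5 K km⁻¹ ≈ 0.49 °C, i.e., of the same order as the upper limit of about 0.8 °C that this section gives for typical radiosonde corrections.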
An important part of the comparison concerns the estimation of the errors. A comparison of the kite instrument reading at the start (10 m) with the temperature measured on board the ship (maximum time offset of 1 h) reveals a standard deviation of the difference of 0.86 °C and a warm bias of 0.17 °C. On the one hand, the surface instrument also has an error, which is contained in this number, and the time offset adds to the difference; therefore the error of the kite instrument reading itself might be smaller. On the other hand, flight conditions might be more adverse aloft (including a poorer quality of the graphical registration), and hence the error might be larger aloft than at the ground. In fact, the difficulty of reading the graphical registration is frequently mentioned.
Based on these considerations, we assume that the error of the temperature reading (σ_instr) can be described by a normal distribution with a standard deviation of 0.9 °C.
In order to estimate the temperature error for a given altitude (significant level), one also needs to consider the pressure error (since pressure is used to calculate height). If we assume, based on the error specified above, a standard deviation of the random pressure error of 1 mm Hg (1.33 hPa), this translates into an additional error σ_alt (standard deviation) for temperature of 0.1-0.15 °C under most conditions (but it can be larger during strong inversions). Added to this is the systematic error of the pressure reading. The errors considered so far concern the readings from the meteographs, for instance at the significant points. An additional error is expected from the interpolation. From the graphical depictions of the ascents given in the report (Reichs-Marine-Amt, 1909) one can try to assess this error, but the drawn curves are smoothed. We assumed an interpolation error (σ_interp) of 0.1-0.2 °C.
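The translation of a 1 mm Hg pressure error into σ_alt can be made plausible with the hydrostatic relation. The following sketch is ours, not the report's: it assumes dry air, standard gravity, and a locally constant temperature gradient, and the pressure and temperature pairs are illustrative values.

```python
R_D = 287.05   # J kg^-1 K^-1, gas constant for dry air (assumed)
G0 = 9.80665   # m s^-2, standard gravity


def altitude_error_m(sigma_p_hpa, p_hpa, t_kelvin):
    """Magnitude of dz from the hydrostatic relation dz = -(R_d * T / (g * p)) * dp."""
    return R_D * t_kelvin / (G0 * p_hpa) * sigma_p_hpa


def temperature_error_c(sigma_p_hpa, p_hpa, t_kelvin, gradient_k_per_km):
    """Temperature error at a fixed altitude: |dT/dz| times the altitude error."""
    dz_km = altitude_error_m(sigma_p_hpa, p_hpa, t_kelvin) / 1000.0
    return abs(gradient_k_per_km) * dz_km


sigma_p = 1.33  # hPa, i.e. 1 mm Hg
for p, t in [(900.0, 288.0), (700.0, 278.0), (500.0, 258.0)]:
    dz = altitude_error_m(sigma_p, p, t)
    dt = temperature_error_c(sigma_p, p, t, 6.5)
    print(f"{p:5.0f} hPa: dz = {dz:4.1f} m, dT = {dt:.2f} degC")
```

For a standard lapse rate of 6.5 K km⁻¹ this gives roughly 0.08-0.13 °C between 900 and 500 hPa, broadly in line with the 0.1-0.15 °C quoted in the text, whereas a 30 K km⁻¹ inversion raises the error to about 0.37 °C.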
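Assuming that all three errors are independent, we can estimate the random error for temperature at a specified altitude level as

$$\sigma_{\mathrm{obs}} = \sqrt{\sigma_{\mathrm{instr}}^2 + \sigma_{\mathrm{alt}}^2 + \sigma_{\mathrm{interp}}^2},$$

which in this case amounts to around 0.87-0.93 °C. We adopt the higher number in the following.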
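The root-sum-square combination can be checked numerically. In this sketch we pair the measured 0.86 °C surface standard deviation with the lower bounds of σ_alt and σ_interp, and 0.9 °C with the upper bounds; this pairing is our assumption about how the quoted range was obtained.

```python
import math


def combined_sigma(*sigmas):
    """Root-sum-square of independent error standard deviations."""
    return math.sqrt(sum(s * s for s in sigmas))


# (sigma_instr, sigma_alt, sigma_interp) in degC; the pairing is our assumption.
low = combined_sigma(0.86, 0.10, 0.10)
high = combined_sigma(0.90, 0.15, 0.20)
print(f"sigma_obs between {low:.2f} and {high:.2f} degC")  # 0.87 and 0.93
```

The instrument term dominates: even the upper bounds of σ_alt and σ_interp add only a few hundredths of a degree.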
In addition to this random error, there is a systematic error due to uncorrected radiation and lag errors (note that almost all ascents were performed during the day). These cannot be quantified exactly, but a range can be given. Near the ground, in the case of strong inversions, the lag error might lead to a cooling, and the radiation error is negligible. However, in the free troposphere, both errors tend to lead to a warming. The upper limit of this warming can be estimated from the corrections normally applied to radiosonde temperatures, which would be around 0.8 °C. Note, however, that the errors of the kite data are expected to be considerably smaller (smaller ascent velocity, better ventilation). Finally, there are other possible systematic errors in both the temperature and pressure measurements, e.g., due to calibration issues or the processing of instrument readings.

MS Schwabenland
The Antarctic expedition of the MS ("Motorschiff", i.e., motor vessel) Schwabenland, commissioned by Hermann Göring, served to preserve German interests in the region (Fig. 1b shows the ship positions).Two aircraft that were transported by the ship performed many flights over Antarctica for aerial photogrammetry and cartography.The explored region of Antarctica is called Neuschwabenland (New Swabia) to the present day.On the cruise of the Schwabenland from Europe to Antarctica and back, daily to twice daily radiosonde ascents were made.
Results from these soundings have been published in the literature (e.g., Flohn, 1949), but most of the original data material was reportedly destroyed during the war.According to Flohn (1949) and Regula (1958), two radiosonde types were used: the Lang sonde (Reichsamt für Wetterdienst, 1940) and the marine sonde from the Marineobservatorium Wilhelmshaven (Geelhaar, 1942).Heinz Lange was responsible for the radiosonde ascents from the two systems, aided by a technician for each system.
A major fraction of the data were available to us in processed and re-evaluated form. They were part of a compilation of German radiosonde data that was performed in the […] "Comments: The material stems from the research motor vessel Schwabenland and was retrieved on the cruise from the Bay of Biscay to Antarctica. It was available partly in the form of adiabatic papers, partly in the form of significant points. The ascents had to be reconstructed. The departure height (board height) was assumed as 15 m. The material is very good." The source does not specify the sonde type used, but a comparison with the data tabulated in Regula (1958) indicates that it was the material from the Lang sonde. The data from the marine sonde are also tabulated in Regula (1958), but only in abbreviated form (GPH on five pressure levels, humidity and temperature on altitude levels). Only the data from the Lang sonde from Beelitz and Robitzsch (1949) were used in the following.
Clim. Past, 7, 265-276, 2011, www.clim-past.net/7/265/2011/
The data were reworked in the same way as described in Brönnimann (2003), i.e., radiation and lag errors were corrected based on the formulation of Väisälä (1941, 1949) and Raunio (1950) and the published radiation error of the German sonde (most likely the Lang sonde) by Scherhag (1948). Flohn (1949) analysed the radiation error of these data and concluded that the corrections following the approaches of Väisälä and Scherhag are sufficient. Note that it is not known whether a lag correction was originally applied. The manual for the sonde prescribes a lag correction, but it was published two years later (Reichsamt für Wetterdienst, 1940). In contrast to our previous work (relating to other data from the same source), we also analysed relative humidity, which the Lang sonde measured with a hair hygrometer. However, no corrections were made to humidity. Flohn (1949) states that the error in relative humidity in the Schwabenland ascents was very large at low temperatures and low humidities. For more information on the German radiosondes see DFVLR (1982).
In order to estimate the error of the soundings, we used German radiosonde data from approximately the same time period but from a location that is close to the Alps, namely Freiburg i. B., 1940-1942.Note that in earlier work (Brönnimann, 2003) we assumed that the instrument type used in Freiburg i. B. was a Graw H-38; however, we now think that this also was a Lang sonde (Reichsamt für Wetterdienst, 1940) and hence likely the same sonde type as on the MS Schwabenland.
We compared temperature and GPH above the planetary boundary layer at 800, 700, and 600 hPa with station temperature and pressure from mountain sites, namely Säntis (2500 m a.s.l., 140 km away) and Jungfraujoch (3555 m a.s.l., 161 km away), respectively. We selected only pairs of observations that were made within 3 h of each other. Then we performed a linear regression to estimate the radiosonde data (predictand) from the corresponding station data (predictor), using one predictor at a time. Finally, we analysed the variance of the residuals. Results are summarized in Table 1. Note that we neither subtracted an annual cycle nor included the annual cycle in the regression model.
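A minimal sketch of the matching and regression steps, using only the Python standard library. The record layout ((time, value) tuples), the function names, and the n - 2 denominator for the residual standard deviation are our assumptions; the text does not specify them.

```python
import math
from datetime import datetime, timedelta


def match_pairs(sonde_obs, station_obs, max_gap=timedelta(hours=3)):
    """Pair each (time, value) sonde record with the nearest station record.

    Pairs further apart than max_gap are discarded. Returns a list of
    (station_value, sonde_value) tuples: predictor first, predictand second.
    """
    pairs = []
    for t_sonde, v_sonde in sonde_obs:
        nearest = min(station_obs, key=lambda rec: abs(rec[0] - t_sonde), default=None)
        if nearest is not None and abs(nearest[0] - t_sonde) <= max_gap:
            pairs.append((nearest[1], v_sonde))
    return pairs


def regress(pairs):
    """OLS fit sonde = a + b * station; returns (a, b, explained_variance, sd_residuals)."""
    n = len(pairs)
    if n < 3:
        raise ValueError("need at least three matched pairs")
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    r2 = 1.0 - ss_res / syy
    sd_res = math.sqrt(ss_res / (n - 2))  # two fitted parameters
    return a, b, r2, sd_res


# Hypothetical records (degC) for illustration only.
t0 = datetime(1941, 1, 15, 12)
sonde = [(t0, -5.0), (t0 + timedelta(days=1), -7.0), (t0 + timedelta(days=2), -3.5)]
station = [(t0 + timedelta(hours=1), -4.0), (t0 + timedelta(days=1, hours=2), -6.5),
           (t0 + timedelta(days=2, hours=6), -2.0)]
pairs = match_pairs(sonde, station)  # the third pair is 6 h apart and is dropped
```

With real data, `regress(pairs)` then yields the explained variance and the residual standard deviation that enter the error budget; regressing station on sonde instead (swapping x and y) gives slightly different residuals.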
The agreement between the regression models and the radiosonde data was generally very good, with explained variances of 83% to 96%. This not only points to a high quality of the data but is also important to justify the applicability of the general approach. Concerning the analysis of the residuals, we again assume that all errors are independent, and hence the standard deviation of the residuals, σ_residuals, is the square root of the sum of the variances of the individual error contributions:

$$\sigma_{\mathrm{residuals}} = \sqrt{\sigma_{\mathrm{obs}}^2 + \sigma_{\mathrm{stat}}^2 + \sigma_{\mathrm{rep}}^2},$$

where σ_obs is again the error of the observation (σ_obs = √(σ_instr² + σ_alt² + σ_interp²)), σ_stat is that of the station reading, and σ_rep is that of the representativeness (i.e., the error attributed to the distance in time and space and to the comparison of free-atmospheric data with surface data).
For the error of the station observation, an expert estimate of the standard deviation of the error is σ_stat = 0.3 °C for temperature and σ_stat = 0.3 hPa for pressure.
In order to address σ_rep, we compared the two mountain stations with each other, again with a regression model, and assumed that the variance of the residuals, after subtracting the error variance σ_stat² of both stations, is a good estimate of σ_rep². This is not exactly true, as comparing two ground stations does not account for the error of comparing a free-atmospheric measurement with a mountain station. Also, the altitude difference between the stations is larger than that between a station and the closest pressure level. Comparing Säntis and Jungfraujoch, for instance, gives σ_rep ≈ 1.96 °C and σ_rep ≈ 1.75 hPa (depending on the choice of x and y) between the stations (see Table 1). Based on these values we estimate σ_obs as 1.2 °C and 1.35 hPa, the latter of which was transformed back to geopotential meters using climatological temperature profiles (see Sect. 2.3). Note that the error of representativeness is only a rough estimate that might be too large (because in most cases the distance is smaller than between Säntis and Jungfraujoch) or too small (because the ships' locations may be inaccurate due to digitizing errors), but in any case it has a large effect on the derived error of the observations. For relative humidity, we do not know the error and hence do not try to quantify any humidity errors in this paper.
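The error budget can be inverted in a few lines. The sketch below assumes, as in the text, independent errors. The residual standard deviation of 2.32 °C used in the example is a hypothetical value chosen for illustration, not a number from Table 1, and the pressure-to-height conversion uses a single illustrative temperature instead of a climatological profile.

```python
import math

SIGMA_STAT_T = 0.3  # degC, expert estimate for station temperature (from the text)
SIGMA_STAT_P = 0.3  # hPa, expert estimate for station pressure (from the text)


def sigma_rep_from_station_pair(sd_res_pair, sigma_stat):
    """Station-vs-station residuals contain sigma_stat twice (one per station)."""
    var = sd_res_pair ** 2 - 2.0 * sigma_stat ** 2
    if var < 0:
        raise ValueError("station errors exceed the residual variance")
    return math.sqrt(var)


def sigma_obs_from_residuals(sd_res, sigma_stat, sigma_rep):
    """Invert sigma_res^2 = sigma_obs^2 + sigma_stat^2 + sigma_rep^2."""
    var = sd_res ** 2 - sigma_stat ** 2 - sigma_rep ** 2
    if var < 0:
        raise ValueError("error budget exceeds the residual variance")
    return math.sqrt(var)


def hpa_to_gpm(sigma_p_hpa, p_hpa, t_kelvin):
    """Hydrostatic conversion of a pressure error into geopotential metres."""
    return 287.05 * t_kelvin / (9.80665 * p_hpa) * sigma_p_hpa


# Hypothetical residual standard deviation (degC), not a value from Table 1.
print(round(sigma_obs_from_residuals(2.32, SIGMA_STAT_T, 1.96), 2))  # 1.2
print(round(hpa_to_gpm(1.35, 500.0, 253.0), 1))  # 20.0
```

Because σ_rep enters squared and is the largest term, small changes in it shift σ_obs considerably, which is why the text stresses that the representativeness error is only a rough estimate.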
A more direct approach is to analyse temperature at 4 km (as given in Regula, 1958) in pairs of ascents from the Lang sonde and the Marinesonde (44 pairs with a time difference <12 h). Various approaches to extrapolating the standard deviation of their difference to zero time shift suggest an observation error near 1 °C for each sonde, albeit with a large uncertainty, which is consistent with the above estimates.
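The text does not specify which extrapolation approaches were used. One simple possibility, sketched here as an assumption, is to regress the squared (de-biased) differences of paired ascents on their time separation: if both sondes have independent errors of equal size σ, the intercept at zero lag estimates 2σ².

```python
import math


def sigma_at_zero_lag(lags_h, diffs):
    """Estimate a single-sonde error from paired differences at varying time lags.

    Assumes both sondes have independent errors of equal size sigma, so the
    variance of their difference tends to 2 * sigma**2 as the lag goes to zero.
    Squared de-biased differences are fitted linearly against lag; the
    intercept is taken as 2 * sigma**2.
    """
    n = len(diffs)
    if n < 3 or n != len(lags_h):
        raise ValueError("need at least three (lag, difference) pairs")
    mean_d = sum(diffs) / n
    sq = [(d - mean_d) ** 2 for d in diffs]
    mx, my = sum(lags_h) / n, sum(sq) / n
    sxx = sum((x - mx) ** 2 for x in lags_h)
    slope = sum((x - mx) * (y - my) for x, y in zip(lags_h, sq)) / sxx
    intercept = my - slope * mx
    if intercept <= 0:
        raise ValueError("non-positive intercept; extrapolation not meaningful")
    return math.sqrt(intercept / 2.0)
```

With only 44 pairs, the intercept is poorly constrained, consistent with the large uncertainty noted in the text; alternative fits (e.g., binned variances or a non-linear growth with lag) would be equally defensible.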

Reanalyses
The observation-based data were compared with 20CR (Compo et al., 2011). We used 6-hourly ensemble-mean analysis fields of temperature, GPH, and relative humidity as well as the ensemble spread of these fields (expressed as the ensemble standard deviation, termed σ_20CR). We interpolated the fields to the locations of the ships and chose the standard time closest to the observing time for comparison. In the case of the SMS Planet, the temperature data were interpolated onto fixed (geometric) altitude levels in order to match the observations. The error of the interpolation is not exactly known. Note, however, that the spatio-temporal distance involved in interpolating the reanalysis data to the radiosonde observations (2° × 2° grid, 6-hourly analyses) is of the same order as that in the comparison for Freiburg i. B. in Sect. 2.1 (Δx ≈ 140-160 km, Δt < 3 h). If we assume that the interpolation error is constant in time and space and does not have a seasonal cycle, we can also use σ_rep as a conservative approximation of the interpolation error in the reanalysis.
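The co-location step can be sketched as bilinear interpolation on a regular latitude-longitude grid plus rounding of the observing time to the nearest 6-hourly analysis. Both the ascending-grid convention and the tie-breaking rule are our assumptions; actual 20CR files may store latitudes in descending order, and longitude wrap-around is not handled here.

```python
import bisect
import math
from datetime import datetime, timedelta


def bilinear(field, lats, lons, lat, lon):
    """Bilinear interpolation of field[i][j] given at (lats[i], lons[j]).

    lats and lons must be ascending; points outside the grid are extrapolated
    from the edge cell, and longitude wrap-around is not handled.
    """
    i = min(max(bisect.bisect_right(lats, lat) - 1, 0), len(lats) - 2)
    j = min(max(bisect.bisect_right(lons, lon) - 1, 0), len(lons) - 2)
    wy = (lat - lats[i]) / (lats[i + 1] - lats[i])
    wx = (lon - lons[j]) / (lons[j + 1] - lons[j])
    return ((1 - wy) * (1 - wx) * field[i][j] + (1 - wy) * wx * field[i][j + 1]
            + wy * (1 - wx) * field[i + 1][j] + wy * wx * field[i + 1][j + 1])


def nearest_synoptic_time(t):
    """Round a datetime to the closest of 00, 06, 12, 18 UTC (ties go to the later time)."""
    midnight = t.replace(hour=0, minute=0, second=0, microsecond=0)
    hours = (t - midnight).total_seconds() / 3600.0
    k = math.floor(hours / 6.0 + 0.5)
    return midnight + timedelta(hours=6 * k)


print(nearest_synoptic_time(datetime(1906, 5, 18, 10, 0)))  # 1906-05-18 12:00:00
```

Bilinear interpolation is one reasonable choice for a 2° grid; the text does not state which horizontal interpolation scheme was used.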
When analysing the early years of the reanalysis, it is important to consider the locations of the pressure measurements that go into 20CR (note that no other information was assimilated). The locations are shown in Fig. 1 for one sample day in each cruise (18 May 1906 and 11 February 1939, respectively). Note that the surface pressure readings from the SMS Planet are included in the historical reanalysis, whereas this is not the case for the observations from the MS Schwabenland. However, in the case of the SMS Planet, there are almost no other pressure data within a few thousand kilometers. In the case of the MS Schwabenland, pressure data were more abundant, but not south of 60° S. Hence, both cruises sampled very remote regions of the globe with respect to the information assimilated into 20CR.
For some of the comparisons, it is advisable to plot the data in the form of anomalies from a common reference. We used NCEP/NCAR reanalysis (NNR) data for this purpose, namely a climatology of daily mean values as a function of the day of year that is provided on the NOAA/PSD website and refers to the period 1968-1996. These data were also interpolated to the ship's location.
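Anomalies are then simply the observed (or reanalysed) value minus the climatological value for the same calendar day. The sketch assumes a 365-entry climatology and maps 29 February onto 28 February; how the NOAA/PSD product treats leap days is an assumption here and would need to be checked against its documentation.

```python
import calendar
from datetime import date


def clim_index(d):
    """0-based index into a 365-day climatology; 29 Feb shares the 28 Feb entry."""
    doy = d.timetuple().tm_yday
    if calendar.isleap(d.year) and doy >= 60:
        doy -= 1
    return doy - 1


def anomaly(value, climatology, d):
    """Departure of value from the day-of-year climatology (same units)."""
    if len(climatology) != 365:
        raise ValueError("expected a 365-day climatology")
    return value - climatology[clim_index(d)]


clim = [float(i) for i in range(365)]  # dummy climatology for illustration
print(anomaly(100.0, clim, date(1939, 2, 11)))  # 59.0
```

The same function applies to observations and to 20CR, so both anomaly series share one reference and their difference equals the difference of the raw values.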

Comparison
The data were analysed as absolute values and as deviations from the NNR climatology. In addition to the standard measures (bias, correlation), we also analysed whether the difference between observations and 20CR is compatible with both the ensemble spread and the observation error, assuming that the two errors are uncorrelated. Specifically, we calculated the fraction of cases for which the difference between observation and reanalysis is outside ±2σ_diff, where

$$\sigma_{\mathrm{diff}} = \sqrt{\sigma_{\mathrm{obs}}^2 + \sigma_{\mathrm{rep}}^2 + \sigma_{\mathrm{20CR}}^2}.$$

Note that the sum of the error variances (σ_obs² + σ_rep²) is even more readily comparable to the cases given in Table 1 than the individual error contributions. Note also that σ_diff does not cover all sources of errors and uncertainties, as will be discussed below. For instance, biases (both in 20CR and in the observations) are not included.
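The consistency test can be written compactly. In this sketch σ_obs and σ_rep are treated as constants while σ_20CR varies from case to case, as the ensemble spread does; the function names are ours.

```python
import math


def sigma_diff(sigma_obs, sigma_rep, sigma_20cr):
    """Combined standard deviation of observation-minus-reanalysis differences."""
    return math.sqrt(sigma_obs ** 2 + sigma_rep ** 2 + sigma_20cr ** 2)


def fraction_outside(obs, rean, sigma_obs, sigma_rep, spreads, k=2.0):
    """Fraction of cases with |rean - obs| > k * sigma_diff (case-dependent spread)."""
    if not (len(obs) == len(rean) == len(spreads)) or not obs:
        raise ValueError("need equally long, non-empty inputs")
    n_out = sum(
        1 for o, r, s in zip(obs, rean, spreads)
        if abs(r - o) > k * sigma_diff(sigma_obs, sigma_rep, s)
    )
    return n_out / len(obs)
```

Biases are deliberately absent from `sigma_diff`, mirroring the text: a systematic offset in either data set shows up as an excess fraction outside ±2σ_diff rather than as a wider tolerance.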

Comparison for SMS Planet
Figure 2 shows the vertical structure of temperature differences between observations and NNR climatology (left), between 20CR and NNR climatology (middle), and between 20CR and observations (right). The corresponding statistics are given in Table 2. The weather during this cruise was rather normal (see Fig. 2 left). The main anomaly features are the very first two profiles, then a series of profiles with rather cool conditions in the lowest 1000 m (#12-20, extratropical South Atlantic and Southern Ocean), and a sequence of profiles with a very strong, elevated inversion (#21-24, Southern Indian Ocean). The weather log for ascents #12-20 frequently mentions calm conditions aloft, often cloudy skies, and generally difficult conditions for kite flights. The inversions in ascents #21-24 were reported in the log and coincided with reportedly very dry conditions at these altitudes and sometimes weak winds. At most other times, temperatures were close to climatology.
The 20CR depicts the anomalies during the first two profiles. It also shows the cool, unstable conditions in ascents #17-20 (the southernmost ascents of the cruise). However, the reanalysis does not capture the strong inversions. This affects the mean differences between the two data sets, which would be within ±1 °C without profiles #21-24 but with these profiles reach 2 °C and are statistically significant at most levels.
The correlations (see Table 2) are very high for the observed values of temperature in the lower troposphere but moderate for the anomalies (0.3-0.6 up to 3 km, somewhat less at the 1500 m a.s.l. level, which is near the boundary layer top). Again, this is largely due to the inversion missing in 20CR. Without profiles #21-24, anomaly correlations would be on the order of 0.6-0.8 (somewhat lower at 1500 m a.s.l.), which would be considered very high even when compared with the results that Compo et al. (2011) derived for later periods. However, the presence or absence of inversions might be an important regional feature. The inversion occurred southeast of Port Elizabeth, South Africa, over the South Indian Ocean. This region lies further south than the usual extent of the trade winds and the trade wind inversion but might occasionally be affected by inversions.
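For reference, the two statistics underlying Table 2, the (anomaly) correlation and the paired t statistic for the bias, are sketched below in plain Python. Converting t into a p-value requires the Student t distribution (e.g., from SciPy), which is omitted here to keep the example dependency-free.

```python
import math


def pearson_r(x, y):
    """Pearson correlation; applied to anomalies it gives the anomaly correlation."""
    n = len(x)
    if n < 3 or n != len(y):
        raise ValueError("need at least three paired values")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def paired_t(x, y):
    """Paired t statistic for the mean of x - y (e.g., 20CR minus observation)."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    if n < 2:
        raise ValueError("need at least two pairs")
    md = sum(d) / n
    sd = math.sqrt(sum((v - md) ** 2 for v in d) / (n - 1))
    return md / (sd / math.sqrt(n))
```

Correlating anomalies rather than raw values removes the shared seasonal and latitudinal signal along the ship track, which is why the raw correlations in Table 2 are much higher than the anomaly correlations.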
Figure 3 shows 700 hPa temperature and GPH on 18 May 1906, 12Z, from 20CR. Shown are the ensemble mean (colours and contours) as well as the spread (yellow and dashed lines). Also shown is the observed temperature (interpolated to the geometric altitude of the ensemble-mean 700 hPa surface). The difference between observations and 20CR exceeds 10 °C and is clearly outside ±2σ_diff. In 20CR, the high-pressure system is much further north, near Madagascar, and does not influence the region. The ensemble spread is relatively small. Hence, in this case we suggest that 20CR does not capture the feature.
The last row in Table 2 shows the fraction of differences between 20CR and the observations that lie outside ±2σ_diff. If all errors were included in the estimation of σ_diff, a fraction of 5% would indicate that reanalysis and observations agree within the specified errors. However, the inversion issue affects 10% of the profiles. In fact, near the ground the fraction is below 5%, but higher up it increases to over 10% and is hence slightly higher than expected. This might be due to biases. The likely warm bias in the observations is not accounted for in σ_diff, and 20CR might also have biases (biases are not covered by the ensemble spread).
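The 5% benchmark follows from the normal distribution: for Gaussian differences, P(|Δ| > 2σ_diff) = erfc(√2) ≈ 4.55%. A binomial tail probability then indicates how unusual a given number of exceedances would be for a sample of n comparisons; this check is our suggestion, not part of the original analysis, and it assumes independent comparisons.

```python
import math


def gaussian_fraction_outside(k_sigma=2.0):
    """Two-sided probability P(|Z| > k) for a standard normal variable Z."""
    return math.erfc(k_sigma / math.sqrt(2.0))


def binomial_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))


p2 = gaussian_fraction_outside(2.0)
print(f"{100 * p2:.2f} %")  # 4.55 %
# How unusual would 5 or more exceedances among 42 comparisons be?
print(binomial_tail(42, 5, p2))
```

Consecutive profiles sampling the same weather system (such as ascents #21-24) are not independent, so such a tail probability would overstate the significance of clustered exceedances.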

MS Schwabenland
Now we consider the cruise of the MS Schwabenland from December 1938 to April 1939. In contrast to the SMS Planet, the data from the MS Schwabenland pose particular problems. Figure 4 shows, for two variables (500 hPa GPH and 700 hPa temperature), sonde observations and co-located analysis values from 20CR as well as the anomalies with respect to NNR climatology. Also shown are the corresponding uncertainties 2σ_obs and 2σ_20CR, respectively. It is apparent that the observations and 20CR differ substantially around ascents #50-61. Temperatures in the observations appear to be about 10 °C higher, and GPH is also higher. Moreover, this difference is far larger than the errors of the two series.
One could simply omit these data as possibly erroneous. However, the description of the data asserts a high quality (see the quote above), and there is an obviously high correlation (0.86 for 700 hPa temperature) between anomalies in observations and reanalysis within that series of 12 profiles. The same deviation is also found in the data tabulated in Regula (1958) and in the data from the Marinesonde.
The nature of the error is difficult to specify. We therefore analysed the vertical profile of observations minus reanalysis (Fig. 5) in order to use the profile shape to attribute the error to an underlying (usually simple) problem (see Grant et al., 2009 for more details). Uncorrected radiation and lag errors can be ruled out as they are far too small to explain a 10 °C difference (although the shape of the lag error seems correct). Other potential problems such as unit errors, a constant offset in temperature, or a constant offset in pressure would produce difference profiles with distinctly different shapes.
The very characteristic belly shape of the temperature difference profile (Fig. 5 right, red curve) can, however, be explained very easily by a shift of the temperature profile by a constant altitude. Since some of the profiles were taken from adiabatic diagrams, which usually have altitude as a vertical axis, such an error might occur when copying the data, for instance. We have no direct evidence for such an error. However, the fact that correcting for an assumed constant offset of 1500 m brings the difference profile (Fig. 5 right, purple curve) at all levels into very close agreement with that for the rest of the record (Fig. 5 right, blue curve) […] not expect the difference to vanish at all levels. There may be remaining errors in the observations or in the reanalysis. For instance, a cold bias at the tropopause in 20CR might affect the uppermost three levels in Fig. 5. Also, note that the 1000 hPa level was below sea level during all ascents #50-61. The values for this level were hence extrapolated in our sources (but should nevertheless be considered in Fig. 5, in contrast to the rest of the paper, as they are likely a product of the same data processing). Recalculating GPH from the shifted temperature profile (assuming a zero height error for the surface) also removes the vertical structure in the GPH difference profile entirely but leaves a constant offset (Fig. 5 left, dashed purple curve), presumably because the height offset at 1000 hPa is not zero. Hence, we further assumed a constant offset of 150 gpm at the surface (Fig. 5 left, purple curve). After this correction, the curve fits that for the rest of the record very well.
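The two-step correction can be sketched as follows, under stated assumptions. First, the observed temperature profile is re-read at a constant altitude offset; we define corrected(z) = observed(z + offset), and the paper does not state its sign convention. Second, GPH is recomputed upward from the surface with the hypsometric equation, using the arithmetic layer-mean temperature as an approximation of the log-pressure-weighted mean. Function names and example values are hypothetical.

```python
import math

R_D = 287.05   # J kg^-1 K^-1, dry-air gas constant (assumed)
G0 = 9.80665   # m s^-2, standard gravity


def interp(xs, ys, x):
    """Linear interpolation on strictly increasing xs; None outside the range."""
    if x < xs[0] or x > xs[-1]:
        return None
    for k in range(len(xs) - 1):
        if xs[k] <= x <= xs[k + 1]:
            w = (x - xs[k]) / (xs[k + 1] - xs[k])
            return ys[k] + w * (ys[k + 1] - ys[k])
    return ys[0]  # single-level grid with x == xs[0]


def shift_profile(z_m, t_c, offset_m):
    """Corrected temperature at altitude z is the observed value at z + offset_m."""
    return [interp(z_m, t_c, z + offset_m) for z in z_m]


def gph_from_temperature(p_hpa, t_k, z_bottom_gpm):
    """Integrate the hypsometric equation upward from z_bottom_gpm.

    p_hpa must decrease upward; the layer-mean temperature is approximated by
    the arithmetic mean of the two bounding levels.
    """
    z = [z_bottom_gpm]
    for k in range(len(p_hpa) - 1):
        t_mean = 0.5 * (t_k[k] + t_k[k + 1])
        z.append(z[-1] + R_D * t_mean / G0 * math.log(p_hpa[k] / p_hpa[k + 1]))
    return z


# Illustrative use: 1000 m shift of a linear profile, then a 150 gpm surface offset.
print(shift_profile([0.0, 1000.0, 2000.0, 3000.0], [15.0, 8.5, 2.0, -4.5], 1000.0))
print(gph_from_temperature([1000.0, 850.0, 700.0], [288.0, 280.0, 272.0], 150.0))
```

Because every layer thickness depends only on temperature, a surface offset simply adds a constant to all levels, which matches the constant residual offset in the GPH difference profile described above.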
Note that the corrections may seem arbitrary, and indeed they are to some extent.Also, by using 20CR as a reference we lose independence.On the other hand, we have determined only two parameters and (largely) remove different biases for 20 variables simultaneously (10 levels, 2 variables per level).Note also that around 200 observation pairs have been used to constrain the two parameters and that the nature of the error is plausible.For these reasons we utilize these corrections although we note that there may be remaining problems.
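The paper does not say how the 1500 m offset was determined. One transparent way to constrain such a single parameter from many observation pairs is a brute-force grid search that minimizes the pooled mean squared difference to the reference. The sketch below is such a hypothetical procedure, with the convention corrected(z) = observed(z + offset), and is checked on synthetic data only.

```python
import math


def interp(xs, ys, x):
    """Linear interpolation on strictly increasing xs; None outside the range."""
    if x < xs[0] or x > xs[-1]:
        return None
    for k in range(len(xs) - 1):
        if xs[k] <= x <= xs[k + 1]:
            w = (x - xs[k]) / (xs[k + 1] - xs[k])
            return ys[k] + w * (ys[k + 1] - ys[k])
    return ys[0]  # single-level grid with x == xs[0]


def best_altitude_offset(z_m, profiles, candidates):
    """Grid search for the constant altitude offset that best aligns observations.

    profiles: list of (obs_temps, ref_temps) on the common altitude grid z_m.
    Convention (ours): corrected(z) = observed(z + offset).
    Returns (best_offset, pooled_mean_squared_difference).
    """
    best, best_mse = None, math.inf
    for off in candidates:
        sq = []
        for obs, ref in profiles:
            for z, r in zip(z_m, ref):
                v = interp(z_m, obs, z + off)
                if v is not None:
                    sq.append((v - r) ** 2)
        if sq:
            mse = sum(sq) / len(sq)
            if mse < best_mse:
                best, best_mse = off, mse
    return best, best_mse


# Synthetic check: observations displaced by 1500 m relative to a linear reference.
grid = [float(z) for z in range(0, 10001, 500)]
ref = [15.0 - 0.0065 * z for z in grid]
obs = [15.0 - 0.0065 * (z - 1500.0) for z in grid]
print(best_altitude_offset(grid, [(obs, ref)], range(-3000, 3001, 500)))
```

Pooling all levels and all affected ascents is what lets a single parameter be fitted robustly; using 20CR as the reference, however, carries the loss of independence noted in the text.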
The comparison of time-height sections for the corrected data as well as the statistical analysis are given in Fig. 6 and Table 3, respectively.Note that in contrast to Fig. 2 (for SMS Planet), we now have pressure as the vertical coordinate, and we also show GPH and relative humidity.Note also that, because of the mismatch between the reported pressure levels and that of the NNR climatology, 800 and 900 hPa cannot be shown in anomaly form (interpolating the climatology to these levels might add additional uncertainties especially since these levels are close to the boundary layer top), but they can be shown in the direct comparison.
There is one visually very prominent feature in this figure, namely very cool temperatures near the tropopause in 20CR.This has been found also in other comparisons and likely points to a cold bias in 20CR (Compo et al., 2011), though historical upper-air data could also be warm biased in the stratosphere.Apart from that, the agreement between reanalysis and observations is very good for temperature and GPH (Table 3).We find anomaly correlations of 0.5-0.7 throughout the troposphere, which is at least as good as expected from Compo et al. (2011).Also, correlations for temperature anomalies are high in the lower stratosphere (they drop at the tropopause level).
Most GPH and temperature differences are within ±2σ_diff in the lower to middle troposphere (see also the vertical bars in Fig. 4). Around 5-10% lie outside this range, with the fraction increasing towards the upper troposphere and stratosphere (and also at 1000 hPa). As for the SMS Planet, this might be due to biases, which increase with altitude (arguably in both the radiosonde data and 20CR). Since biases are not accounted for in σ_diff, we consider these results to be consistent with the estimated errors.
Relative humidity is shown in the lowermost panel. Measuring relative humidity with weather balloons is difficult even today, and hence we do not expect high-quality data from the 1930s (as is also stated in Flohn, 1949). In the lower troposphere, we do find some agreement between the two (anomaly correlations are significant at 1000 hPa, which is consistent with Flohn's (1949) judgement that errors are smaller at higher temperatures), while huge discrepancies arise in the middle troposphere. As we know nothing about the errors in humidity, we do not estimate σ_diff. Evidently, relative humidity remains a challenge in historical data sets.
One of the strongest anomaly features on this cruise occurred in mid-February 1939 (within the corrected interval).Temperature in the lower troposphere increased by 10 • C within a day and then decreased again by the same

Fig. 1. Ship track and positions of upper-air soundings (open circles) for the cruise of the SMS Planet (a) and the MS Schwabenland (b). The dots mark the locations of surface pressure data that were assimilated into the Twentieth Century Reanalysis for 18 May 1906, 12Z (a) and 11 February 1939, 12Z (b).

Fig. 2. Temperature profiles from observations and 20CR for the cruise of the SMS Planet. (left) Anomalies of observations with respect to NNR climatology, (middle) anomalies of 20CR with respect to NNR climatology, (right) difference 20CR minus observations.

Fig. 4. 500 hPa GPH and 700 hPa temperature from the MS Schwabenland (solid lines) and 20CR (dashed lines) as a function of the ascent number. The top two panels display the data in the form of anomalies from the NNR climatology; the bottom two panels show the raw data. The vertical bars give 2σ_obs (solid) and 2σ_20CR (grey dashed), respectively. The shaded region of ascents 50 to 61 highlights the suspected erroneous observations.

Fig. 5. Vertical profile of the difference between observations from the MS Schwabenland and 20CR for GPH (left) and temperature (right). Red = uncorrected profiles for the ascents #50-61, blue = all other ascents, thick purple = corrected profiles of ascents #50-61 (dashed: no altitude offset at 1000 hPa). Note that the 1000 hPa level was below sea level during ascents #50-61 and hence these values were extrapolated on the original data sheets.

Table 1. Estimation of the error of the radiosonde observations at Freiburg i. B. based on the comparison with data from the two mountain sites Säntis and Jungfraujoch. Bold numbers are derived from the other numbers in the table, italics refer to expert estimates, and normal print refers to the residuals calculated from regressing pairwise observations.

Table 2. Results of the comparison of temperature between observations onboard the SMS Planet and 20CR. ΔT is the mean difference between 20CR and the observations, ΔT_anom refers to the anomalies from NNR climatology, r is the correlation coefficient, P stands for probability, and P(|ΔT| > 2σ_diff) is the fraction of differences outside the interval ±2σ_diff. Bold numbers indicate statistical significance (p-values < 0.05 for two-sided tests; paired t-test in the case of the bias).

Table 3. Results of the comparison of temperature (T), GPH (Z), and relative humidity (rH) between observations onboard the MS Schwabenland and 20CR. Subscript "anom" refers to the anomalies from NNR climatology, r is the correlation coefficient, P stands for probability, and P(|ΔT| > 2σ_diff) is the fraction of differences outside the interval ±2σ_diff.