This work is distributed under the Creative Commons Attribution 4.0 License.
More is not always better: downscaling climate model outputs from 30 to 5-minute resolution has minimal impact on coherence with Late Quaternary proxies
Abstract. Both proxies and models provide key resources for exploring how palaeoenvironmental changes may have impacted diverse biotic communities and cultural processes. Whilst proxies provide the gold standard in reconstructing the local environment, they only provide point estimates for a limited number of locations; models, on the other hand, have the potential to afford more extensive and standardised geographic coverage. A key decision when using model outputs is the appropriate geographic resolution to adopt; models are coarse scale, on the order of several arc degrees, and so their outputs are usually downscaled to a higher resolution. Most publicly available model time series have been downscaled to 30 or 60 arc-minutes, but it is unclear whether such resolution is sufficient, or whether it may homogenise environments and mask the spatial variability that is often the primary subject of analysis. Here, we explore the impact of further downscaling model outputs from 30 to 5 arc-minutes using the delta method, which uses the difference between past and present model data sets to increase the spatial resolution of simulations. Through direct comparison with proxy reconstructions, we determine the extent to which further downscaling captures climatic trends at the site level. We use output from the HadCM3 general circulation model for annual temperature, mean temperature of the warmest quarter, and annual precipitation, which we evaluated against a large empirical dataset of pollen-based reconstructions from across the Northern Hemisphere. Our results demonstrate that, overall, models tend to provide broadly similar accounts of past climate to those obtained from proxy reconstructions, with coherence tending to decline with age. However, our results imply that downscaling to a very fine scale has minimal to no effect on the coherence of model data with pollen records.
Optimal spatial resolution is therefore likely to be highly dependent on specific research contexts and questions, with careful consideration required regarding the trade-off between highlighting local-scale variation and increasing potential error.
Status: final response (author comments only)
RC1: 'Comment on cp-2024-53', Anonymous Referee #1, 07 Sep 2024
Review of cp-2024-53, "More is not always better: downscaling climate model outputs from 30 to 5-minute resolution has minimal impact on coherence with Late Quaternary proxies" by Timbrell et al.
This paper examines the comparison between climate models and proxies and the extent to which the differences between them could be reduced. The authors use statistical methods to increase the resolution of the model data to make it more comparable to proxy data, which represent local conditions. The conclusion is that even though the downscaled model data have more detail, the comparison with proxies is not really improved.
Considering the assumptions made and the methods used in the paper, I wonder why anyone should expect an improvement of the model data. I suppose the paper can be a valuable contribution if these methods are commonly used in their part of the field. In any case, I think the authors should make it clear that their results apply to one particular type of statistical downscaling. It’s not possible to draw any general conclusions about downscaling from these findings, especially since the authors completely fail to mention dynamical downscaling.
Dynamical downscaling is known to improve the description of processes in the climate system and improve the description of local climate (e.g. Rummukainen, 2016). Dynamical downscaling is not very common within the field of palaeoclimate, but there are studies, e.g. Strandberg et al., 2011; Russo and Cubash, 2016; Velasquez et al., 2021; Strandberg et al., 2022; Strandberg et al., 2023.
Statistical downscaling is also known to improve local climate data and to successfully minimize biases in climate models (e.g. François et al., 2020; Berg et al., 2022).
Bias adjustment methods (including more advanced methods like quantile mapping) build on the assumption that the relationship between model and observations is constant. This works for the present and future (the coming 100 years or so) climate because climate change is not that large. For palaeoclimates, however, you cannot expect this relationship to hold. You can’t expect the model biases to be the same in the present climate as in the LGM or in the early Eemian: in a climate different from today, and with different topography, the weather regimes are not the same as today, and therefore neither are the model biases. If, in addition to these faulty assumptions, you use a very simplified method that only applies an offset to the model data, then I wonder why you expect your method to improve anything at all. Figure 2 clearly shows that your method only slightly shifts the model data, whereas you would like it to also correct trends and variability.
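The offset-only behaviour described here can be made concrete with a minimal sketch of additive delta-method downscaling; all grids and values below are illustrative placeholders, not data from the paper:

```python
import numpy as np

# Minimal sketch of additive delta-method downscaling (illustrative only).
# High-resolution present-day observations (e.g. a WorldClim-like grid):
obs_present_hires = np.array([[10.2, 9.8], [11.1, 10.5]])   # degC

# Coarse simulated climate, already interpolated to the same high-res grid:
sim_present = np.array([[9.0, 9.0], [10.0, 10.0]])          # control run
sim_past = np.array([[4.5, 4.5], [5.5, 5.5]])               # e.g. an LGM run

# The method adds the simulated anomaly (past minus present) to the
# high-resolution observations -- a constant offset per grid cell:
delta = sim_past - sim_present
past_hires = obs_present_hires + delta

# The simulated change passes through unmodified: only the baseline
# shifts, so model trends and variability are not corrected at all.
assert np.allclose(past_hires - obs_present_hires, delta)
```

Because the anomaly is applied verbatim, any error in the simulated past-minus-present change survives the downscaling untouched, which is precisely why an offset-only method cannot correct trends or variability.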
My point here is that the conclusions drawn in the paper are far too general. Statements like: “our results imply that downscaling to a very fine scale has minimal to no effect on the coherence of model data with pollen records.” (l 28-29) are simply wrong. Your conclusions only apply to the methods used in this study, not all varieties of downscaling and bias adjustment.
I think that the authors could be a bit more critical towards proxies. It’s a bit much to call them the “gold standard”, and this comes from a modeller who is used to seeing all the problems in models, and less so in proxies. Remember that proxies also have uncertainties. For example, Strandberg et al. (2011) come to the conclusion that the comparison between climate model and proxy data is mostly limited by the large error bars in proxy data.
I would also like you to think about the distribution of figures between the paper and the supplementary material. The paper doesn’t include many figures, and some of them are, to be honest, not that informative. At the same time, the paper leans quite heavily on references to the supplementary material. Perhaps you would like to lift something from the supplementary material into the main text? And while you’re at it, rework some of the existing figures.
In conclusion, this paper has a very shallow description and discussion of downscaling and bias adjustment methods. This should be expanded. The conclusions should be reformulated to only apply to the methods used in the study, instead of all methods. If this is done, I think that the paper could be accepted (assuming that the methods are actually used in other projects). Otherwise I will recommend rejection.
Comments
L56-57 It could also be worth mentioning that climate models offer a picture that is consistent across variables, thus giving a more complete picture of the climate.
L60 what do you mean by “observational data” here? Do you mean proxies? In that case, say so. Proxies and observations are different things. If you mean observations, explain why it is relevant to mention here. The rest of the paragraph is about proxies.
L63 “errors” Perhaps it’s better to talk about “differences” since proxies also have errors.
L71 “Different methods” -> “Different statistical methods”. Otherwise you should also mention dynamical downscaling.
Section 2.1 Here, I would like you to explain a bit more. It’s difficult to follow what is done and in which order. Consider a more linear description, like GCM run, bias adjustment, downscaling, etc. For example, I don’t understand what the Beyer et al. simulation is. Is it a GCM run, a modification of the HadCM3 run, or something else? Please also give some details about the HadCM3 run, for example regarding resolution and time span.
L123-124 Is this the same simulation as in lines 112-113?
Eq. 1 Please explain what “DM”, “sim”, “raw” and “obs” denotes.
L161 Why do you use “bio01” here and “Tann” elsewhere? Use a consistent terminology. I would prefer abbreviations like Tann instead of bio01, because they are easier to understand.
L211-213 If this sentence is the only thing you write about Fig 2, why show it at all? I think it would also be worth describing the differences between WAPLS and MAT, and the differences in variability between models and proxies.
Fig 2 It’s difficult to see the difference between the lines representing models. Consider using colours that are more different from each other, and to use dashes and dots to separate them even more.
Fig 2 How large are the areas shown here? How is the comparison between model and proxies made? Is it one model grid point vs. one proxy data point? If you average model data over a larger area, some of the point of downscaling will disappear.
Fig 3 Add units to the panels. Add temperature, precipitation etc to the leftmost panel in every row.
Fig 3 This could be presented much better. The panels are small, the data only cover a part of the panels, and the colours are difficult to distinguish. I cannot draw any conclusions from looking at Fig 3. Think about alternative ways to show this. Perhaps you could collect the points into regions and use boxplots to show the differences per region. That would give you a quantitative comparison.
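The regional-boxplot suggestion could be sketched roughly as follows; the region names and model-minus-proxy differences are invented placeholders, not data from the paper:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs without a display
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

# Hypothetical model-minus-proxy differences (degC) grouped by region.
diffs_by_region = {
    "Europe": rng.normal(0.5, 1.0, 200),
    "N. America": rng.normal(-0.3, 1.5, 150),
    "E. Asia": rng.normal(1.0, 2.0, 100),
}

fig, ax = plt.subplots(figsize=(5, 3))
ax.boxplot(list(diffs_by_region.values()))
ax.set_xticklabels(list(diffs_by_region))       # one box per region
ax.axhline(0.0, color="grey", lw=0.8)           # zero line: perfect agreement
ax.set_ylabel("model minus proxy (degC)")
fig.savefig("regional_diffs.png", dpi=150)
```

A plot of this kind summarises both the bias (box offset from zero) and the spread per region in a single quantitative panel, replacing hard-to-read point maps.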
L297 Is “predict” the right word here? The proxy data do not predict temperatures.
Fig 4 It’s obvious that Fig 4 shows the effect of the resolution. I’m, however, not sure that it shows the “effects of landscape dynamics”. What do you mean by that. Furthermore, I think you could make your point by showing just one region in one line. This is a lot of figure space for little information.
Fig 4 What do the dots represent?
L322 Is it correct to refer to Fig 4 here?
L334 “Models are also inherently calibrated ...” This is a very general statement that doesn’t apply to all climate models. Please specify which models you refer to.
L364 I don’t think this question is well posed. How do you know that the downscaling is the problem, and not the methods you used to do the downscaling? Again, this is a very general statement that doesn’t apply to all downscaling techniques.
L364-369 I think this is a testament of the poor methods you use.
L376 You have not mentioned that Beyer et al. is a climate emulator. Please add this to section 2.1.
L401-403 This is simply wrong. You only show that the downscaling method used in this paper fails. Based on that you should not dismiss all different ways to do downscaling. It would be unfortunate if the community thought that all downscaling is pointless.
Minor comments
L49 missing “(“ somewhere before this “)”
References
Berg, P., Bosshard, T., Yang, W., and Zimmermann, K.: MIdASv0.2.1 – MultI-scale bias AdjuStment, Geosci. Model Dev., 15, 6165–6180, https://doi.org/10.5194/gmd-15-6165-2022, 2022.
François, B., Vrac, M., Cannon, A. J., Robin, Y., and Allard, D.: Multivariate bias corrections of climate simulations: which benefits for which losses?, Earth Syst. Dynam., 11, 537–562, https://doi.org/10.5194/esd-11-537-2020, 2020.
Rummukainen, M.: Added value in regional climate modeling, WIREs Clim. Change, 7, 145–159, https://doi.org/10.1002/wcc.378, 2016.
Russo, E. and Cubasch, U.: Mid-to-late Holocene temperature evolution and atmospheric dynamics over Europe in regional model simulations, Clim. Past, 12, 1645–1662, https://doi.org/10.5194/cp-12-1645-2016, 2016.
Strandberg, G., Brandefelt, J., Kjellström, E., and Smith, B.: High-resolution regional simulation of last glacial maximum climate over Europe, Tellus A, 63, 107–125, https://doi.org/10.1111/j.1600-0870.2010.00485.x, 2011.
Strandberg, G., Lindström, J., Poska, A., Zhang, Q., Fyfe, R., Githumbi, E., Kjellström, E., Mazier, F., Nielsen, A. B., Sugita, S., Trondman, A.-K., Woodbridge, J., and Gaillard, M.-J.: Mid-Holocene European climate revisited: New high-resolution regional climate model simulations using pollen-based land-cover, Quat. Sci. Rev., 281, 107431, https://doi.org/10.1016/j.quascirev.2022.107431, 2022.
Strandberg, G., Chen, J., Fyfe, R., Kjellström, E., Lindström, J., Poska, A., Zhang, Q., and Gaillard, M.-J.: Did the Bronze Age deforestation of Europe affect its climate? A regional climate model study using pollen-based land cover reconstructions, Clim. Past, 19, 1507–1530, https://doi.org/10.5194/cp-19-1507-2023, 2023.
Velasquez, P., Kaplan, J. O., Messmer, M., Ludwig, P., and Raible, C. C.: The role of land cover in the climate of glacial Europe, Clim. Past, 17, 1161–1180, https://doi.org/10.5194/cp-17-1161-2021, 2021.
Citation: https://doi.org/10.5194/cp-2024-53-RC1
RC2: 'Comment on cp-2024-53', Anonymous Referee #2, 04 Oct 2024
This is a disappointing paper, because the issue of whether and how to downscale climate-model output is an important one, and even as models achieve ever higher resolutions, the demand for even higher resolution data will remain. This paper attempts to assess the match between a collection of pollen-derived reconstructions and climate-model output downscaled to 30-min and 5-min resolutions. However, the climate-model output is represented by the Beyer et al. (2020a) 30-min data set, which itself was produced by debiasing and downscaling HadCM3 model output. There is therefore a big assumption here that the Beyer et al. data are sound, and that no artefacts were generated in the process of their creation. I think a better experimental design would have been to start with actual model output, and to spend more time focusing on the performance of the downscaling and debiasing routines for present-day data. The paper also completely avoids even commenting on other approaches to downscaling, such as dynamical downscaling, and the take-home message, that the target resolution doesn’t matter, could be taken to say “why bother?”
The paper is not well written or produced. The figures don’t work very well, there are missing tables, and it lacks even first-order attempts to explain patterns in the results. Terms like “estimation,” “prediction,” “reconstruction” are used interchangeably, and applied both to the model output and reconstructions.
Line 16: Models also provide physically consistent simulations of multiple climate variables.
Line 21: “model output”
Line 22: I know this the Abstract, but I think the delta method needs to be described in a bit more detail. It’s not the interpolation to a finer spatial resolution that’s important, but the application of the long-term mean differences (present minus paleo usually) to high resolution observed modern data that produces results with greater spatial variability than that provided by the model.
Line 20: Sufficient for what?
Line 49: I’m not sure what “an absolute, linear, and standardized representation” is.
Line 53: “variable nature” Variable in what sense? And I’m not sure what “data … cannot be articulated” means.
Line 57: Replace “Modelled data” by “Model output” or “Model simulations”.
Line 64: I’m not sure what “estimation of ecologies experienced on the ground” means. Are you perhaps referring to applying model output to a species distribution model?
Line 65: This sentence essentially says that the spatial variation of simulated climate is lower than that of real-world climate, which has already been said several times.
Line 69: These two sentences don’t follow. The cost of high-spatial resolution simulations don’t have anything to do with the interpolation approaches discussed in the rest of the paragraph.
Line 77: “delta-downscaling uses a map of local differences …” This would work, but in practice what is usually done is to calculate “experiment minus control” long-term mean differences on the model grid, which are then interpolated and applied to a higher resolution grid of observed present-day climate.
Line 83: I would refer to these as “interpolations” rather than “predictions”.
Line 88: This is the third “gold standard” invocation. Reconstructions can have considerable uncertainty attached to them, arising from multiple sources.
Line 101: “further downscaling” Further from what?
Lines 112-121: If I understand this correctly, you’re using already downscaled model output (Beyer et al., 2020a) as the starting point, and further downscaling it. Wouldn’t it be better to begin with the original HadCM3 output?
Line 118: “National Center”.
Line 127: See line 77 comment.
Line 131: The terms in the equation should be defined. The equation reads like the Line 77 description of the delta method as opposed to the line 127 version. If all of the data were on the same grid, the approaches are in fact identical (as can be seen by rearranging the terms), but what did you actually do? Another issue is that the geographical location, x, is presumably a two-dimensional variable (in longitude and latitude), and so all the equation is illustrating is de-biasing, and not downscaling.
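For reference, the standard additive delta method, presumably what Eq. 1 expresses (a hedged reconstruction; the manuscript's exact notation may differ), can be written as

$$v_{DM}(x, t) \;=\; v_{obs}(x, t_0) \;+\; \big[\, v_{sim,raw}(x, t) - v_{sim,raw}(x, t_0) \,\big],$$

where $v$ is the climate variable, $x$ the geographical location, $t$ the past time of interest, and $t_0$ the present-day baseline. Rearranging the same terms gives

$$v_{DM}(x, t) \;=\; v_{sim,raw}(x, t) \;+\; \big[\, v_{obs}(x, t_0) - v_{sim,raw}(x, t_0) \,\big],$$

i.e. the raw simulation plus a present-day bias correction, which illustrates the point above: when all fields sit on the same grid, the "delta" and "de-biasing" views are algebraically identical.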
Lines 134-139: How is “GCM drizzle” handled?
Lines 152-158: The interpolation method needs to be better described. It’s implied that an inverse-distance weighted method was used, and that this can induce artefacts. Why was this method used, and not something else, like conservative remapping from the SCRIP package (https://github.com/SCRIP-Project/SCRIP)?
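For readers unfamiliar with the method the comment suspects was used, here is a self-contained sketch of inverse-distance-weighted (IDW) interpolation; this is an assumption for illustration, since the manuscript reportedly does not name its exact algorithm, and it shows why IDW can induce "bull's-eye" artefacts around isolated points:

```python
import numpy as np

def idw_interpolate(xy_known, values, xy_query, power=2.0, eps=1e-12):
    """Inverse-distance-weighted interpolation (illustrative sketch).

    xy_known: (n, 2) coordinates with data; values: (n,) data values;
    xy_query: (m, 2) target coordinates. Near-exact matches return
    (almost) the known value; elsewhere a distance-weighted average,
    which flattens toward the regional mean away from data points and
    creates 'bull's-eyes' around isolated ones.
    """
    # Pairwise distances between every query point and every known point:
    d = np.linalg.norm(xy_query[:, None, :] - xy_known[None, :, :], axis=2)
    w = 1.0 / np.maximum(d, eps) ** power   # eps guards against divide-by-zero
    return (w * values).sum(axis=1) / w.sum(axis=1)

pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
vals = np.array([0.0, 10.0, 20.0])
# Query midway between the first two points: the distant third point
# still pulls the estimate away from the simple two-point average of 5.
est = idw_interpolate(pts, vals, np.array([[0.5, 0.0]]))
```

Conservative remapping, by contrast, redistributes cell averages by overlap area rather than by point distances, which is why it avoids this class of artefact.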
Line 198: “Considering that downscaling to higher resolutions is thought to capture localized climate dynamics…” Statements like this appear several times. I’m not sure that it’s “climate dynamics” that is being captured, but instead just simply spatial (mainly topographic) variations in climate.
Line 204: “These analyses allow us to evaluate both the output of the climate models and the reliability of the proxy data in predicting specific climatic parameters in the past.” How is that possible? To evaluate the climate-model output, one would have to regard the proxy-based reconstructions as true, and to evaluate the reliability of the proxy-based reconstructions, the model output would have to be regarded as true. Neither is.
Line 213: “the most divergent variable on average is reconstructed mean annual temperature” This is somewhat of a surprise, given the global scope of the analysis. How does the performance here compare with other large-scale studies that examine present-day climate reconstructed using pollen data?
Line 220: “tends to estimate” But Beyer et al. (2020a) are downscaled simulations.
Lines 220-245: I would expect to see here, or in the very short Section 4, some discussion of the source of the differences.
Section 3.1: Again, I would expect some attempt to explain the spatial variations. There are several sources that I imagine could play a role: spatial variations in the performance of the GCM, variations in the quality of the present-day calibration data for LegacyClimate, variations in the quality of the CRU and WorldClim data, impacts of confounding variables on the pollen-climate relationships.
Fig. 3: The figure is extremely difficult to read. There is a lot of useless white space between panels, and scales are unnecessarily duplicated. Also, I don’t see any data from the Southern Hemisphere (or south of 20N?), which results in even more useless white space. What happened to the graticule over the Pacific? I think a polar-centered projection is fine, but it should fill the frame.
Line 292: “higher resolution models compared to those at relatively lower resolution” This implies multiple models, but line 114 refers to a single HadCM3 model.
Fig. 4: What are the dots? What do you mean by “landscape dynamics”? Is the landscape changing in some way?
Line 299: “… a known bias of transfer functions…” In addition to topographic effects, this bias also arises from “compression” in regression-based calibrations—the fact that the fitted values from less-than-perfect regressions always have lower amplitude than the observed values.
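The "compression" effect mentioned here follows from ordinary least squares itself: the variance of the fitted values equals R² times the variance of the target, so any imperfect calibration yields reconstructions with damped amplitude. A minimal synthetic demonstration (all data invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic calibration data, standing in for a proxy-climate
# regression (names and values are illustrative only):
x = rng.normal(size=500)                        # predictor (e.g. pollen score)
y = 2.0 * x + rng.normal(scale=2.0, size=500)   # target (e.g. temperature)

# Ordinary least squares fit y ~ a + b*x:
b, a = np.polyfit(x, y, deg=1)   # polyfit returns highest degree first
fitted = a + b * x

# For OLS, var(fitted) = R^2 * var(y), which is < var(y) whenever
# R^2 < 1 -- reconstructions are compressed toward the calibration mean.
assert fitted.std() < y.std()
```

The identity var(fitted) = R² var(y) holds exactly for simple OLS, so the weaker the calibration, the stronger the amplitude damping in the resulting reconstructions.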
Line 314: “time slice” I think a better term would be “time interval”.
Line 326: The supplemental material I downloaded only contains Table S1.
Line 334: “Models are also inherently calibrated…” If you’re referring to GCMs, they are most definitely not calibrated in the sense that the term is used elsewhere in this paper.
Fig. 5: Labels are unreadable.
Line 347: “Table 2” No Table 2.
Lines 353-362: There is no way to evaluate these statements without the supplementary tables. Also, there’s no attempt to explain the results. An obvious candidate for poor performance of the reconstructions in the MIS 2 interval is low CO2, which, to my understanding was not considered in LegacyClimate.
Line 366: “capture more signal” Jargon.
Line 376: “Beyer et al. (2020a) climate emulator” I don’t understand. Beyer et al. is just downscaled and debiased data. “Climate emulators” are a different thing altogether.
Citation: https://doi.org/10.5194/cp-2024-53-RC2
Lucy Timbrell
James Blinkhorn
Margherita Colucci
Michela Leonardi
Manuel Chevalier
Matt Grove
Eleanor Scerri
Andrea Manica
Scientists study past climate change using proxies (e.g. pollen) and models. Proxies offer detailed snapshots but are limited in number, while models provide broad coverage but at low resolution. Typically, models are downscaled to 30 arc-minutes, but it’s unclear if this is sufficient. We found that increasing model resolution to 5 arc-minutes does not improve coherence with climate reconstructed from pollen data. Optimal model resolution depends on research needs, balancing detail with error.