|Dynamics of the Mediterranean droughts from 850 to 2099 AD in the Community Earth System Model|
General comments (after author revisions):
I appreciate that the authors have tried to respond to many of my comments (e.g., providing some statistical tests in place of visual comparisons of data/results, comparing model simulation output to instrumental/paleo data, etc.). However, in my opinion there are still some important misinterpretations of the data/figures and/or errors.
These issues fall into several general categories:
(1) The comparison of CESM with instrumental/reanalysis data: there is much improvement here (e.g., comparison of geopotential height, SST anomalies in simulations and instrumental/reanalysis), but the authors still don’t show that the model is simulating the timing and magnitude of trends in local rainfall and SOIL/PDSI (e.g., Figure 12: if the drying trend is in fact a forced signal that is as strong as the CESM appears to indicate it is, it should show up in the instrumental data). Given that the conclusions about ongoing/future forced changes hinge on the model’s ability to do this, I suggest adding instrumental vs CESM time series data to Fig 12, or a panel c in Figure 3 that shows time series from the model and observations on top of each other (I know the paper/figures are already quite long, so I don’t want the authors to have to add extra figures).
(2) Potentially flawed use of statistical tests: the authors claim to use a Mann-Whitney test to determine if the ‘means’ of the OWDA and CESM PDSI variables are distinguishable and find ‘the means are statistically similar to one another’. This is a nonsensical test as far as I can tell. First, it looks like the two PDSI time series are normalized to similar time periods (the last millennium), so they should both be centered about a long-term mean of zero. Testing whether the means of two time series differ when both are centered at zero is therefore not a test that the model is doing well; the real test seems to me to be in Figure 5a (are drought distributions similar). Finally, the Mann-Whitney test should be applied to test the similarity of distributions of data, not means (as far as I know), but the authors’ wording throughout the manuscript suggests they are using this test to determine if means of distributions are significantly different.
(3) The reported finding that Mediterranean droughts are mainly driven by internal variability in the climate system: the wavelet coherence figure seems to demonstrate that Mediterranean PDSI shows significant coherence with the volcanic forcing time series after large eruptions, so the timing of variability doesn’t look like it’s only due to internal variability. Perhaps there are wet periods following volcanic eruptions, but there is not enough information in the figure caption to determine the phase of the relationship. Additionally, I would think the authors would want to show that the magnitude/duration of droughts in the control run overlap with the magnitude/duration in the forced run (and statistically test if these drought duration distributions are distinguishable), but I don’t see a figure or analysis that shows this (e.g., additional box/whisker plot w CESM control drought durations in Fig. 5a)- the authors do present the mean number of droughts and durations in the control and forced runs in Section 3.3, but showing the actual drought distributions and testing this difference could be much more informative and actually test/support the authors’ assertion that drought (duration) is driven by internal variability.
(4) The drought initiation/termination is quite interesting, but I don’t know what to make of the indices (NAO/ENSO) after they have been redefined to percentiles during non-drought periods- is this standard practice? And how is this information meaningful for interpreting year to year NAO/ENSO as it relates to drought predictability/trajectories? For example, I am unsure as to what the ‘extreme positive’ NAO now means- how much is this index now changed if it is redefined only during non-drought years?
(5) I still find many sentences/phrases hard to decipher, as they are either grammatically incorrect or just hard to interpret- I have noted in line numbers where this is most apparent.
Specific comments in sections/figures/by lines:
Abstract: I see the authors have responded to my previous comment about model limitations and added to the main text, but again, I am left with no sense in the Abstract about any model limitations/bias – given that many readers may only look at figures and the Abstract, it would be informative to insert a phrase/sentence about model strengths and limitations. If the list of model shortcomings as it relates to simulation of drought over the Mediterranean is so long that it can’t be easily summarized in the abstract, that’s a problem I think.
Lines 3-4 ‘Our study indicates…mainly driven by internal variability’ – as I mentioned in the general comments, after seeing the wavelet coherence plots, I think this statement is flawed and needs to be qualified- perhaps the overall duration/severity of droughts is statistically indistinguishable in the control and forced simulations (which isn’t shown as far as I can determine), but PDSI variability and large volcanic eruptions appear to vary coherently at interannual-decadal timescales. To really be able to make the claim that the duration/severity of drought is purely internal, the authors need to actually compare the duration/severity of control run droughts and forced simulation droughts, but I don’t see this comparison anywhere. As it stands, it looks like the CESM shows a sensitivity to volcanic eruptions that the OWDA doesn’t show.
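To make the requested comparison concrete, here is a minimal sketch of the kind of analysis I have in mind. It assumes a hypothetical drought definition (runs of consecutive years with the index below a threshold), which is not necessarily the authors’ definition, and it uses short made-up series rather than CESM output. The function name `drought_durations` is my own.

```python
import numpy as np
from scipy.stats import ks_2samp, mannwhitneyu


def drought_durations(index, threshold=0.0):
    """Lengths of consecutive runs where index < threshold (assumed definition)."""
    durations = []
    run = 0
    for value in np.asarray(index, dtype=float):
        if value < threshold:
            run += 1
        elif run > 0:
            durations.append(run)
            run = 0
    if run > 0:  # a drought still in progress at the end of the series
        durations.append(run)
    return durations


# Made-up annual index values, for illustration only (not model output)
control = [0.5, -1.0, -0.4, 0.2, -0.3, 0.8, -1.2, -0.9, -0.1, 0.4]
forced = [-0.6, -0.2, -1.1, -0.7, 0.3, -0.5, -0.8, -1.4, -0.2, -0.6]

ctrl_dur = drought_durations(control)
forc_dur = drought_durations(forced)

# Compare the two duration samples as distributions
ks = ks_2samp(ctrl_dur, forc_dur)
mw = mannwhitneyu(ctrl_dur, forc_dur, alternative="two-sided")
print("control durations:", ctrl_dur)
print("forced durations:", forc_dur)
print("KS p-value:", ks.pvalue, "Mann-Whitney p-value:", mw.pvalue)
```

With the full control and forced simulations, the two duration samples would be large enough for such a test to be informative; the toy samples here only show the mechanics.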
Lines 20-26: I appreciate that the authors acknowledged there are various types of drought, but I disagree with the idea that a meteorological drought that is long enough just becomes the other types of droughts. For example, couldn’t an area become warmer (and receive the same amount of precipitation, experience the same drought frequency), but experience earlier spring melt, more evaporation, and thus hydrologic or agricultural drought? This agricultural drought would not just be caused by a persistent meteorological drought.
Lines 34-35: ‘The climate of the Mediterranean is characterized as semi-arid with a pronounced annual cycle, thus, high temporal and spatial variability of the availability of water resources’- does a semi-arid climate imply spatial variability? Or temporal variability? I thought it just had to do with the mean climatologic conditions (winter wet, summer dry on average), but I could be wrong.
Lines 38-39: ‘The western and eastern regions show different precipitation regimes’: this makes it sound like the region should be split into two separate regimes for the analysis.
Line 44/49: authors define EA-WR and use it, but then ER-WR is used on line 77 (?)
Line 51: suggest ‘response of the Mediterranean climate to ENSO’
Line 71: ‘multi-years long desiccation’ is a bit of an odd phrase- suggest ‘multi-year drought’ or ‘multi-year dry periods’
Lines 112-113: I understand that other cited studies have shown that parts of Europe may be drying using different data sources, but it would be helpful to show the actual time series in instrumental and CESM data earlier in the manuscript, both to illustrate this visually for the reader and to validate the model’s trends/drought sensitivity to warming.
Line 132-133: suggest ‘validate the model simulation used here’ in place of ‘our model’
Line 166-167: ‘The statistical tests to compare the transient to the control simulations are performed with the Mann-Whitney U significance test for the means at a 5% confidence level’ – the authors go on to use this test in the text/figure captions to show that the ‘means are statistically different’- similar to my general comment above, doesn’t this test ask if the distributions are likely different?
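A small sketch of why this matters, using two synthetic samples of my own (not the authors’ data) that are both symmetric about zero but differ in spread. I use `scipy.stats.mannwhitneyu` for the rank-based test and `scipy.stats.ks_2samp` as one possible test of the full distribution; the choice of the Kolmogorov-Smirnov test is my assumption, not something the manuscript uses.

```python
import numpy as np
from scipy.stats import ks_2samp, mannwhitneyu

# Two synthetic samples centered at zero with different spread (illustration only)
narrow = np.arange(-50, 51, dtype=float)       # values from -50 to 50
wide = 3.0 * np.arange(-50, 51, dtype=float)   # values from -150 to 150

# Rank-based location test: cannot separate the two symmetric samples
mw = mannwhitneyu(narrow, wide, alternative="two-sided")

# Test of the whole distribution: sensitive to the difference in spread
ks = ks_2samp(narrow, wide)

print("means:", narrow.mean(), wide.mean())
print("Mann-Whitney p-value:", mw.pvalue)
print("Kolmogorov-Smirnov p-value:", ks.pvalue)
```

In this toy case the Mann-Whitney test gives no evidence of a difference while the distributional test does, so a ‘not significant’ Mann-Whitney result on zero-centered series says little about whether the model reproduces the observed variability.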
Lines 168-169: I’m a bit confused about which years are used and why 5 sets of 89 years are chosen for the transient simulation- the control run is 400 years, so the total number of years doesn’t match in the random draw of the transient vs the control. Also, are the draws of years contiguous (e.g., years 1-89 in a row) or randomly drawn? If they are randomly drawn, what is the sense in using different sets of draws? Please clarify because this approach doesn’t make immediate sense to me.
Lines 171-174: After my last comment in the previous round of reviews, I appreciate the authors tried to split up the time periods from which they remove trends, but I still am not sure why linear trends are removed over these time periods as the figure showing the time series isn’t shown until Figure 12 - it may help to show the reader these time series when discussing the time series and trend removal.
Line 184: ‘some drought metrics’- suggest ‘various’? Or state exactly how many- ‘some’ sounds strange
Lines 242-243: If the model underestimates/simulates 30-50% lower precip than observations, does this bias mean anything for simulation of drought in the region? I ask because the authors mention land-atmosphere feedbacks - if the land is already ‘too dry’ in the model, are there implications for the life cycle of droughts (e.g., is the model somehow unrealistically ‘on the edge’ of setting off land-atmosphere drought feedbacks that wouldn’t occur if the model just simulated a slightly wetter background climate regime that was more realistic?)
Line 245-246: ‘the means are statistically similar to one another’- as I mention in the general comments, this appears to be a non-sensical test- PDSI should be centered around zero, so testing the means are statistically different is not a test that the model is doing well. I am also not sure if the instrumental- model-OWDA comparison was done on time series that were re-normalized to the same time period, or if the distributions from the data normalized to the last millennium are compared to distributions of data normalized to the last 100 years- if the time series are normalized to different time periods, I would expect there could be differences in the mean, etc.
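As a minimal illustration of the normalization concern, the sketch below takes one hypothetical index with a late drying trend and expresses it as anomalies relative to two different baseline periods. The series, the baseline years, and the helper `anomalies` are all my own assumptions for illustration.

```python
import numpy as np

years = np.arange(850, 2006)
# Hypothetical index: flat until 1900, then a linear decline (illustration only)
index = np.where(years < 1900, 0.0, -0.02 * (years - 1900))


def anomalies(values, years, start, end):
    """Anomalies relative to the mean over the baseline years [start, end]."""
    baseline = values[(years >= start) & (years <= end)].mean()
    return values - baseline


recent = years >= 1901
anom_millennium = anomalies(index, years, 850, 2005)   # long baseline
anom_century = anomalies(index, years, 1901, 2005)     # recent baseline

# Mean of the same recent years under the two normalizations
print(anom_millennium[recent].mean(), anom_century[recent].mean())
```

The same recent years have a clearly negative mean under the long baseline and a mean of zero under the recent baseline, so a comparison of means depends on which reference period each dataset was normalized to.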
Figure 3 caption: ‘The p-value from the t-student test between the summer scPDSI from CESM and OWDA is 0.28’ – what exactly is being tested here? I assume it’s the statistical tests mentioned in the Methods, but please clarify in figure caption as I’m not sure (e.g., that distributions are significantly different or what?).
Separately, the figure caption states the ‘red points on each box show the data points’ – so there are only 3-5 data points? If there are <5 data points, then what is the sense in showing box plots that are intended to summarize large numbers of data points (don’t the boxplots just show the minimum, first quartile, median, third quartile, and maximum? So how can 3 points of data meaningfully produce a boxplot?). Please clarify.
Also, in panel (a) the observations are black, then in (b) the colors show different information- this seems unnecessarily confusing as there’s no figure legend in the boxplot panel. Can the authors use consistent colors in both panels?
Finally, what time periods are shown in the different parts of panel (b)? Please label on figure and/or describe in caption- for example, Figure 5a is immediately much more clear with the labels above the different sections of boxplots.
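On the boxplot question for Figure 3 above, a short sketch of what a five-number summary looks like for three data points. The three values are made up, and I use NumPy’s default linear interpolation for percentiles; plotting packages may use a slightly different quartile convention.

```python
import numpy as np

# Three made-up drought durations (years), standing in for one box
points = np.array([1.0, 2.0, 6.0])

# Minimum, first quartile, median, third quartile, maximum
summary = np.percentile(points, [0, 25, 50, 75, 100])
print(summary)
```

With only three points, the quartiles (1.5 and 4.0 here) are interpolated between neighbouring data values rather than estimated from a sample, so the box adds nothing beyond the points themselves.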
Lines 255-256: confusing wording: ‘simulating droughts of few years long, and with longer duration than those from OWDA.’
Line 266-267: ‘tree ring based reconstructions tend to deviate in their spectral behavior’ – what does this mean?
Line 286: confusing wording: ‘despite the fact presents some discrepancy to the observation exist’
Lines 286-287: ‘the model does not significantly underestimate the persistence of multi-year droughts’ – yet my reading of Figure 3b, in which the observations are compared to the CESM, is that the CESM simulates droughts that are about half the duration of those in the observations (large differences in median, interquartile range). Perhaps the authors mean CESM simulates longer droughts than the OWDA?
Line 291: suggest ‘by focusing’ (not and focusing)
Lines 293-295: ‘We do not aim to make a direct comparison between the proxies and the model simulation, as this cannot be made due to the different initial conditions’ – this is fair, but according to the wavelet analysis shown later, the CESM PDSI variability appears to line up with large eruptions, whereas they don’t in the OWDA – if we ‘believe’ the model simulation, the PDSI variability could be (partly) forced, meaning they should line up temporally in the proxy data and CESM, right?
Similar comment on lines 304-307: ‘are not mainly driven by external natural forcings’
Line 314: suggest ‘statistically indistinguishable’, not ‘indifferent’ – both here and elsewhere (I think indifferent means ‘unconcerned’ or ‘mediocre’, not indistinguishable)
Lines 329-331- given the timescales of impacts of volcanic forcing (e.g., multi-year cooling, then recovery to ‘normal’, maybe with some impacts on AMO), would you expect to see imprints of volcanic eruptions visually on drought indices that have been smoothed with a 100-year running mean?
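A back-of-the-envelope sketch of this point, assuming a hypothetical 3-year post-eruption anomaly of -3 index units in an otherwise flat series (both numbers are mine, chosen only for illustration):

```python
import numpy as np

n_years = 1000
series = np.zeros(n_years)
series[400:403] = -3.0  # hypothetical 3-year post-eruption anomaly

# 100-year running mean, keeping only fully covered windows
window = 100
smoothed = np.convolve(series, np.ones(window) / window, mode="valid")

print("largest anomaly, unsmoothed:", series.min())
print("largest anomaly, 100-yr running mean:", smoothed.min())
```

The smoothed anomaly is 3 x 3 / 100 = 0.09 index units, about 3% of the original amplitude, so a short-lived volcanic response would be nearly invisible in a 100-year running mean even if it were strong at interannual timescales.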
Line 334: for figure 6, the authors have shown a wavelet coherence diagram here, with no time series (and no indication which variable is the ‘first’ or ‘second’ variable as they ask the reader to interpret for lead/lag/phase relationship in the caption)- I think it would be helpful to plot the time series of volcanic eruptions below or above the plots so the readers can ‘see’ when the volcanic eruptions occur- for example, I know there are large eruptions ~1257/1258, ~1450, ~1600-1650, and in the early 19th century. All of these time intervals/years happen to show significant coherence with simulated scPDSI variability at ~4-16 yr timescales in figure 6, but this information is not provided for the reader, so unless the reader regularly works with the last millennium forcing data, they may not notice that there are conspicuous areas of significant coherence around large eruptions.
Lines 334-336: ‘are not uniform across the period frequency bands’ - Would we expect the signals of statistically significant coherent variability between drought indices to be uniform across frequency bands? I think a lack of coherence across frequency bands does not show a lack of forcing response; it only shows a lack of forcing response at certain frequencies/periods. This relationship (at longer time periods/lower frequencies) could make sense given the nature of simulated responses to volcanic eruptions in the North Atlantic in the NCAR CESM model. For example, Otto-Bliesner et al. (2016, BAMS, CESM LME documentation paper) show that the AMO/AMV is clearly impacted at decadal timescales by volcanic eruptions (see Figures 11 and 12 in Otto-Bliesner et al.).
Lines 339-341: ‘the analysis confirms…not driven by the volcanic eruptions’ – I strongly disagree with this interpretation. The figure seems to show that the CESM PDSI and soil variable both show ‘significant’ coherence with the 1258 eruption (and even some coherence with the 15th century, 17th century, and 19th century eruptions for PDSI). The ‘red’ regions of coherence at ~4-16 year periods, surrounded by black lines to show significance, are pretty hard to miss.
The information I take away from this figure is that the OWDA does NOT show coherence with eruptions, but the CESM does, which suggests the forcing is implemented wrong in the model, or that OWDA data aren’t picking up on volcanic eruptions.
Lines 343-345: ‘The focus on one indicator is motivated by the fact…’ - I still don’t see any indication that the 10cm soil variable actually reflects what is happening with deeper soil water in the model- as Berg et al. 2017 (GRL) show, 10cm soil water can basically just mimic what is happening in atmospheric-centric variables like precipitation. For example, as Berg et al. (GRL, 2017) show in their Figure 1 and describe in the text: ‘In contrast, projections of negative changes in total soil moisture are more muted, in both extent and amplitude. Regions of negative changes (e.g., southern U.S. and Central America, northern South America, Mediterranean region, and South Africa) display relative changes of reduced amplitude compared to surface changes.’ – so in fact, I would argue that unless the authors show that the deeper soil moisture column actually shows the same degree of water stress, this statement is not supported by these citations.
Line 362: suggest ‘presents an average of 7.25 droughts per century’- because this is an average, right?
Line 404: suggest ‘in the following section’ (not ‘in the following’)
Line 407: ‘The phases of NAO and ENSO are defined with respect to the non-drought periods: the values below (above 75) percentile of NAO and ENSO during the non-droughts periods are considered as negative (positive) phases of NAO and ENSO respectively (Fig. 9).’ – I am unsure of what to make of this- so the authors redefined standard indices based on non-drought years? What does this do to the time series/what is the reason to do this other than to maximize drought signal of NAO and ENSO? And how is this information meaningful for ‘real world’ NAO or ENSO (e.g., how can this information about NAO/ENSO during drought be used if the indices have to be redefined during non-drought years? I can see how maps of mean differences in drought and non-drought years could make sense, but I don’t know how to interpret the treatment of the indices).
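To show why the reference period matters, here is a minimal sketch with a made-up index and drought mask. The lower percentile is not stated clearly in the quoted sentence, so the 25/75 pair used here is my assumption, as are the helper name `phase_thresholds` and all the values.

```python
import numpy as np


def phase_thresholds(index, mask, lower_pct=25.0, upper_pct=75.0):
    """Percentile thresholds computed only from the years selected by mask."""
    subset = np.asarray(index, dtype=float)[mask]
    return np.percentile(subset, lower_pct), np.percentile(subset, upper_pct)


# Made-up index values and drought flags, for illustration only
nao = np.array([-2.0, -1.0, 0.0, 1.0, 2.0, 1.0, 2.0, 3.0, 3.0, 2.0])
drought = np.array([False] * 5 + [True] * 5)

# Thresholds from non-drought years only vs. from all years
lo_nd, hi_nd = phase_thresholds(nao, ~drought)
lo_all, hi_all = phase_thresholds(nao, np.ones_like(drought))

# Share of drought years labelled 'positive' under each definition
frac_pos_nd = np.mean(nao[drought] > hi_nd)
frac_pos_all = np.mean(nao[drought] > hi_all)
print((lo_nd, hi_nd), (lo_all, hi_all))
print(frac_pos_nd, frac_pos_all)
```

In this toy case the share of drought years counted as ‘positive’ phase changes from 80% to 40% depending only on which years define the thresholds, which is why I would like the authors to report how much the redefinition shifts the indices.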
Line 418: Again, is this what the MW-U test is testing (difference in means?)
Lines 425-426: ‘The positive NAO occupies 49% in the initiation years, then it decreases throughout the development of droughts, falling to 29% in the termination years’ – in terms of this being meaningful information for drought prediction or giving information about how these droughts initiate, it sounds like positive NAO occurs almost exactly half of the time at the start of drought, and this is not the traditional NAO, but NAO as redefined relative to non-drought years?
Line 449 vs line 457: ‘transition years’ vs ‘transient years’- suggest consistent usage of terminology, here it makes sense (transition years), but in other locations (e.g., line 457), ‘transient’ is used (and in figure captions I think?)- this change in wording can be confusing, especially because the transient forcing/simulations terminology is also used, so I suggest removing wording that refers to transient as transition drought years, and just consistently use transition. (unless I completely don’t understand and the authors intended there to be a difference)
Line 469-470: About the time series shown in Figure 12 – these would seem to suggest that the Mediterranean region is in a long-term drought relative to the last millennium- the smoothed time series for SOIL never reach pre-industrial moisture levels after ~1860 AD- does this mean that climate change has caused a long-term drought that has lasted for ~150 years? I am not a Mediterranean climate expert, so I plotted GPCCv2018 annual precipitation, as well as Dai NCAR PDSI over the relevant time periods for the Mediterranean region and see no noticeable long-term trend in precipitation, and either a drought or a ‘step function’ in PDSI in the late 20th century (again, no long-term aridification trend as the CESM seems to simulate). Can the authors plot the instrumental time series in the background to show if the model exceeds the envelope of variability in the instrumental data and/or if the trends are present in instrumental data too?
For Figure 13, please define the acronyms ND and D (I assume these are Non-Detrended and Detrended, but this is not explicitly defined).
Line 504: suggest word other than ‘indifference’
Lines 505-507: ‘this result shows that the natural mechanisms associated with droughts remain the same…’ OK, insofar as they are defined by circulation patterns, but what about increases in evapotranspiration/aridification due to increasing temperatures? This sentence is basically contradicted by the next sentence, which states that the mechanisms are anthropogenically driven- can the authors distinguish/clarify? The authors have shown how different drought drivers progress in Figure 11, but couldn’t EV change in the future, thus the droughts would not have ‘natural mechanisms’?
Also, again, there is no precipitation and/or obs-based soil moisture/PDSI shown here- do the observations show the same general trends in terms of long-term aridification? If not, this is an important thing to point out; if so, then great- the model is doing well, and this should be noted.
For Figure 7 caption: ‘the regions where the means between the control and transient simulations are statistically not significant at 5% confidence level’? - this seems like the wrong test here- are we interested in the means being the same, or are we interested in where the ‘spread’ from internal variability in the control run is different than the forced run spread?
Line 515: suggest changing wording to ‘although our result shows’ or the sentence is incomplete/comma splice
Lines 525-529: Authors conclude there is no ‘causal connection between volcanic eruptions and dry conditions’, but their wavelet figures indicate that volcanic eruptions are significantly coherent with PDSI and soil moisture variability at ~4-20yr periods around large eruptions. I agree that if we average all the geopotential height patterns during drought in the control and forced simulations, there may be minimal differences, but not all droughts/pluvials occur during eruptions, so any anomalous behavior after eruptions could be ‘averaged out’ by the large numbers of PDSI anomalies (droughts/pluvials) that do not occur after eruptions.
Lines 553-554- These results would be much more believable if the authors showed GPCC/CRU/UDel precipitation on top of the model precipitation time series, and instrumental-based scPDSI on the CESM PDSI time series to show the model is getting the timing and magnitude of trends right: https://psl.noaa.gov/data/gridded/data.pdsi.html
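Alongside the overlaid time series, a simple trend comparison over the common period would help. The sketch below uses two synthetic series in place of the regional-mean model and instrumental data (the values, the period, and the variable names are my assumptions) and fits a linear trend to each with `scipy.stats.linregress`.

```python
import numpy as np
from scipy.stats import linregress

years = np.arange(1901, 2001)

# Synthetic stand-ins (illustration only): a steadily drying 'model' series
# and an 'observed' series with decadal variability but no long-term trend
model = -0.02 * (years - 1901)
obs = 0.3 * np.sin(2.0 * np.pi * (years - 1901) / 10.0)

model_fit = linregress(years, model)
obs_fit = linregress(years, obs)

print("model trend per decade:", 10.0 * model_fit.slope)
print("observed trend per decade:", 10.0 * obs_fit.slope)
```

Reporting the two slopes (with their uncertainties) for the real data would show directly whether the simulated aridification trend is also present in the instrumental record.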
Lines 574-575: again, the authors are choosing to study a region that Berg et al. have shown has 10cm soil moisture that magnifies droughts and does not reflect what is happening in ‘full column’ soil moisture (to ~3m depth) – so bringing up that the authors have used 10cm soil water isn’t really showing that they have got around this problem.