In my opinion, the new version of this paper is still not suitable for publication and requires major work to become publishable. Unlike before, this is not so much because of inconsistent model-data comparison, but because the analysis of the results presented here is too superficial. It does not become clear to what extent this paper really provides new insights regarding the evolution of sea ice in a warmer climate relative to existing studies based on, say, CMIP simulations. The paper currently remains very descriptive, rather than providing the reader with any robust results. This fact, however, is not stated in the paper, but is instead hidden behind language that is very speculative throughout much of the text. I still believe that these simulations can provide new, broadly relevant and interesting insights regarding the evolution of sea ice, but more in-depth analysis is required to extract those from the available data.
Alternatively, this paper should be shortened significantly and only provide a description of model results. It would then be a helpful reference for these simulations, and it would be clear to the reader that it is not meant to provide an "assessment" or the like. In that case, the title should be changed to "Description of simulations of Arctic sea ice in the PlioMIP models".
If the authors decide to keep the current scope of this paper, I suggest that they consider the following remarks for a possible revised version:
l.5 and section 3.3: I am still not convinced that much can be learned from using CV in the current context. What is the geophysical relevance of CV that makes this measure preferable over simply using ensemble spread? If in a warmer climate all simulations are ice free, but one simulation still has a tiny ice floe of 2 m² lying around somewhere, then CV will be more than 10. But this high value would be totally irrelevant, as is expressed by the geophysically more relevant ensemble spread given by the ensemble standard deviation. I disagree in particular with the statement that standard deviation does not allow one to compare data sets with different mean values (l.64). Why not?
If the authors decide to keep the analysis of CV, it would be helpful to give geophysical reasons for its relevance, rather than simply stating that others have used this metric before. Please also note that "ensemble spread" is very different from "variability", but currently these terms are used as if they described the same thing.
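The ice-floe example above can be made concrete with a minimal sketch. It assumes that CV denotes the coefficient of variation (here the sample standard deviation divided by the mean) and uses a hypothetical eight-member ensemble with invented extent values; none of the numbers are taken from the PlioMIP simulations.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (assumed definition of CV)."""
    return statistics.stdev(values) / statistics.mean(values)


# Hypothetical ensemble of sea-ice extents in m^2: seven ice-free models
# and one model that retains a single 2 m^2 floe.
nearly_ice_free = [0.0] * 7 + [2.0]

# Hypothetical cold-climate ensemble of extents in m^2 (illustrative values only).
cold_climate = [9.0e12, 1.0e13, 1.1e13, 1.2e13, 1.0e13, 9.5e12, 1.05e13, 1.15e13]

for name, extents in [("nearly ice free", nearly_ice_free),
                      ("cold climate", cold_climate)]:
    spread = statistics.stdev(extents)
    cv = coefficient_of_variation(extents)
    print(f"{name}: standard deviation = {spread:.3g} m^2, CV = {cv:.3g}")
```

With this definition, a single non-zero member in an N-member ensemble gives CV = sqrt(N) however small the floe is, whereas the standard deviation in this example stays below 1 m². The cold-climate ensemble shows the opposite: a spread of order 10^12 m² but a much smaller CV.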
l.11: "suggesting that the dominant atmospheric and oceanic influences may be different in the [two] simulations": This is one example of the speculative language. All the data needed to test this suggestion are available, so why not do it? I ask in particular because I doubt that the suggestion is true.
l.24: The Arctic is "widely predicted to become seasonally ice free before the end of the 21st century" only for a specific evolution of CO2.
Introduction in general: This should include a short discussion of what we know from previous studies about sea-ice ensemble spread, correlations between individual sea-ice metrics and drivers, temporal correlation of sea-ice evolution, the general evolution of sea ice in a warmer climate, etc. Such a discussion is necessary to allow the reader to identify the open questions that are addressed by the present study.
l.56 leading to Figure 14: I wonder whether some of the results of this study simply arise because sea-ice extent, rather than sea-ice area, is used to describe the areal coverage of sea ice. If in a cold climate sea-ice concentration reduces from, say, 90 % to 45 % because of some warming, sea-ice extent would remain the same, even though the area decreases by 50 %. This renders the correlation of extent and temperature very weak. Sea-ice extent is only a useful metric when comparing model output to observations, since it allows one to account for some observational uncertainty. In the present context, where most of the analysis is carried out only in the model realm, sea-ice area would give much more robust results, in particular given the very low sea-ice concentration that is obtained in the warm climate runs.
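The extent-versus-area argument above can be sketched in a few lines. This is only an illustration: the grid, the cell areas and the function names are hypothetical, and the 15 % concentration threshold is the commonly used convention for extent, which I assume (but have not verified) is also what the authors apply.

```python
def sea_ice_extent(concentrations, cell_areas, threshold=0.15):
    """Total area of all grid cells whose ice concentration reaches the threshold."""
    return sum(a for c, a in zip(concentrations, cell_areas) if c >= threshold)


def sea_ice_area(concentrations, cell_areas):
    """Concentration-weighted sum of grid-cell areas."""
    return sum(c * a for c, a in zip(concentrations, cell_areas))


# Hypothetical grid of four equal-area cells (m^2).
cell_areas = [1.0e10] * 4

# Ice concentration as a fraction, before and after some warming.
before = [0.90] * 4
after = [0.45] * 4

print("extent before/after:", sea_ice_extent(before, cell_areas),
      sea_ice_extent(after, cell_areas))
print("area before/after:  ", sea_ice_area(before, cell_areas),
      sea_ice_area(after, cell_areas))
```

In this sketch the extent is identical before and after the warming, because every cell stays above the threshold, while the area is halved. Any temperature signal carried by the concentration change is therefore invisible in extent.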
Section 3: I found this section unnecessarily long. The reader can simply look at the figures and does not need a detailed description of every single panel. This is particularly so because much of the language remains very vague, repetitive and sometimes contradictory, for example: "Most of the models display patterns that are broadly similar to ensemble mean - but there is appreciable variation with respect to the location of maximum ice thickness". Either the patterns are broadly similar (which includes their key characteristics), or they are not (as indicated by the location of maximum thickness). Or: "The thickest ice in COSMOS [...] is located in approximately the same region as in the ensemble mean." followed by "In COSMOS, the thickest ice is concentrated into a smaller area." I found this entire section very cumbersome to read.
l.143: What is "relatively" reduced ice?
l.144ff: Why should multi-year pre-industrial ice-thickness patterns match two months of observational record from 2009?
l.152: Another example of very vague language: "The ensemble mean thickness patterns appear to broadly match the observations."
l.187: I did not understand the logic (and meaning) behind: "The finding that sea-ice extent amplitude in the mid-pliocene is 64 % greater than the pre-industrial simulation amplitude holds for the ensemble mean at a lower amplitude extent amplitude."
l.210: Another example of vague and somewhat contradictory language: "A similar finding to the fact that MIROC has similar patterns in winter in both simulations holds for COSMOS, where the central Arctic sea ice thins by a greater amount in comparison to sea ice in other regions."
section 4.1: There is no assessment of pre-industrial simulations in this section, hence the title is misleading. Instead, this section primarily summarizes results from other studies on the historical simulations from CMIP5.
l.289: Another example of very vague language: "The fact that historical extent simulated by MRI is almost 25 % greater than observations may suggest that its Arctic sea-ice cover is too extensive."
l.292: Why is it a contradiction that a model has a sea-ice extent closest to observations "although" it has the lowest sea-ice extent amplitude?
section 4.2: Again, this section does not really give an assessment of mid-Pliocene simulations, but instead comes to the conclusion that such an assessment is not possible.
l.324: Unnecessary repetition, I find.
l.335: This is not very clearly spelled out: Why may a reasonable performance of a model relative to mid-Pliocene sea ice improve confidence in this model, while at the same time a match to present-day observations does not necessarily mean that the model is good?
l.359: Why does HadCM3 only appear to be in closest agreement with proxy-data indications? Either it is, or it isn't.
Section 4.3: Many of these results are known from earlier studies. This should be spelled out here, to allow the reader to see what really is new here.
l.365: Why is it a contradiction that CCSM and NorESM use the same sea-ice component "although" NorESM has a coarser atmosphere and a different ocean?
l.406: In section 4.1, there is no analysis of pre-industrial or mid-Pliocene performance, which would require some comparison against data to actually assess performance.
section 4.3.3: Much of this section remains unnecessarily vague. Everything needed to support or reject the suggestions is in the data that the authors have available, so I find that the analysis should move beyond quoting existing studies by Hill et al., Zhang et al., etc.
l.424ff: I did not fully understand what is meant by "stronger correlation": A higher slope of the linear fit, or less spread around the fit?
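The two readings are not interchangeable, as the following minimal sketch shows. All values are invented for illustration (hypothetical temperature anomalies and sea-ice responses, not taken from the manuscript), and the helper `linear_fit` is my own.

```python
import math


def linear_fit(x, y):
    """Return (slope, Pearson r) for an ordinary least-squares fit of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx, sxy / math.sqrt(sxx * syy)


# Hypothetical temperature anomalies (K).
temperature = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]

# Hypothetical sea-ice responses (arbitrary units).
steep_but_scattered = [0.0, -6.0, -2.0, -10.0, -6.0, -16.0]
shallow_but_tight = [-0.5 * t for t in temperature]

for name, ice in [("steep but scattered", steep_but_scattered),
                  ("shallow but tight", shallow_but_tight)]:
    slope, r = linear_fit(temperature, ice)
    print(f"{name}: slope = {slope:.2f}, r = {r:.2f}")
```

The first series has the steeper slope but the weaker correlation coefficient; the second has the shallower slope but lies exactly on a line. Which of the two is "stronger" depends entirely on which quantity is meant, so the manuscript should state it explicitly.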
l.473: I found this confusing: Models with lower sea-ice albedo have less ice-albedo feedback. Why would they have greater potential to amplify warming from greenhouse gas emissivity?
l.505: What is a "relatively consistent level of variability"?
l.518: Again, the data is there to examine this, rather than having to say that "If models see an enhanced ice-albedo feedback, then this is likely to affect those models' predictions of future Arctic sea-ice change".
l.521: Why does the fact that HadCM3 produces the thinnest pre-industrial sea ice imply that this model generally has difficulty in simulating observed sea-ice thickness?
l.530: see l.359