Reply on RC2

The paper addresses an interesting topic on whether the climate is complex or complicated. After a very nice introduction and focus on the problem, the authors mostly refer to the companion paper (under review on ESD) for answering this question. The manuscript is well-posed in terms of concepts and references, it provides a good review on some outstanding problems in climate science. However, there are some unclear points that should be clarified before the manuscript becomes acceptable for publication. Furthermore, more quantitative results should be considered to support the authors' statements, instead of only considering qualitative discussions.

Applying severe testing (Mayo 2005, Mayo 2010, Mayo 2018) to rival hypotheses of gradual versus step-like change provided a virtual scientific proof that the eventual response to radiative forcing is step-like. We concluded that under radiative forcing, climate acts as a storage and release system. This in itself is evidence of complex change, but an underlying mechanism had not been identified and the results have gained little traction. Jones and Ricketts (2019), Jones and Ricketts (2021) identified the Pacific Ocean heat engine as the governing mechanism of temperature increases from the 1960s. We also identified two modes of climate in the historical era. The first, free mode, lasted from preindustrial temperatures to 1957 in the ocean, 1969 in ocean-land and 1972 on land, as determined by regime shifts from red to white noise in autocorrelation in stationary (destepped) NCDC v4 temperature data. This mode was characterised by both positive and negative temperature shifts originating in the extratropics and being subsequently adjusted to in the tropics. The second, forced mode, is represented by a step-ladder-like response to warming originating from around 1970. Many regional climates were stationary to that time.
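The red versus white noise distinction used above can be illustrated with a lag-1 autocorrelation check. This is a minimal sketch under stated assumptions, not the procedure used in the companion papers: the function names, the AR(1) coefficient of 0.7 and the synthetic series are all hypothetical, and the 1.96/sqrt(n) band is the usual large-sample approximation for white noise.

```python
import math
import random


def lag1_autocorrelation(series):
    """Lag-1 autocorrelation coefficient of a series."""
    n = len(series)
    mean = sum(series) / n
    dev = [v - mean for v in series]
    num = sum(dev[t] * dev[t - 1] for t in range(1, n))
    den = sum(d * d for d in dev)
    return num / den


def white_noise_band(n, z=1.96):
    """Approximate 95% band for the lag-1 autocorrelation of white noise."""
    return z / math.sqrt(n)


random.seed(42)
n = 500

# Hypothetical "white" series: serially independent noise.
white = [random.gauss(0.0, 1.0) for _ in range(n)]

# Hypothetical "red" series: AR(1) process with persistence 0.7 (assumed value).
red = [0.0]
for _ in range(n - 1):
    red.append(0.7 * red[-1] + random.gauss(0.0, 1.0))

r1_white = lag1_autocorrelation(white)
r1_red = lag1_autocorrelation(red)
print(round(r1_white, 3), round(r1_red, 3), round(white_noise_band(n), 3))
```

A destepped record whose lag-1 coefficient falls inside the band would be treated as white in this simplified picture, and one well outside it as red.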
In moving from free to forced, the ocean and atmosphere became more tightly coupled, with the intensification of links between regions shown by Granger analysis and lagged correlations. This led to an increase in teleconnections, extending effective circulation from the low to higher latitudes and greater links between the hemispheres. The relationship between the cold tongue and warm pool in the Pacific, the cold and warm parts of the heat pump, and the intensification of ENSO all point to a thermodynamic, rather than dynamic, response.
The manuscript did not address the role of thermodynamics beyond stating that thermodynamic forcing was the missing stage in the physical process describing a changing climate. By making the argument that a constructivist approach to understanding complex system behaviour was not feasible, we were implicitly discounting the classical approach to thermodynamics. An example of the classical approach is to start with an idealised Carnot heat engine and modify it to account for local thermal equilibrium, in order to explain dissipation in a system that is far from equilibrium. Such systems will therefore produce a linear response to forcing as heat is dissipated throughout. The calculation of linear temperature gradients throughout the system and of long-term linear patterns that produce spatial and temporal footprints support this (Ghil and Lucarini 2020, Lucarini et al. 2014, Rennó and Ingersoll 1996). Lucarini et al. (2014) discuss the distinction between free and forced as proposed by Lorenz (1979), suggesting that they are equivalent to the perturbed and unperturbed dynamics of Ruelle (2009). This has led to the two processes being considered as independent (Ghil and Lucarini 2020, Lucarini et al. 2014).
However, the presence of steady-state regimes undergoing step-like change as a response to forcing is incompatible with this picture. Furthermore, at the time of writing we had not located anything in the thermodynamic literature that supported a regime-like response. Homeostatic change is recognised in biotic systems, but not in abiotic systems. The missing factors are (i) verified processes that might maintain steady state in the coupled system, and (ii) verified processes that could lead to regime change. We have since identified literature that addresses energy cascades in rotating and stratified flows that combine large-scale inverse with small-scale dissipative flows in a two-way process (Marino et al. 2015, Pouquet and Marino 2013). These have been recognised in a range of environments including climate but have mainly been restricted to transient structures and processes. We hypothesise that they can also maintain long-term processes, following the analysis of such processes operating in the western Pacific warm pool (Zedler et al. 2019).
In building from a single paper, the manuscript had two main goals. The first was to understand how well the CMIP5 generation of coupled climate models simulate the performance of the heat engine as compared to observations. To that end, some of the analyses conducted for observations were repeated. This included tracking shifts and the Granger analysis. At the suggestion of a reviewer, model skill was assessed and a strong relationship between skill and basic heat pump performance in terms of regime shifts was detected.
Energy fluxes from two climate models were also tested to see whether they exhibited regime-like behaviour, namely surface sensible and latent heat, and top of the atmosphere short- and longwave radiation. Observations are restricted to the period of satellite record, and reanalyses are affected by incomplete observations, especially earlier in the record as we show for top of the atmosphere longwave radiation, so neither provides such information.
A third aspect of climate assessed was a survey of emergence, mostly in models, but the evolution of ENSO in the observed climate was also assessed. The aim of this was to understand how different aspects of complex system behaviour relate to model structure and respond to forcing. Boundary limits in terms of meridional heat transfer were also assessed.

General comments. Changes to manuscript.
The structure of the paper will be revised. There are two options. The first is to retain a single paper and the second is to separate it into two papers, the first dealing with the performance of the heat engine and network in climate models and the second a more theoretical paper exploring climate as a self-regulating complex system. The second is the less preferred option. A single paper would be redesigned along the following lines. The importance of shifts in steady-state regimes as a response to external forcing, and the need to understand the science giving rise to the complex system behaviour of the Pacific Ocean heat engine and network of teleconnections, will follow the complicated/complex framing in the Introduction. We will explore the steady-state regime as a thermodynamic entity in the paper. The inherently complex and complicated/complex models form two sets of models. The former produces a complex system response to forcing and the latter is a combination of linear deterministic and complex behaviour, where the constructivist approach applies to assessing the deterministic component. They both produce similar results over long-term timescales. This structure represents strong emergentist versus constructivist approaches (with some aspects being weakly emergent, as we will discuss). They are both consistent with change in megastates (e.g., snowball earth, cool and warm greenhouse) but not with complex responses on shorter time scales. This also addresses the 'everyone recognises climate as a complex system' point raised by Reviewer 1. The methods section would be amended to introduce different styles of scientific reasoning. These are outlined in the response to reviewer 1, but the main one is the experimental style combined with deductive reasoning, rather than an analytical-hypothetical style more linked to constructivist reasoning (i.e., a hierarchy of simple to complex models all based on the building blocks identified by a reductive process).
The paper will briefly describe the "mixed methods" approach with details in the SI. We would explicitly bring thermodynamics into the paper, rather than having it as the elephant in the room. We will make the point that classical approaches based on constructivist reasoning are not sufficient. The alternative is to take a holistic approach using principles based on the first and second laws of thermodynamics. This would be based on two recent and comprehensive works: Kleidon's (2016) Thermodynamic Foundations of the Earth System and Ghil and Lucarini's (2020) The Physics of Climate Variability and Climate Change. It would place the points already made in the discussion paper within this framework, showing where they agree and where they differ. The following conceptual model for Earth's climate as a complex heat engine will be introduced. There are two distinct heat engine structures that are separate from each other. Firstly, climate as a heat engine can be viewed from the outside as a 'black box' that increases entropy by absorbing shortwave radiation and emitting longwave radiation. Within this, the earth climate forms a conventional heat engine at the surface, with the warm tropics dissipating to the cool polar regions. However, this knowledge does not show us how energy is dissipated within the system. This is the internal dissipative heat engine. The identification of regimes and a heat-pump/network suggests an unconventional heat structure. That is what this paper explores. If kept as a single paper, the model section would be brought forward and addressed in the following order: (i) tests carried out for observations repeated using models, such as tracking the order of shifts, magnitude and frequency of shifts in TEP and TWP, and Granger testing; (ii) testing of heat engine structure in terms of the timing of TWP, TEP and GMST, and the relationship between regime characteristics and model skill; (iii) investigation of regime-like behaviour in surface and top of the atmosphere fluxes to help determine whether the main driving force to equilibrium is top of the atmosphere energy deficit or surface energy surplus.
This would be followed by a section on emergence in climate model studies, moving from the emergence of regimes and ENSO with model coupling, aqua planets and topography experiments and patch studies. The more philosophical aspects such as model underdetermination may be moved to the SI. Following Kleidon (2016), all major forms of energy involved in the conversion from incoming shortwave radiation and the dissipative heat engine are considered thermally equivalent. The boundary limits for meridional heat transport discussed in Section 4.1 provide a strong thermodynamic limit, which is geostrophically controlled. The efficiency of the atmospheric heat engine is only 2% and meridional transport takes place within the constraints of Coriolis and gravitational forces. Section 4.1 would be revised to account for this. Section 4.2 would be revised to reflect this structure, especially the thermodynamic aspect. The literature supporting self-organisation of the climate system is expanding at a rapid rate but this in principle suits both model sets (complicated/complex and fully complex). Self-organisation in complex systems has to obey the first and second laws of thermodynamics. We would list the aspects contributing to this, which include the conservation of angular momentum (but only at the global scale), consistent with the Lorenz cycle. The difference between the two model sets lies in the presence of steady-state regimes. As mentioned above, we have located some literature on rotational and stratified flows that may support this (Marino et al. 2015, Pouquet and Marino 2013). The experimental evidence shows that regimes are emergent in coupled models. The presence of free and forced modes implies different energy states.
The principles inherent in statistical physics imply that the interplay between energy and entropy results in a system at equilibrium being able to reach the maximum number of states. In a highly organised system such as climate, being able to shift in either direction will provide a greater number of states than moving in a single direction. The Pacific Ocean heat engine has this capacity. The limited available kinetic energy and closed dissipative system mean that all heat lost to space, including that generated by friction, needs to be balanced between the hemispheres. In free mode, this implies the maximisation of entropy via both global and local regimes where local thermal equilibrium is achieved via regimes that can shift in either direction. Under forcing, the ocean whitens first, implying that global closure in its coupling with the atmosphere has been reached. However, this relaxed from 1937-57 as the ocean reddened, leaving the climate in free mode. Forced mode was reached in 1972 on ocean and land. Comprehensive changes at this time are revealed by the Granger analysis and Atlantic Ocean indices, implying that the system has become more globally coupled and that dissipation is switching from the ocean to the atmosphere. Even though the climate models do not show the red/white noise partitioning for free and forced climate, the delayed start to the top of the atmosphere energy deficit in the models may mirror that in the real world, where one of the criteria separating free from forced mode is the establishment of a top of the atmosphere energy deficit. This implies a reduction in entropy within the dissipative system and an increase in maximum power being transferred from the ocean to the atmosphere. Energy equivalence will mean that regime shifts involve changes between the different modes of meridional energy transport.

Specific comments
Comment 1: The authors promise to answer the question on the complex or complicated nature of the climate system. However, I did not find evidence for an answer but only a complex system-based approach to characterize the climate system. I really appreciate the efforts made by the authors in introducing different concepts from complex system science (e.g., self-regulation, self-organization, ...) but I would suggest to be more concise and to cut a bit Section 2. Moreover, I would suggest to reset a bit the main aim of the manuscript, not in terms of answering the question but instead of describing how complex system science can help understanding some features of the climate system.

Comment 1. Authors' response
The aim of this paper is not to show how complex system science can help in the understanding of some features of the climate system, but to show how the regime shifts identified in Jones and Ricketts (2017), the Pacific Ocean heat engine and the broader climate network are a manifestation of complex system behaviour. This can then be used to argue that Earth's climate system is fundamentally complex rather than being a hybrid complicated/complex system, where climate variability is considered complex and the forced response deterministic. For the latter, the complicated aspect is in disentangling the two.
If the reviewer did not find evidence for an answer to the question complex/complicated, then the link to the previous work has to be strengthened as described above, where steady-state regimes and how they respond to forcing are clearly identified as the emergent response of a complex system.

Comment 1. Changes to manuscript
Section 2 would be rewritten taking account of points 2 to 5 in the changes to manuscript above. The aim would not be to increase the length but to sharpen the content. The subject matter itself is complex. Table 1 may need to be moved into the SI or omitted. Some of the other material generated in support may also be placed in the SI to manage the paper's length.

Comment 2: Figure 1: why the authors state that CCC is the only one having a monotonic behavior? In my opinion also the others show a monotonic behavior that could be easily used to overcome the step-like fits performed by the authors. I would suggest to revise the corresponding text as well as also possibly add a monotonic fit on both HadCM3 and ECHAM3 data to compare with the step-like behavior.

Comment 2. Authors' response
This is an important point in the paper, because it shows when a new generation of climate model moved from producing complicated to complex results. Interestingly also, the ECHAM3 model shows stationarity for the first part of the historical period and the HadCM3 model does not, warming from the start of the historical period. The bivariate test is a detection test, not a goodness of fit test. This point was made in Jones and Ricketts (2021) but not here. The illustration of the internal trends is goodness of fit, but the dates of break points are identified by the test. Furthermore, regime shifts in coupled models have been attributed to external forcing, so this needs to be emphasised.
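The distinction between detecting a break point and then fitting around it can be sketched as follows. This is an illustrative stand-in only: it uses a simple least-squares scan for a single shift in the mean rather than the bivariate test, and the series, function names and parameter values are hypothetical. The break date comes from the scan; the comparison of residuals against a straight line is the separate goodness-of-fit step.

```python
import random


def sse(values):
    """Sum of squared deviations from the mean of a segment."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_single_step(series, min_seg=5):
    """Scan candidate break points; return (index, SSE) of the best two-segment split."""
    best_idx, best_sse = None, float("inf")
    for k in range(min_seg, len(series) - min_seg + 1):
        total = sse(series[:k]) + sse(series[k:])
        if total < best_sse:
            best_idx, best_sse = k, total
    return best_idx, best_sse


def linear_trend_sse(series):
    """SSE of an ordinary least-squares straight line fitted against the time index."""
    n = len(series)
    t_mean = (n - 1) / 2
    y_mean = sum(series) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return sum((y - (intercept + slope * t)) ** 2 for t, y in enumerate(series))


random.seed(0)
# Synthetic, hypothetical series: stationary noise with one upward shift at index 60.
series = [random.gauss(0.0, 0.1) + (0.5 if t >= 60 else 0.0) for t in range(100)]

break_idx, step_sse = best_single_step(series)
trend_sse = linear_trend_sse(series)
print(break_idx, round(step_sse, 3), round(trend_sse, 3))
```

In this toy case the scan locates the shift without reference to any fitted trend, and the step model then leaves smaller residuals than the monotonic line; a real application would also need a significance test for the detected shift.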

Comment 2. Changes to manuscript
We can add a monotonic fit, but that really is not the main point. We will add the above qualifications and some additional background in the SI, but the HadCM3 and ECHAM3 models really do produce regime shifts. Strengthening the introduction and additional support within the SI will show these have been detected, rather than fitted.

Comment 3: Sections 3.2.2-3.2.3-3.2.4 could be easily grouped together since they are all referring to the companion paper.

Comment 3. Authors' response
These do not refer so much to the companion paper as build on it. Section 3.2.2 discusses the role of ENSO, partly to emphasise how it is affected by model indeterminism, but the Wang et al. (2019) review also shows intensification in El Niño that coincides with regime shifts. Section 3.2.3 discusses self-organisation but in light of emergence, following on from 3.2.1. Section 3.2.4 is hydroclimatology 101 and lists a number of reasons as to why the atmosphere cannot warm in situ.
We felt that it was important to provide this list because it shows the radiative-convective model cannot be sustained, yet this model remains central to climate theory. If the ocean-first argument is accepted, then the formation of steady-state regimes in the ocean removes any mechanism for gradual warming. Linear response theory is a theory of complex system change, but any such theory needs physical support, which it lacks because the ocean surface does not warm gradually.
Much of standard climate theory is supported by simple models of uncoupled climate, atmosphere-only, energy balance models and the like. An important step is missing, shown in Figure 3.2. The reason for this is partly historical, the theory being proposed before complex behaviour became apparent in observations (which require very high-quality data of sufficient duration) and emerged in models.

Comment 3. Changes to manuscript
These sections will be revised as indicated in Point 8 above to take on a more logical order. Some supporting material may be moved into the SI to help streamline the presentation within the paper and will be clearly sign-posted in the paper. The main points about the importance of emergence will be strengthened.

Comment 4: Section 3.3 is a bit confusing and difficult to read. I would suggest to take care of setting it in a more fluent way.

Comment 4. Authors' response
Agreed. All of these tests need to be introduced and supported better. We have already flagged moving this earlier in the paper if the single paper suggestion is followed.

Comment 4. Changes to manuscript
As per point 7, the model section would be brought forward and addressed in the following order: (i) tests carried out for observations repeated using models, such as tracking the order of shifts, magnitude and frequency of shifts in TEP and TWP, and Granger testing; (ii) testing of heat engine structure in terms of the timing of TWP, TEP and GMST, and the relationship between regime characteristics and model skill; (iii) investigation of regime-like behaviour in surface and top of the atmosphere fluxes to help determine whether the main driving force to equilibrium is top of the atmosphere energy deficit or surface energy surplus.

Comment 5: Figure 3: the authors used a quadratic fit. Is there any justification for this type of fit? Moreover, I would suggest to add the parameters of the fit that would also allow to compare TEP/TWP and CMIP5 results.

Comment 5. Authors' response
The only reason for the quadratic fit is that it was slightly the best fit in all cases, but different statistical models produced similar results. This actually appears to be a limit function, where results with skill >74 appear to be roughly constant, and results with skill <74 begin to scatter. The sample size is too small to get any meaningful result from a partial correlation, but this relationship also has a similar response for other variables: low skill has greater scatter.
We don't see why adding the parameter value would help because equating skill, a collection of measures, to the result is not really interpretable. The greater scatter in models with skill <74 will be pointed out.

Comment 5. Changes to manuscript
We would amend the text slightly, to explain the above. In the SI we could also show which skill measures have the greater effect in a table (though the two measures used, overall skill and energy, have the strongest relationship).

Comment 6: Figure 4: I would suggest to perform a significance test for correlations. Indeed, having a certain correlation coefficient (also high) does not necessary mean that the correlation is significant. Moreover, the correlation used here is a linear correlation, based on peak-to-peak comparison between two signals. What about using a nonlinear estimator for correlation as tools based on information theory (mutual information or transfer entropy)? The latter could also enforce the results of the companion paper on Granger causality.

Comment 6. Authors' response
The Granger analyses are very rich, and here we decided to show only one result due to space limitations. A more comprehensive set of results is planned. The outputs shown are f-stat values, which are not properly labelled in the figure caption or in the text, for which we apologise. This relied too much on the companion paper.
These analyses link the influence in temperature between different areas over time. One and two-way interactions can be traced by looking at opposing pairs. Correlations will not show this, but the Granger results do. One-way interactions will show an influence of one on the other and a two-way relationship will show up as higher f-stat values in both pairs. Figure 4 shows the raw data, whereas Figure S3 shows the stationary (de-stepped) results. The p-values are relevant for the latter and can be shown but are relevant only to the null case, that the data is serially independent. The problem arises in that if regimes are serially independent (they are), the lagged regressions will treat any step-like change between regimes as a trend. For that reason, we only compare the f-stat values, and consider them most relevant if they exceed the nominal p<0.01 threshold.
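How opposing pairs of F-statistics separate one-way from two-way influence can be sketched with a bare-bones Granger test. This is a hedged illustration, not the analysis code used for Figure 4: the helper names `ols_rss` and `granger_f`, the lag order of 2 and the synthetic series are all assumptions, and the toy data are stationary by construction, so the complication of step-like changes being read as trends does not arise here.

```python
import random


def ols_rss(rows, targets):
    """Residual sum of squares of a least-squares fit, solved via the normal equations."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * y for r, y in zip(rows, targets))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * w for v, w in zip(a[r], a[col])]
    beta = [a[i][k] / a[i][i] for i in range(k)]
    return sum((y - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, y in zip(rows, targets))


def granger_f(x, y, lags=2):
    """F-statistic testing whether lags of x improve on lags of y in predicting y."""
    times = range(lags, len(y))
    targets = [y[t] for t in times]
    restricted = [[1.0] + [y[t - i] for i in range(1, lags + 1)] for t in times]
    unrestricted = [row + [x[t - i] for i in range(1, lags + 1)]
                    for row, t in zip(restricted, times)]
    rss_r = ols_rss(restricted, targets)
    rss_u = ols_rss(unrestricted, targets)
    dof = len(targets) - (2 * lags + 1)
    return ((rss_r - rss_u) / lags) / (rss_u / dof)


random.seed(1)
n = 300
# Hypothetical one-way system: x drives y at a lag of two steps, y does not drive x.
x = [random.gauss(0.0, 1.0) for _ in range(n)]
y = [0.0, 0.0]
for t in range(2, n):
    y.append(0.5 * y[t - 1] + 0.8 * x[t - 2] + random.gauss(0.0, 0.5))

f_x_to_y = granger_f(x, y)
f_y_to_x = granger_f(y, x)
print(round(f_x_to_y, 1), round(f_y_to_x, 1))
```

In this one-way example only the x-to-y statistic is large; a two-way relationship would show elevated values in both directions.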
Regarding the comment on using other methods, this is really tricky. We gave it a lot of thought and ended up relying on the Granger analysis, pushing it beyond its design limits. In the companion paper, we also carried out lagged correlations to check that there was no peak-to-peak behaviour, identifying some in the intersection of the PDO and AMO cycles. There are no such correlations in the other data. The ENSO-like signals in the stationary data show transitory events, whereas those in the raw data show the role of ENSO in regime shifts. The mechanism is that an underlying change in the mean temperature occurs and the following ENSO supplies the heat. Because the underlying mean has changed, the temperature does not cool back to the previous mean, but maintains the warmer levels.
We have unpacked this for observations using lagged correlations and taken the underlying data and manipulated it to reproduce the f-stat value. Only some ocean regions, including TEP, reproduce the lag-2 peak, seen in the observations in Figure 4. As these shifts have been identified independently, we know they are a genuine nonlinear influence. Therefore, the results are not a linear correlation that is based on peak-to-peak comparison between two signals.
In a way, this qualifies as a network analysis, because each pair is tested independently. We could show the results as nodes and connections, with the f-stat results showing the mutual information between each node. For these results we are only showing three nodes but had nine for analysis. The observations in the companion paper had 29. Showing them as nodes and connections would represent them as a crude form of graph theory.
We therefore believe we have anticipated and managed most of the reviewer's concerns.

Comment 6. Changes to manuscript
The figure caption and text can be improved, p-values added to the figures in the main paper and SI, and some of the qualifications added to the SI. The take-home message for this analysis in terms of the results can also be improved. Figure 4 is hard to interpret with the results to 2018 in the models being concealed, so we will work on a way to illustrate this on a common scale, before adding the model results to 2100. An important message is that although the models can sometimes capture the overall patterns, they are much less responsive to forcing. In observations, this is contributing to much greater risk for water-related impacts (including fire) than projected by models.

Comment 7: Figure 5: what is the reason of using different step-like fits for the surface latent heat flux from 2060?

Comment 7. Authors' response
We don't understand this question, whether it is asking about the methods or the results. The timing of the steps is a matter for the test, the intervening shifts are fitted, but only to portray how trend-or step-like the record is. Because there is only one model shown, this is more for illustration. This particular chart shows that trends are positive after 2060.

Comment 7. Changes to manuscript
We have flagged the need to focus on the thermodynamic aspects of regimes, so the surface and top of the atmosphere (toa) fluxes will be included to illustrate that regimes affect the whole process of dissipation. The point about surface heat imbalance driving the overall process rather than top of the atmosphere deficit will be shown. The other really important point is that toa short- and long-wave radiation are compensating for each other to maintain steady-state regimes. We will present more model output to illustrate this; it will be mentioned in the text but the additional results will be incorporated into the SI.

Comment 8: Section 4: in my opinion this is the core of the paper but it is not really clear and exhaustive. I found that it is surely well-posed in the context but it lacks of new results and it is only speculative. Why the authors did not perform any kind of analyses or approaches based on complex system science to investigate self-organization, scale-invariance, teleconnections? I would suggest to carefully take care of this part of the manuscript by reducing the huge reference and description to previous works and by including some results based on complex system tools.

Comment 8. Authors' response
We accept that this has not been made clear enough in this paper and relies too much on the previous work, without clearly restating what is at play. However, we find the comment that this section lacks new results somewhat puzzling, when it is based on the identification of regime shifts, their attribution to forcing and the involvement of the Pacific Ocean heat engine and broader climate network in the dissipative process. The reviewer also refers to fitting steps in two comments, whereas in the introduction, regime shifts are attributed to external forcing (lines 38 and 51). We have already said we will add more detail on the previous research (mainly in the SI).
It is not enough to show that climate is complex for two reasons. The first is that climate is often referred to as complex as a synonym for complicated (e.g., the IPCC Working Group I AR5 and AR6 reports). The second is that climate variability is considered by many to be complex, while the deterministic aspect of climate change is complicated. The climate research community would therefore claim they already accept climate as a complex system, even if that complexity is often treated as noise. Showing that the forced response is due to complex system behaviour opens up an agenda that has barely been addressed. Understanding what is behind these emergent features, self-organisation and teleconnections is the goal of the paper.
We have flagged in previous comments that the model structures representing complex/complicated and fully complex need to be delineated and that a thermodynamic overview be provided. The large array of references is in support of our thesis and we will make sure that how they support our findings is made clear.

Comment 8. Changes to manuscript
This section will be comprehensively rewritten to combine the past results that do contain evidence for complex behaviour in observations, the experimental results from a range of methods, and the relevant thermodynamic aspects of heat engines in complex systems (mainly general principles and limitations of current understanding).
It will be complex because it is an unfamiliar topic, having not been seriously addressed in the literature and it will be speculative, because there is no precedent for self-regulating steady-state regimes in complex thermodynamic environments. However, that is what all the evidence points to.
If Earth's climate is recognised as a self-regulating and homeostatic system, the relationships between thermodynamics, statistical physics and information theory in complex energetic systems will have to be redefined. We would not aim to demonstrate this in theoretical terms (we do not have the capacity to do that), but would like to encourage others to work in that direction, either by finding a solution or by developing an alternative explanation grounded in theory.

Comment 9: I would suggest the authors to take care of correct referencing of figures and references as well as to the style of the text (fonts and missing words). An overall revision of the manuscript in terms of style and form would be a benefit.

Comment 9. Authors' response
Thank you. Apologies for the mis-numbering of figures, which was due to the late inclusion of a figure; we thought the numbering had been updated. The missing references were due to errant apostrophes in a reference manager. This has been fixed.

Comment 9. Changes to manuscript
These changes would be made and the whole paper revised to be more direct, focusing on style and form, and clearly communicating Section 4, so that it is clear how the preceding evidence and arguments contribute.