Reply on RC2

General Comments
It is striking that the step size control method (Algorithm 1) does not actually do much continuous adjustment of the time step most of the time. While the authors go to great lengths to enable the time step to change responsively, in reality the step size quickly expands out to 32x the original time step, or it stays anchored at 1x the original time step. The adjustment happens very early in the simulations, meaning that for most of the 10,000 years the algorithm simply creates more overhead for the simulation without any additional benefit. This suggests that the whole process of finding the right time step could be restricted to an appropriate initiation period (perhaps 100 years, or whatever minimum envelope is needed), after which the time step is held fixed. *** This also came as a surprise to us; it should probably be stressed more in our conclusions. Your recommendation is definitely right.
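The reviewer's suggestion can be illustrated with a minimal sketch. It is a toy illustration and not Algorithm 1 from the manuscript: the function names, the step-doubling error estimate, the tolerance, and the scalar test problem are assumptions made only for this example. Only the cap of 32 on the step-size factor is taken from the comment above.

```python
def run_with_warmup(step, y0, dt0, t_end, warmup, tol, m_max=32):
    """Adapt the step-size factor m only while t < warmup, then hold it fixed.

    step(y, dt) must return the state advanced by dt. Hypothetical helper,
    not the manuscript's Algorithm 1.
    """
    t, y, m, checks = 0.0, y0, 1, 0
    while t < t_end - 1e-12:
        dt = min(m * dt0, t_end - t)
        if t < warmup:
            # Step-doubling estimate: one step of dt versus two steps of dt/2.
            coarse = step(y, dt)
            fine = step(step(y, dt / 2), dt / 2)
            checks += 1
            err = abs(coarse - fine)
            if err > tol and m > 1:
                m //= 2      # reject the step and retry with a smaller factor
                continue
            y, t = fine, t + dt
            if err < tol / 4 and m < m_max:
                m *= 2       # error is comfortably small: enlarge the step
        else:
            # After the initiation period: no error checks, fixed factor m.
            y, t = step(y, dt), t + dt
    return y, m, checks


def relax(y, dt, k=0.1):
    """Explicit Euler step for the toy problem y' = -k (y - 1) (assumed)."""
    return y + dt * (-k * (y - 1.0))


y_end, m_final, n_checks = run_with_warmup(relax, y0=0.0, dt0=0.01,
                                           t_end=100.0, warmup=5.0, tol=1e-3)
print(m_final, n_checks, y_end)
```

In this toy setting the error checks, and hence the extra evaluations, occur only during the initiation period; the rest of the run uses the factor found there without further overhead.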
On a related note, I thought the design of Algorithm 1 could be significantly improved by only running the error checks on a subset of time steps (not every single time step). Presently, steps 11 and 12 are computed for every time step of the simulation to verify the accuracy of the chosen delta T. This error check could easily be made on a subset of timesteps. Perhaps on every 10th time step, an error check could be run, or if there needs to be a continuous block of time steps, only do this for a limited window at periodic intervals. It seems wasteful to me to run the error check on every time step, when most of the time it will have no influence. *** In fact, this is controlled by the parameter n_s in the algorithm (see Alg. 1, fifth input parameter, used in the for-loop in line 10). We will stress this point more, since it was apparently not clear enough.
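The cost effect of checking the error only on a subset of steps can be sketched as follows. This is a hedged illustration under stated assumptions: the parameter `check_every` is a hypothetical stand-in and is not claimed to match the exact semantics of n_s in Alg. 1, and the step-doubling check and the scalar toy problem are likewise assumptions.

```python
def integrate(step, y0, dt, n_steps, check_every):
    """Take n_steps fixed steps; estimate the local error on every
    check_every-th step only. Returns (state, step calls, largest estimate)."""
    y, calls, worst = y0, 0, 0.0
    for i in range(n_steps):
        y_new = step(y, dt)
        calls += 1
        if i % check_every == 0:
            # Recompute the same interval with two half steps (2 extra calls).
            y_ref = step(step(y, dt / 2), dt / 2)
            calls += 2
            worst = max(worst, abs(y_new - y_ref))
        y = y_new
    return y, calls, worst


def relax(y, dt, k=0.1):
    """Explicit Euler step for the toy problem y' = -k (y - 1) (assumed)."""
    return y + dt * (-k * (y - 1.0))


_, calls_every, _ = integrate(relax, 0.0, 0.01, 1000, check_every=1)
_, calls_tenth, _ = integrate(relax, 0.0, 0.01, 1000, check_every=10)
print(calls_every, calls_tenth)  # 3000 and 1200 step evaluations
```

In this sketch, checking every step triples the number of step evaluations, while checking every tenth step adds only 20 % to the unchecked run; the computed state is identical in both cases because the check does not alter the step taken.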
Algorithm 3 appears to have the most utility, since it clearly saves time in the spin-up, and achieves a reasonable approximation to the reference case (correct me if I've got that wrong). Algorithm 2 fails because it appears to me that the exclusion of negative tracers is too stringent a condition. The authors mention that negative tracer concentrations sometimes occur in the reference case… so this algorithm should be a non-starter, shouldn't it? *** We agree and conclude that this algorithm does not reduce the runtime. However, we wanted to show what happens when this strict criterion is met.
Overall, I think there could be more critical evaluation in the discussion and conclusions as to which algorithms actually performed well, which are recommended or not, and why. *** We will put the points that you mentioned (see also below) in the discussion section.
Line Comments
L36: "parallelization… lowers the computational effort". Not really, it just speeds up the result. Parallelization results in more computational resources being used, not less… the benefit is in human time. *** You are right. We will replace the term computational effort or cost by runtime throughout the manuscript.
L43-45: There are a lot of different time-saving methods listed here, but there is no evaluation of which methods are pertinent to the current study. I think there needs to be a discussion of why one needs an explicit time-stepping method for the present study. The Newton-Krylov method is briefly mentioned, but the authors don't explain why they are not using that method (does it prevent the biogeochemistry models from working properly?) *** We will extend the discussion of this point. Besides the mentioned work of Khatiwala on Newton's method, we have our own experience (Piwonski and Slawig, 2016a) with Newton's method: it was much more sensitive to the choice of the initial values than the standard spin-up. That is the reason why we concentrated on the spin-up in this paper. It might also be that choosing a different or varying time step in Newton's method will affect its convergence behavior. Moreover, in Newton's method only one year is computed in each iteration, whereas the spin-up is more similar to the solution of an initial value problem, for which adaptive time-stepping was originally designed. The GPU implementation will not be affected by an automatic choice of the step size; its main improvement comes from the fact that the considered biogeochemical models are water-column models. This structure allows effective acceleration on GPUs, but only for the biogeochemical part, and much less for the ocean transport part.
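The view of the spin-up as a long time integration can be made concrete with a minimal sketch in which one model year is applied repeatedly until consecutive annual states barely differ. The names `spin_up` and `toy_year`, the Euclidean stopping criterion, the tolerance, and the linear toy map are all assumptions for illustration; `toy_year` merely stands in for one year of the coupled transport and biogeochemistry.

```python
def spin_up(one_year, y0, tol=1e-8, max_years=10_000):
    """Iterate y <- one_year(y) until two consecutive years differ by < tol.

    Returns the final state and the number of model years computed.
    Hypothetical helper; the stopping rule is an assumption.
    """
    y = list(y0)
    for year in range(1, max_years + 1):
        y_next = one_year(y)
        # Euclidean norm of the change over one model year.
        diff = sum((a - b) ** 2 for a, b in zip(y_next, y)) ** 0.5
        y = y_next
        if diff < tol:
            return y, year
    return y, max_years


def toy_year(y):
    """Toy stand-in for one model year: a linear contraction toward 1.0."""
    return [0.9 * v + 0.1 for v in y]


state, years = spin_up(toy_year, [0.0, 2.0])
print(years, state)
```

Each pass through the loop is one model year of forward time stepping, which is where a larger or adaptive time step would reduce the runtime; a Newton-type solver would instead seek the fixed point of `one_year` directly.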
L56: "ignoring and avoiding negative tracer concentrations". I did not understand this sentence until I had read the whole manuscript. I think this should be rephrased for clarity, to state more simply that a step-size control method was implemented with and without a condition to exclude negative tracer concentrations. *** The formulation is indeed misleading; we will clarify it at this point of the manuscript.
L68: "Due to the fully coupling". Grammatically this should be "full coupling" *** will be corrected
L80-81: In these equations, the terms A, D, qi and dn are shown without any explanation (until later in the manuscript). These new terms should be briefly labelled here for clarity. *** This will be added directly below the equations.
L84: You could write here: "advection (A) and diffusion (D)" to partially address the point above. *** We will follow the suggestion.
L84: "in marine water" sounds strange. Why not say "in the ocean"? *** will be changed.
L98-99: "is called marine ecosystem model". Here an article ("a" or "the") is needed in front of "marine" *** will be added.
L109: "above equations": please specify which equations you mean *** eqns. (1), (2) were meant, will be added.
L116: "refer to Kriest…": I think you mean "refer the reader to Kriest…" *** yes, will be corrected.
L151: "for the biogeochemistry tutorial": this is confusing. What is "the biogeochemistry tutorial"? Do you mean this model was created for teaching purposes? *** I think this part of the sentence can be skipped.
L251: "Despite of such" -> "Despite such" *** Will be changed.
L245-257: This paragraph could use further discussion on why excluding negative concentrations is justified in the algorithm, given that negative values can occur in the reference case regardless of the accelerated time steps. On balance, it seems to me that this is a poor choice of criterion (the results are not good for Algorithm 2). *** It is definitely a very strict criterion, for the reasons you mentioned. We included it anyway to show what effect this criterion has if it is applied (it basically destroys the benefit of the step-size choice, as you write). On the other hand, we wanted to show that violating non-negativity in Alg. 1 and 3 had no negative effects.
L284: I don't understand the backslash here. *** This is the mathematical notation for a set difference; we will use a clearer notation.
Figure 2: These 6 panels really need titles. It is cumbersome to have to refer backwards and forwards to the caption for the meaning of them. Figure 3: As in Figure 2, the subplots need titles. *** We will include titles.
Table 3: I think an extra table, analogous to Table 3, should be added which shows the computational cost saving factor for each model, and the time step multiplication factor m at the end of the simulation. *** Will be added in a revised version.
L458: "Only in half of the simulation runs decreased the algorithm": Grammar is wrong. L460: "… applied the entire spin-up large time steps": grammar is wrong. *** will be corrected.
L461-463: The sentence starting with "Although the algorithm…" is confusing to read. Please rephrase and clarify. *** What we meant was: in some of these cases, the algorithm temporarily decreased the time step. However, this hardly affected the accuracy of the approximation.
L466: "an reasonable": typo *** will be corrected.
L492: "local error always needed two evaluations of the same time interval": This highlights my general comment that the algorithm should not be checking the error every single time step. *** In fact, this is controlled by the parameter n_s in the algorithm (see Alg. 1, fifth input parameter, used in the for-loop in line 10). We will emphasize this point more.
L493-494: "Due to negative concentrations in the approximations, the algorithm then used nearly always the smallest time step." This suggests to me that this algorithm should not be recommended in the future. *** This is definitely a reasonable recommendation. We will clarify this in the conclusion section, together with the remark that even the reference runs sometimes produce small negative values.
L515-524: Finishing the paper with a list of dot points is not a good way to conclude. Please rewrite this as a normal paragraph, or if you want to list these points like this, don't make it the final statement of the paper. *** We will change this and remove the bullet points.