Comment on cp-2020-158

I find that the paper needs significant reworking because the message is not clear. It is a big topic to cover, and many of the appropriate references are included but not necessarily cited in the appropriate place.
My biggest concern is with the interpretation of the record. In many places there are large offsets between the WEP data and other published data. I do not suggest that this invalidates the data themselves, but I think a more in-depth explanation is often required. The reason given is commonly air-sea offsets, but this cannot be relied upon for >150 ppm changes in the mid Pliocene when all the sites are supposedly 'in equilibrium'. Many of the sites used in the previous compilation of Sosdian et al. 2018 are close by (particularly ODP 872, which agrees in absolute terms with ODP 871 in the Indian Ocean). The comparison to ODP Site 999 for the Plio-Pleistocene needs more careful thought, as I do not follow the discussion on changes to air-sea equilibrium there. Diagenesis has the potential to affect these records and this is not discussed at all, nor is the reader pointed to the supplement or to data comparisons in order to compare the raw data, calculations, and calibrations used to produce the final data.
Given the nature of this paper as a low resolution Neogene time-series, I think further validation and comparison are required.
Please note for the below: MB15, S18, C17, D18, H09, B11, G14 and DlV20 refer to Martinez-Boti et al. 2015, Sosdian et al. 2018, Chalk et al. 2017, Dyez et al. 2018, Hoenisch et al. 2009, Bartoli et al. 2011, Greenop et al. 2014 and de la Vega et al. 2020, respectively.
Line 26: What is the evidence that these sites are in equilibrium today and were in equilibrium over the interval studied? To my knowledge the WEP is not considered to have been a stable oceanic environment over time.
Line 29: 'reproduce the ice core record' is very strong language for the comparison which has been carried out. There are only 16 points and no comparative data is produced (e.g. numerical data or crossplotted data).
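To illustrate the kind of numerical comparison I have in mind, here is a minimal sketch in Python. It interpolates a reference (ice core) record to the proxy sample ages and reports mean bias, RMSE and Pearson correlation. The function names and all numbers are placeholders of mine, not values from the manuscript or from any published record.

```python
import math

def interp(x, xs, ys):
    """Linearly interpolate y at x from sorted xs/ys (no extrapolation)."""
    if not xs[0] <= x <= xs[-1]:
        raise ValueError("age outside reference record")
    for i in range(1, len(xs)):
        if x <= xs[i]:
            frac = (x - xs[i - 1]) / (xs[i] - xs[i - 1])
            return ys[i - 1] + frac * (ys[i] - ys[i - 1])

def compare(proxy_age, proxy_co2, ref_age, ref_co2):
    """Return (mean bias, RMSE, Pearson r) of proxy vs. interpolated reference."""
    ref = [interp(a, ref_age, ref_co2) for a in proxy_age]
    n = len(ref)
    diffs = [p - r for p, r in zip(proxy_co2, ref)]
    bias = sum(diffs) / n
    rmse = math.sqrt(sum(d * d for d in diffs) / n)
    mp, mr = sum(proxy_co2) / n, sum(ref) / n
    cov = sum((p - mp) * (r - mr) for p, r in zip(proxy_co2, ref))
    sp = math.sqrt(sum((p - mp) ** 2 for p in proxy_co2))
    sr = math.sqrt(sum((r - mr) ** 2 for r in ref))
    return bias, rmse, cov / (sp * sr)

# Placeholder values only (ages in ka, CO2 in ppm); NOT real measurements.
ice_age = [0, 20, 130, 140, 240]
ice_co2 = [280, 190, 280, 195, 270]
proxy_age = [10, 75, 135, 190]
proxy_co2 = [250, 240, 230, 245]

bias, rmse, r = compare(proxy_age, proxy_co2, ice_age, ice_co2)
print(f"bias={bias:.1f} ppm, rmse={rmse:.1f} ppm, r={r:.2f}")
```

Reporting statistics like these for the 16 points, alongside a crossplot, would let the reader judge how closely the record matches the ice cores.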
Line 31: The Miocene data is higher than other published data, but so is the Pliocene, and arguably the latter is much more important as more data is available to facilitate the comparison.
Line 33: A 270 ppm transition during the Pliocene iNHG would be very interesting information. This is a huge change (effectively one halving of CO2). Please discuss this more in the context of the other records if you find your Pliocene data to be valid.
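Whether a 270 ppm drop amounts to a full halving depends on the starting concentration, so it would help to state the endpoints. A minimal sketch of the arithmetic follows. The function name and the endpoint pairs are hypothetical, chosen by me only so that each drop equals 270 ppm.

```python
import math

def co2_doublings(c_start_ppm, c_end_ppm):
    """Number of CO2 doublings from start to end (negative values are halvings)."""
    return math.log2(c_end_ppm / c_start_ppm)

# Hypothetical endpoints; each pair differs by 270 ppm.
for start, end in [(540.0, 270.0), (670.0, 400.0)]:
    print(start, end, round(co2_doublings(start, end), 2))
```

The first pair is exactly one halving. The second is the same ppm change but less than one halving, which is why the absolute values matter for the comparison with other records.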
Line 68: There is no atmospheric CO2 data from the Pliocene available in the blue ice cores, although they did confirm the presence of ice which is of that age. Please correct this.
Line 72: Foster 2008 is not a B/Ca to CO2 paper, and most recent studies have stopped plotting the B/Ca datasets as there were found to be too many divergent controls on the incorporation.
Lines 94-102: Use Hain et al. 2018 for the strongest case that δ¹¹B can be a viable CO2 proxy, regardless of other uncertainties.
Line 103: Here you refer to various δ¹¹B studies as 'high resolution', yet earlier you refer to the ice cores as 'relatively high resolution'. I would suggest being more consistent within the manuscript about what counts as 'high resolution', given the timescales you are discussing. The ice-core, Martinez-Boti, Dyez, Chalk, de la Vega and Greenop studies are high resolution compared with your work here, but the Foster, Hoenisch and Sosdian studies are more similar to your new records. Consistency on this point would remove some of the false equivalence about resolution in this paragraph.
Line 126: These studies are problematic in their interpretation of CO2 data and call into question the assurance that these sites have remained in equilibrium. This point is repeated on line 152, but without an explanation of the reasoning behind it. Given that this is the key assumption in this manuscript, I think it deserves more attention.
Line 155: How much could disequilibrium impact these data? No reference is made to preservation or potential diagenetic changes, which may impact the data to a far larger extent.
Lines 160-166: The age models are not great for these cores, but I do not think that matters given the resolution of the data. It may be worth stating here that no direct comparison of ages between the cores is made.
Line 194: Gutjahr et al 2020 is the updated reference for this.
Line 197: This is great, where is the data though? Can you add a supplementary table?
Line 217: Why is the 2SD for the δ¹¹B NEP so much larger than for the other data despite the increase in n?
Line 314: The S18 study is a reinterpretation of the same data as the other two, so this sentence needs either to explain that or to use only the most recent iteration.
Line 321: This section is not a fair representation of the existing data. The δ¹¹B presented here is ~13.8 ‰, whereas the minimum in other records, e.g. S18, is ~15 ‰. In addition, the S18 data from ODP 872 are geographically very close to ODP 806. ODP 871 from G14 appears to match the data in S18 quite well, which would imply that another reason is responsible for the difference seen here between 872 and 806 (ocean frontier, preservation or analytical). Furthermore, G14's raw values (for T. trilobus) are fairly similar to the data presented here, which would suggest that the difference lies in interpretation and calculation. I think plotting the available data as raw δ¹¹B, and either recalculating or plotting calculated CO2, would really help with this point. There is not a huge amount of data for this period, and it is worth discussing fully where it does and does not fit.
Line 327: I'm not an expert on this topic, but I think both of these ideas have been updated in Stoll et al. 2019 and Tanner et al. 2020.
Line 328: these very warm temperatures also appear in the Atlantic TEX86 study of Super et al. 2020 which you could cite here.
Line 363: Please give the 'marginally consistent' value here as well, rather than just the inconsistent one. Also see S18 for a reconciliation between the datasets of MB15 and B11.
Lines 371-373: Please expand on the good agreement here, as above. Stap et al. 2016 does not agree particularly well with the other studies.
Lines 376-384: This section reads like it should be in the introduction rather than results.
Line 385: 150 ppm is a lot of disagreement, and as stated it is down to the raw δ¹¹B values. I do not follow the vague argument about disequilibrium that has been made several times now.
In theory, increased upwelling and more respired carbon dissolved in surface water reduce pH and increase the CO2 estimate, and vice versa.
One issue with 999 is the Panama Isthmus and the potential influx of surface EEP water. Upwelling in the EEP makes the water more acidic; therefore, if there were an increased influx into the Caribbean, this would reduce pH and increase CO2 estimates. For the inverse to happen, 999 would need to be a sink for CO2, so barring a huge change (e.g. a reversal of AMOC), it is more likely that estimates at 999 will overestimate rather than underestimate CO2. With this in mind, the difference between 806/7 and 999 would require even more disequilibrium in the WEP.
I would favour that the Pliocene section here is more likely to have been impacted by diagenesis, or by the closure of the Indonesian through-flow, which is not discussed here.
Line 389: I am not sure you have the resolution to say this; I would suggest removing it.
Lines 401-405: Please reference this section. I would also invoke the logarithmic nature of CO2 forcing here; see multiple papers, e.g. MB15, C17, DlV20.
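To make the point about logarithmic forcing concrete, here is a minimal sketch using the widely used simplified expression ΔF = 5.35 ln(C/C0) W m^-2 (Myhre et al. 1998). The choice of expression and the concentrations are my assumptions for illustration; they are not taken from the manuscript.

```python
import math

ALPHA = 5.35  # W m^-2; coefficient of the simplified expression (assumed here)

def co2_forcing(c_ppm, c_ref_ppm):
    """Radiative forcing (W m^-2) of CO2 at c_ppm relative to c_ref_ppm."""
    return ALPHA * math.log(c_ppm / c_ref_ppm)

# The same +100 ppm change at two hypothetical baselines.
low = co2_forcing(280.0, 180.0)
high = co2_forcing(500.0, 400.0)
print(f"180->280 ppm: {low:.2f} W m^-2; 400->500 ppm: {high:.2f} W m^-2")
```

The same ppm change gives a larger forcing at the lower baseline, so CO2 changes are better compared as ratios (doublings) than as absolute ppm differences.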
Line 415: All of these papers DO suggest a decline over the MPT, it is the main finding of all of them.
Line 425: When is the end of the MPT? Please define.
Line 472: Exchange 'many others' for an 'e.g.' prior to the reference list, or give a few more.
Lines 497-500: This is precisely the point of H09, C17 and D18. I do not think that the addition of one data point allows the confirmation of these claims.
Table 2: The DlV20 study is missing, although it is the key reference for the mid-Pliocene period. The ordering of this table is also confusing, as it is out of temporal order.
Figure 1: Watch the formatting on the scale axis for the map figure. Please add contours or change the colour scale to something more friendly for colourblindness. The x-axes are also cut off on the other panels.
Figure 2: Please add a legend to the plot to define the symbols and shades. It would be helpful to colourise your data and add raw data from other studies in grey behind to facilitate easier comparison. A d11Bborate plot would also be useful to account for the different calibrations between species.
Figure 7: S18 data is missing from this plot.
Figure 9: Much is made of the low CO2 found in MIS 30, but this figure shows lower(?) CO2 at 1-1.1 Ma, which is not discussed.
Figure 10: Again, S18 is missing from this figure, despite it being the most complete Neogene study to date (including this one!).