Reply on RC2

We appreciate your constructive comments and remarks on our original submission, which have helped to clarify certain issues and improve several sections of the manuscript. Below, please find our responses to your comments. In light of the new TEX86 calibration and the additional modelled and WOA13 data (see supplementary material), we have also restructured the subsections in Sect. 4 (Results and discussion).

We provide additional responses to the comments and questions of Referee #2 in detail below.

RC2.2:
HBIs and other productivity biomarkers - I have no major comments on the interpretation of their data. I have nevertheless one main point that should be clarified further than the authors maybe do. The surface sediment samples selected for this study are probably spanning a "large" period of time than what one would ideally expect when strictly looking at the modern conditions (<40 years). We all know how hard it is to get very recent surface sediments and therefore we can only support this work, despite significant uncertainties surroundings the proxy interpretation as underlined by the authors. As they mentioned, "Vorrath et al. (2019) conducted radiocarbon dating on selected surface sediment samples from the Bransfield Strait, concluding that their biomarker data reflect the past two centuries", explaining why the authors justify that "the different time periods covered by the different methods need to be considered and kept in mind when interpreting the results". I presume that Lamping and colleagues have no or very few radiocarbon or 210Pb dating on their selected samples in the different investigated areas. I would strongly suggest to mention that regional sea ice has been probably quite variable over the last two decades and centuries which means that comparing concentrations of the HBIs and other sterol concentrations and ratios with satellite and model data can significantly differ owing to the difference of ages. In addition, I would also suggest the authors to clearly state that many studies have shown that significant degradation of organic compounds occurs both within the water column and surface sediments as a result of microbial activity and this might change from one area to another. It means that variations in concentrations between two sectors might not strictly reflect a real change in production of these compounds in the surface waters but might also report two different degradation states.
Two surface sediment samples with two very different ages, about a few decades apart, may exhibit two different concentrations, which does not necessarily mean that sea-ice concentration was higher or lower between them during a specific period of time but instead that the organic compounds could have been more degraded in one area, especially where the oldest sediments are found. In conclusion, I would insist more on this point in addition to the others convincingly raised by the authors.
Author's response: We fully agree with the arguments concerning the general problem with core top calibrations presented here by Referee #2 and would like to thank her/him for the well-summarized issues on this topic and the suggestions on which points to add to the manuscript. We now address the topic of degradation in more detail and also comment on regionally different core top ages (Sect. 5). Regarding the latter aspect, we now refer to, e.g., Hillenbrand et al. (2010), Smith et al. (2011) and Vorrath et al. (2020), who report modern Amundsen Sea shelf and Bransfield Strait core top ages, respectively. In addition to the concerns outlined by Referee #2, we now also mention subglacial erosion as well as the input of ancient carbon affecting surface sediment composition, and we recommend intensive 210Pb dating efforts as an essential prerequisite for core top studies. We now also draw the connection between sedimentation rates and the degradation of organic matter (which is higher in low-sedimentation regimes) and consider studies dealing with HBI degradation (e.g., Rontani et al., 2014, 2019a, 2019b). We further emphasize that this study is not intended to provide a calibration of PIPSO25 values against satellite-derived sea ice concentrations, also because of the uncertainties mentioned above.

RC2.3:
I have more concern regarding the GDGT interpretation. There is now an emerging consensus that GDGT are more reflecting along the Antarctic margin the subsurface ocean temperatures (SOT) (0-200 m, 100-200 m, 50-400 m water depth depending on the studied area) rather than SSTs (Kim et al., 2012; Etourneau et al., 2019; Liu et al., 2020). This is mostly linked to the fact that the GDGTs might be more synthesized by Thaumarchaeota living at the intersection between the cold and low saline surface waters with the subsurface warm waters (CDW and WDW) around Antarctica. I would first suggest to consider the calibration of Kim et al. (2012) for converting the GDGT ratios into SOT (SOT = 50.8 × TEX86L + 36.1), which may reduce the temperature range and the unrealistic warm values found in the north, and then compare with model and instrumental data at different subsurface water depths. Furthermore, I would also suggest to consider that the GDGT seems to be produced mostly during the late winter and early spring (Murray et al., 1998; Kalanetra et al., 2009), even though we clearly need more data from the water column to confirm such hypotheses. Therefore, mapping the ocean temperatures at different seasons and the most appropriate depths would be worth to try. This might provide further constraints on the use of the GDGTs as paleotemperature proxy.
Author's response: We now compare our data (incl. the newly calibrated TEX86L-derived temperatures) with World Ocean Atlas (WOA)-derived temperatures and modelled data for the sea surface and subsurface (410 m; see supplementary material to this comment). Correlations of TEX86L SOTs with instrumental and modelled temperatures for different depth intervals (0-200 m, 100-200 m, 50-400 m) suggest that the GDGT signal best reflects deep subsurface temperatures (410 m water depth). Interestingly, the RI-OH'-based temperatures also show a stronger relation to this depth than to the sea surface. We present these new results and provide a thorough discussion in which we also address the (dis)similarities between instrumental, proxy-based and modelled temperatures.
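
For illustration only, the calibration quoted in RC2.3 and a depth-wise comparison of the kind described in this response could be sketched as follows. This is a minimal sketch and not the workflow used for the manuscript. It assumes the usual definition of TEX86L as the base-10 logarithm of [GDGT-2] / ([GDGT-1] + [GDGT-2] + [GDGT-3]), which is not stated in this reply. The function names and all sample values are hypothetical.

```python
import math


def tex86_l(gdgt1, gdgt2, gdgt3):
    """TEX86L index from GDGT abundances (assumed definition, see lead-in)."""
    return math.log10(gdgt2 / (gdgt1 + gdgt2 + gdgt3))


def sot_kim2012(tex86l):
    """Subsurface ocean temperature (deg C): SOT = 50.8 x TEX86L + 36.1."""
    return 50.8 * tex86l + 36.1


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences.

    Raises ZeroDivisionError if either sequence has zero variance.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical GDGT abundances for four core tops (NOT data from the study).
samples = [(0.60, 0.25, 0.15), (0.62, 0.23, 0.15),
           (0.65, 0.21, 0.14), (0.68, 0.19, 0.13)]
proxy_sot = [sot_kim2012(tex86_l(*s)) for s in samples]

# Hypothetical reference temperatures (deg C) at the same sites, one list
# per depth level or depth interval to be tested.
reference = {
    "surface": [0.5, 0.9, -0.2, 0.1],
    "410 m": [1.6, 1.1, 0.7, 0.2],
}

# Correlate the proxy temperatures with each depth level in turn.
correlations = {depth: pearson_r(proxy_sot, temps)
                for depth, temps in reference.items()}
```

The depth level with the highest coefficient in `correlations` would then be the candidate source depth of the GDGT signal; with real data, the significance of each correlation would also need to be assessed.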

RC2.4:
In conclusion, I believe the authors should split their data in two different papers which would make their interpretation clearer and more focused.
Author's response: As stated above, we would like to keep the manuscript as one, although we understand the reasoning behind this suggestion. Admittedly, based on the newly calibrated TEX86L temperatures and the consideration of WOA- and model-derived subsurface ocean temperatures, we now extend the discussion of the GDGT part, which may lengthen the manuscript. However, this ensures a balance between the HBI and the GDGT parts.

Line-specific and minor comments and amendments:
Previously line 30: more recent references, especially in the context of the modern global warming and West Antarctica sea ice decline (e.g., Wang et al., J. of Climate, 2020)?