Reconstructing the Evolution of Ice Sheets, Sea Level and Atmospheric CO 2 During the Past 3.6 Million Years

. Understanding the evolution of, and the interactions between, ice sheets and the global climate over geological time scales is important for being able to project their future evolution. However, direct observational evidence of past CO 2 concentrations, and the implied radiative forcing, only exists for the past 800,000 years. Records of benthic d 18 O date back 10 millions of years, but contain signals from both land ice volume and ocean temperature. In recent years, inverse forward modelling has been developed as a method to disentangle these two signals, resulting in mutually consistent reconstructions of ice volume, temperature and CO 2 . We use this approach to force a hybrid ice-sheet – climate model with a benthic d 18 O stack, reconstructing the evolution of the ice sheets, global mean sea level and atmospheric CO 2 during the late Pliocene and the Pleistocene, from 3.6 million years (Myr) ago to the present day. During the warmer-than-present climates of the Late 15 Pliocene, reconstructed CO 2 varies widely, from 320 – 440 ppmv for warm periods, to 235 – 250 ppmv for the early glacial excursion ~3.3 million years ago. Sea level is relatively stable during this period, with maxima of 6 – 14 m, and minima of 12 – 26 m during glacial episodes. Both CO 2 and sea level are within the wide ranges of values covered by available proxy data for this period. Our results for the Pleistocene agree well with the ice-core CO 2 record, as well as with different available sea-level proxy data. During the early Pleistocene, 2.6 – 1.2 Myr ago, we simulate 40 kyr glacial cycles, with interglacial CO 2 20 decreasing from 280 – 300 ppmv at the beginning of the Pleistocene, to 250 – 280 ppmv just before the Mid-Pleistocene Transition (MPT). Peak glacial CO 2 decreases from 220 – 250 ppmv to 205 – 225 ppmv during this period. After the MPT, when the glacial cycles change from 40 kyr to 80/120 kyr cyclicity, the glacial-interglacial contrast increases, with interglacial CO 2 varying between 250 – 320 ppmv, and peak glacial values decreasing to 170 – 210 ppmv. 3-D used a 3-D ice-sheet-shelf model, forced with output from several different GCM simulations by a so-called “matrix method” of 30 model coupling, to simulate the last four glacial cycles. They showed that their results agreed with geomorphological and proxy-based evidence of ice-sheet volume and extent, benthic d 18 O, deep-water temperature, ice-sheet linear interpolation of these snapshots providing a first order approach to the strength of the feedback and the effects of ice-sheet geometry on atmospheric circulation and precipitation.

In general, the temporal response of an ice-sheet model to a change in prescribed climate will depend on the type of model. A simple linear regression between CO2 and sea-level describes an instantaneous response, which would give a scatterplot as in Fig. 8 showing a single line, without any spread; a one-to-one relation. An SIA ice model (such as the one used by van de Wal et al., 2011) will have a response time ranging between hundreds to thousands of years, depending (among other things) on 5 the way the ice flow factor and basal sliding are described. This means that the relation between CO2 and sea level at a specific point in time is no longer one-to-one, which results in a slight spread around the curve in Fig. 8. Many other physical processes will give rise to additional time lags (e.g. englacial thermodynamics, carbon cycle changes related to oceanic cycling of carbon, glacial-isostatic adjustment, etc.), which will increase the spread in the scatterplots.
The aim of the paragraph the reviewer quotes from is to use this fact to try and interpret some of the observed differences 10 between the different models. We agree that this needs some clarification in the manuscript. Fig. 8. P. 31, l.13. I cannot see on this figure "a relatively narrow range of sea level in the 180-250 ppm CO2 range" in 15 Willeit et al. (2019). The significance of this fact is also unclear to me.

P15, L16-22: Added a few sentences explaining the interpretation of the slope and spread of the scatter plots in
This, too, has to do with time lag. In Fig. 8, both de Boer et al. (2014) and our study show one or two "trails" of dots at the left-hand end of the scatter plot, corresponding to some of the really cold Late Pleistocene glacial cycles (though indeed that can't be seen from the graphs, as they don't show the time stamps of individual data points. We attempted using a color scale 20 for that, but that made the graphs too chaotic to be useful). There, modelled ice-sheet retreat lags the sudden increase in d18O (and therefore CO2) immediately after a glacial maximum, such that the dots in the graph move to the right (increasing CO2) before they start moving up (rising sea level). Compared to de Boer 2014 and our study, the left-hand end of the scatter plot for Willeit 2019 is a bit narrower, indicating that their ice sheets start retreating a bit sooner after CO2 starts rising.
However, we agree that the difference is very small, and mentioning it without investigating the cause of this difference is not 25 useful for the manuscript. We will remove this sentence.

P16, L14: removed this sentence.
p. 32, l. 1. It is true that in our previous studies (but not in Willeit at al., 2019) we have shown that the threshold for 30 glacial inception depends on both insolation and CO2 and we have shown in Ganopolski et al. (2016) that the true CO2 threshold for glaciation is ca. 350 ppm because for higher CO2 glacial inception would be possible only with such low summer insolation which cannot be achieved with the realistic orbital parameters. This funding not only explains fig. 8f but is also consistent with the data (fig. 8a). However, during most of Quaternary, the interglacial CO2 level was in the range 240-280 ppm and this is why it is natural that any appreciable sea level drop can only be observed when CO2 was below these values and this is true for all figures in fig. 8 except for Stap et al. (2017). In short, the fact that most of sea level drop during Quaternary occurred for CO2 below 250 ppm does not imply that ice sheets cannot grow under significantly higher CO2 level. 5 We agree that the theory of glacial inceptions (and terminations!) is more complicated than how it is discussed in our manuscript. Indeed, as both Willeit (2019) and the actual observational data indicate the presence of medium-sized ice sheets under higher CO2 concentrations, this implies that the relation between CO2, insolation and SMB in the models by van de Wal et al. (2011), de Boer et al. (2014 and our study are too simplistic in this regard. We will add a sentence to the manuscript to clarify this. 10 P16L16 -P17L3: clarified this.

15
In this context, "downwelling regions" are those areas of the oceans were downwelling occurs, and deep ocean water is formed.
When Bintanja and van de Wal (2008) created the first inverse-modelling set-up, the idea was that the LR04 benthic d18O stack, while claiming to be made up of a "globally distributed" set of records, is predominantly made up of cores from the Atlantic. The contribution from deep-water temperature to the d18O signal would then be specifically from Atlantic deepwater temperature, which would be related mostly to surface temperatures in the North Atlantic (the downwelling area where 20 this deep water is formed). They therefore related the deep-water temperature d18O component directly to surface temperatures in the Northern hemisphere, which was convenient because this also governed the surface mass balance of the North American and Eurasian ice sheets. Since then, the inverse modelling method has been significantly expanded, particularly by using global climate from a GCM instead of a simple globally uniform temperature offset, such that this parameterisation is now probably too simplistic to be justifiable. We will clarify this in the manuscript. 25 P18L28 -P18L34: clarified this.

Rebuttal to the comments by the editor, Ed Brook
We thank the editor for their comments on the manuscript and would hereby like to address the concerns they raised. Comments in italics, below our rebuttal. Page and line numbers refer to the revised manuscript.
Referee 2 has some concerns about the way uncertainties are described and I also puzzled about this. As I understand it you consider a 0.1 per mil uncertainty in the benthic isotope record only, and that is propagated through the analysis. This is described as "conservative". I think a reader may be confused about whether this means that the uncertainties are likely overestimated or underestimated. Could you clarify this? 10 We have adapted the manuscript to address the concerns the reviewer raised about the reported uncertainties (see above).
Also, I think it would be helpful to describe in more detail why you get some large uncertainties in CO2 in warm climates. 15 We have further clarified the paragraph of the Discussion section dedicated to this issue.

P18, L20-25: clarified this.
One other thing, I am wondering if it would be useful for you to cite the ice core mean ocean temperature reconstructions for the LGM (Bereiter et al. 2018, Nature, v. 553). 20 This is certainly an interesting publication. The number they quote for the glacial-interglacial difference in mean ocean temperature of 2.57 +-0.24 K is reasonably close to the number we found in our earlier paper about our modelling approach (~2.1 K in Berends et al. 2019). However, our current study does not look into ocean temperatures (we are planning to revisit that in the near future, with a substantial update to our matrix method which we're working on). Adding a section on ocean 25 temperatures would pose a substantial revision of the current manuscript, which we feel would not affect our conclusions.

Introduction 25
Understanding the response of ice sheets and the global climate as a whole to changes in the concentrations of atmospheric CO2, is important for understanding the future evolution of the climate system. Since large-scale changes in ice sheet geometry typically occur over thousands to tens of thousands of years, sources of information other than direct observational evidence are required. In order to gain more insight in the relation between these components of the Earth system, studying their evolution during the geological past is useful.

6
One particularly rich source of information is presented by ice cores. The chemical and isotopical content of the ice itself can provide valuable information on the state of the global climate, and in particular on the temperature, at the time the ice was formed (Dansgaard, 1964;Jouzel et al., 1997;Alley, 2000;Kindler et al., 2014). Air bubbles, trapped when the snow compresses first into firn and ultimately into ice, contain tiny samples of the atmosphere from the time the air got trapped. The oldest ice core presently available, the EPICA Dome C core, contains ice, and air bubbles and hydrates, dating back 800 kyr 5 (Bereiter et al., 2015). The information obtained from these ice cores has greatly improved our understanding of the dynamics of the glacial cycles of the Late Pleistocene.
Different methods of relating chemical and isotopic properties of ocean sediments to the atmospheric CO2 concentration have been used to create proxy data extending back further in time than the ice core record. Many studies have measured d 11 B of different fossil foraminifera, using the observed relation to seawater pH to calculate atmospheric CO2 concentrations (Hönisch 10 et al., 2009;Seki et al., 2010;Bartoli et al., 2011;Martínez-Botí et al., 2015;Foster and Rae, 2016;Stap et al., 2016;Chalk et al., 2017;Dyez et al., 2018;Sosdian et al., 2018). Another line of evidence has focused on the d 13 C of alkenones in fossil foraminifera (Seki et al., 2010;Badger et al., 2013;Zhang et al., 2013), although the reliability of this proxy for CO2 concentrations lower than ~400 ppmv has recently been called into question (Badger et al., 2019). A different line of work relates the density of stomata on fossil plant leaves to atmospheric CO2 concentration, providing data throughout the Cenozoic 15 (Kürschner et al., 1996;Wagner et al., 2002;Finsinger and Wagner-Cremer, 2009;Beerling and Royer, 2011;Stults et al., 2011;Bai et al., 2015;Hu et al., 2015). This proxy has been the subject of discussion regarding its reliability (Indermühle et al., 1999;Jordan, 2011;Porter et al., 2019), based on its discrepancies with the ice-core record, and presently unresolved problems due to the influence of other effects such as evolution, extinction, changes in local environment, and immigration of species, although recent work has gone some way to resolving these issues (Franks et al., 2014). 20 Information about the global glacial state is furthermore obtained from the d 18 O of fossil benthic foraminifera, which is influenced by both total ice volume and deep-sea temperature. Ocean sediment cores containing foraminiferal shells have been used to create stacks of benthic d 18 O records dating back 65 Myr (Lisiecki and Raymo, 2005;Zachos et al., 2001Zachos et al., , 2008. Fig.   1 shows the LR04 stack of benthic d 18 O records (Lisiecki and Raymo, 2005), together with the EPICA Dome C CO2 record (Bereiter et al., 2015) and the Earth's orbital parameters and Northern hemisphere summer insolation (Laskar et al., 2004), 25 during the last 800 kyr. Since benthic d 18 O contains both an ice-based and a climate-based signal, using it to understand the evolution of either of these two Earth system components is only possible after disentangling the two contributions. Several studies have aimed at 5 performing this separation by deriving either of the two signals from independent other proxies. One approach has been to derive deep-water temperatures from foraminiferal Mg/Ca ratios (Sosdian et al., 2009;Elderfield et al., 2012;Shakun et al., 2015). Global mean sea level is then solved as a closure term. Alternatively, temperature proxies have been obtained from records of oxygen and hydrogen isotope abundances from ice cores from Greenland (Alley, 2000) and Antarctica (Jouzel et al., 2007). Focusing on the signal from ice volume rather than temperature, Rohling et al. (2014) used planktic d 18 O in the 10 Mediterranean Sea as a proxy for sea-level at the Strait of Gibraltar, translating the result into a global mean sea-level record spanning the last 5.5 Myr. Grant et al. (2014) applied the same method to the Red Sea, producing a record of sea-level at the Bab-el-Mandab Strait with a higher accuracy, but going back only 500 kyr.
These studies are generally viewed as "data" studies; observed variables (chemical concentrations, isotope ratios, etc.) are used to derive climate parameters (ice volume, global mean temperature, pCO2, etc.) using relatively simple, time-independent concepts. Discussions about the validity of the results generally focus less on the physical processes described by these simple models, and more on the properties of the data, such as sample contamination and diagenesis, statistical limitations and measurement uncertainty. 5 A different family are the "model" studies, where physics-based models are used to describe the evolution of (components of) the Earth system (atmosphere, ocean, cryosphere, carbon cycle, etc.) through time. Typically, the aim of a model study is to determine if our understanding of a physical process is good enough to explain the observations, or to use that understanding to make predictions of that process in the future. Several recent studies have aimed to reproduce the evolution of global climate and the cryosphere throughout the Pleistocene, using primarily insolation as forcing (Brovkin et al., 2012;Willeit et al., 2015Willeit et al., , 10 2019. By studying the differences between model results and observations from proxy data, such studies can help with interpreting these data, providing insight into the physical processes that govern the relation between observed and derived variables. In turn, the differences between data and model results can help to assess the importance of physical processes that are not included in the model.
Inverse modelling is a hybrid method, using an approach that combines elements from both these families of research. This 15 method aims to derive the evolution of the entire global climate-cryosphere system through time, based on observations of benthic d 18 O (Bintanja et al., 2005;Bintanja and van de Wal., 2008;de Boer et al., 2010de Boer et al., , 2012de Boer et al., , 2013de Boer et al., , 2014de Boer et al., , 2017van de Wal et al., 2011;Stap et al., 2017Stap et al., , 2017Berends et al., 2018Berends et al., , 2019. This is done by using ice-sheet models, with climate forcing components varying in complexity from simple parametrisations to fully coupled GCMs, to reconstruct ice-sheet evolution. Such models can be used to calculate the contributions to the observed benthic d 18 O signal from both ice volume and deep-20 water temperature over time. By using a tool called an "inverse routine", the model can be forced to (almost) exactly reproduce the benthic d 18 O signal, providing a reconstruction of global climate as it should have evolved in order to explain the observations. The advantage of this approach over other proxy-based reconstructions is that the simulated changes in global climate and ice-sheet evolution are mutually consistent, building on the physical equations within the model framework. Earlier studies adopting this approach used simple, one-dimensional ice-sheet models to represent all ice on Earth or on a single 25 hemisphere, and different parameterisations of the relation between ice volume and climate (e.g. de Boer et al., 2010;Stap et al., 2017). More recent studies have used more elaborate 3-D ice-sheet models covering different regions of the Earth, using more comprehensive mass balance parameterisations and representations of the global climate (e.g. Bintanja and van de Wal, 2008;de Boer et al., 2013de Boer et al., , 2017Berends et al., 2018Berends et al., , 2019. The most recent of these studies, by Berends et al. (2019), used a 3-D ice-sheet-shelf model, forced with output from several different GCM simulations by a so-called "matrix method" of 30 model coupling, to simulate the last four glacial cycles. They showed that their results agreed with geomorphological and proxy-based evidence of ice-sheet volume and extent, benthic d 18 O, deep-water temperature, ice-sheet temperature and atmospheric CO2.
The work presented here builds on the work by Berends et al. (2019), extending their results to 3.6 Myr ago to produce a timecontinuous, self-consistent reconstruction of atmospheric CO2, global climate, ice-sheet geometry and global mean sea-level, over the period where ice core records of CO2 are not available. The inverse modelling approach that was adopted to force this model with a benthic d 18 O record is described in Sect. 2.1. The ice-sheet and climate models are described in Sects. 2.2 and 2.3, respectively, while the matrix method used to couple these two models is described in Sect. 2.4. The resulting 5 reconstructions of CO2, ice volume, temperature and sea-level over the past 3.6 Myr are presented in Sect. 3 and discussed in Sect. 4.

Inverse modelling
In order to disentangle the contributions to the benthic d 18 O signal from global land ice volume and deep-water temperature 10 changes, we use the inverse modelling method proposed by Bintanja and van de Wal (2008) and refined by de Boer et al. (2013,2014,2017) and Berends et al. (2019). The two contributions to the benthic d 18 O signal are not independent; both result from changes in the Earth's climate, driven by changes in insolation and CO2. A cooling of the global climate will, over time, affect the temperature in the deep ocean, especially when the cooling occurs at high latitudes, where deep water formation occurs. Simultaneously, such a global cooling leads to an increase in ice volume, which affects the d 18 O of the sea water itself. 15 The inverse modelling method is a tool that determines how modelled CO2 should have evolved over time in order to affect the global climate in such a way that the observed benthic d 18 O record is reproduced. The modelled value for CO2 is used, together with the ice sheets simulated by the ice-sheet model, to determine the global climate by using a matrix method of model forcing. This method was presented by Berends et al. (2018) and refined by Berends et al. (2019), and is described in more detail in Sect. 2.4. The resulting climate is used to determine the surface mass balance over the ice sheets, which forces the 20 ice-sheet model. The resulting global ice volume and deep-water temperature anomaly (which is derived from the high-latitude annual mean surface temperature anomaly, using a moving average of 3,000 yr) are used to calculate a modelled value of benthic d 18 O. This approach is visualized conceptually in Fig. 2. %'( at the model time t. A modelled value that is too low indicates that the modelled climate is too warm, or the ice sheets are too small. pCO2 is then decreased in the next time-step (with an increment proportional to the d 18 O discrepancy). This relationship is quantified by the following equation: 10

Figure 2: A conceptual visualization of the inverse forward modelling approach. The model is forced externally by an insolation reconstruction and a benthic d 18 O record (black boxes). pCO2 is changed over time based on the difference between observed and modelled d 18 O. The climate matrix interpolates between the pre-calculated GCM snapshots, based on the prescribed pCO2 value and the modelled state of the cryosphere (ice thickness and albedo), to determine the climate that is used to calculate the surface
Here, + ....... is the mean modelled pCO2 over the preceding 8.5 kyr, and is a scaling parameter which controls how fast modelled CO2 is allowed to change (serving as a low-pass filter, suppressing overshoot that could result from large changes in the benthic d 18 O record). In order to calculate d !" $%& , the contributions from modelled ice volume and deep-sea temperature 15 are calculated separately. The spatially variable isotope content of the individual ice-sheets is tracked through time, with the surface isotope balance based on the observed present-day relation between precipitation rates and isotope content according to Zwally and Giovinetto (1997). Benthic d 18 O is assumed to be linearly dependent on the global mean deep-water temperature anomaly, which is calculated by temporally smoothing the high-latitude annual mean surface temperature anomaly over the preceding 3 kyr and multiplying with a scaling factor of 0. 25. Berends et al. (2019) showed that this yields a deep-sea 20 temperature anomaly of 2 -2.5 K at LGM, in general agreement with proxy-based reconstructions (Shakun et al., 2015). The values of 8.5 kyr for the length of the CO2 averaging window, 3 kyr for the deep-water temperature averaging window, and 120 ppmv ‰ -1 for the d 18 O-CO2 scaling parameter, were determined experimentally to accurately reproduce the observed benthic d 18 O record.
The combination of this inverse modelling method to reconstruct pCO2 and the matrix method to determine the global climate has been shown to accurately reproduce changes in benthic d 18 O and the individual contributions from both global ice volume 5 and deep-water temperature, as well as ice-sheet volume and extent, ice-sheet surface temperature and pCO2 during the last four glacial cycles (Berends et al., 2019).

Ice-sheet model
The evolution of the ice sheets is simulated using ANICE, a coupled 3-D ice-sheet-shelf model (de Boer et al., 2013;Berends et al., 2018). ANICE uses a combination of the shallow ice approximation (SIA; Morland and Johnson, 1980) for grounded 10 ice and the shallow shelf approximation (SSA; Morland, 1987) for floating ice to solve the ice mechanical equations. Basal sliding is described by a Coulomb sliding law and solved using the SSA, using the hybrid approach by Bueler and Brown (2009), where basal friction is determined by bedrock elevation. Internal ice temperatures, used to calculate ice viscosity, are calculated using a coupled thermodynamical module. The surface mass balance is parameterized based on monthly mean surface temperature and precipitation, where ablation is calculated using the surface temperature-albedo-insolation 15 parameterization, as explained in more detail by Berends et al. (2018). The solution by Laskar et al. (2004) is used to prescribe time-and latitude-dependent insolation at the top of the atmosphere. A combination of the temperature-based formulation by Martin et al. (2011) and the glacial-interglacial parameterization by Pollard & DeConto (2009) The model is run on four separate grids simultaneously, covering North America, Eurasia, Greenland and Antarctica, as shown in Fig. 3. The horizontal resolution is 40 km for Antarctica, North America and Eurasia, and 20 km for Greenland.

Climate model
HadCM3 is a coupled atmosphere-ocean GCM (Gordon et al., 2000;Valdes et al., 2017), which has been shown to accurately 5 reproduce the present-day climate heat budget (Gordon et al., 2000). It has been used for future climate projections in the IPCC AR4 (Solomon et al., 2007), and paleoclimate reconstructions such as PlioMIP (Haywood and Valdes, 2003;Dolan et al., 2011Dolan et al., , 2015Haywood et al., 2013) and PMIP2 (Braconnot et al., 2007). Atmospheric circulation is calculated at a resolution of 2.5 ° latitude by 3.75 ° longitude. The ocean is modelled at a horizontal resolution of 1.25 ° by 1.25 °, with 20 vertical layers. We use results from several different steady-state time slice simulations with HadCM3 of different climate states to 10 force our ice-sheet model, using the matrix method explained in Sect. 2.4.

Matrix method
According to Pollard (2010), a climate matrix is a collection of pre-calculated output data from several steady-state GCM simulations, called "snapshots". These snapshots differ from each other in one or more key parameters, such as orbital configuration, prescribed atmospheric pCO2, or ice-sheet configuration. Each of these constitutes a separate dimension of the 15 matrix. When using an ice-sheet model to simulate the evolution of an ice sheet over time, the prescribed climate is determined in every model time-step by combining the GCM snapshots according to the position of the ice-sheet model state in the climate matrix, which is determined by the modelled values of the parameters describing the snapshots. This approach occupies the middle ground between offline forcing and fully coupled ice-sheet -climate models. The different GCM snapshots contain the key feedback effects of the altitude and albedo on the temperature. In addition, the matrix method captures the effect of 20 ice sheet geometry on large-scale atmospheric circulation and precipitation. The matrix method creates a spatially variable linear interpolation of these snapshots providing a first order approach to the strength of the feedback and the effects of icesheet geometry on atmospheric circulation and precipitation.
In this study, we use the matrix method developed by Berends et al. (2018), where temperature fields from the different climate states are combined based on a prescribed value for pCO2 and on the internally modelled ice-sheets. The feedback of the ice sheets on the climate is calculated via the effect on absorbed insolation through changes in surface albedo. This interpolation is carried out separately for all four ice sheets. The altitude-temperature feedback is parameterized by a constant lapse-rate derived from the GCM snapshots. Precipitation fields are combined based on changes in surface elevation, reflecting the 5 orographic forcing of precipitation and plateau desert effect caused by the presence of a large ice-sheet. The equations describing these calculations are presented in Appendix A, and explained in more detail by Berends et al. (2018), who demonstrated the viability of this method by simulating the evolution of the North American, Eurasian, Greenland and Antarctic ice-sheets throughout the entire last glacial cycle at the same time. They showed that model results agree well with available data in terms of ice-sheet extent, sea-level contribution, ice-sheet surface temperature and contribution to benthic 10 d 18 O. Here, we apply this matrix method to the climate matrix created by Berends et al. (2019), consisting of eleven precalculated GCM snapshots, created with HadCM3. Two of these, produced by Singarayer and Valdes (2010), respectively represent the pre-industrial period (PI) and the last glacial maximum (LGM). The other nine, produced by Dolan et al. (2015), represent the global climate during Marine Isotope Stage (MIS) M2 (3.3 My ago), for four different possible ice-sheet geometries and two different pCO2 concentrations, plus one Pliocene control run. The total set of eleven snapshots allows the 15 climate matrix to disentangle the effects on climate of changes in pCO2 and changes in ice-sheet extent, and provides information on climates that are both colder and warmer than present-day.

Experimental set-up and results
Here, we describe a set of simulations from 3.6 Myr ago to the present day. The choice for this starting point aims to include the end of the Late Pliocene and the inception of the Pleistocene glacial cycles. The model was initialized with the same 20 PRISM3 ice sheets that were used as boundary conditions in several of the HadCM3 simulations by Dolan et al. (2015). The model was forced with the LR04 stack of benthic d 18 O records (Lisiecki and Raymo, 2005). We chose here to perform three after the MPT. The visibly larger uncertainty for higher CO2 concentrations, occurring during the warm periods of the late Pliocene, is likely due to the relative sparsity of the climate matrix for warmer-than-present states (three snapshots only) compared to the colder-than-present part (seven snapshots), which might lead to a bias towards larger ice sheets in warm worlds.  The CO2 reconstruction for the past 800 kyr is compared to the EPICA Dome C ice core record (Bereiter et al., 2015) in Fig.   6, as well as to several different model reconstructions and proxy-based reconstructions. The three other model-based reconstructions shown were created by decoupling the benthic d 18 O signal using a 1-D ice-sheet model (van de Wal et al., 10 2011;Stap et al., 2017), or by using an insolation-forced, fully coupled ice-sheet -climate -carbon cycle model (Willeit et al., 2019). The geological proxies are based either on alkenones (Seki et al., 2010;Badger et al., 2013;Zhang et al., 2013) or d 11 B ratios (Hönisch et al, 2009;Seki et al., 2010;Bartoli et al., 2011;Martínez-Botí et al., 2015;Stap et al., 2016;Chalk et al., 2017;Dyez et al., 2018;Sosdian et al., 2018) derived from benthic foraminifera, or based on stomata (Kürschner et al., 1996;Stults et al., 2011;Bai et al., 2015;Hu et al., 2015). Our results broadly match those of van de Wal et al. (2011), who reconstructed CO2 using 1-D ice models, forced with a simple spatially uniform temperature offset (assuming the strength of all feedbacks in the CO2-temperature relation to be constant in time) calculated from benthic d 18 O using a similar inverse forward modelling approach. Both our results and those of van de 10 Wal et al. (2011) show values during the late Pliocene that were higher than pre-industrial, but lower than present-day (412 ppmv at the date of writing), though the values by van de Wal et al. (2011)  None of the model-based reconstructions agrees better than the others with the available proxy data. A recent study by Badger et al. (2019) showed that the alkenone proxy has only a very low sensitivity to atmospheric CO2 at low to moderate CO2 concentrations, as is clearly visible in the top panel of Fig. 6, where the alkenone proxy data show a nearly constant CO2 5 concentration throughout the last glacial cycle. They state that the reliability of this proxy for values lower than ~350 ppmv (the entire Pleistocene and large parts of the Late Pliocene) is doubtful. In addition, several studies (Indermühle et al., 1999;Jordan, 2011;Porter et al., 2019) have questioned the reliability of the stomatal proxies. However, even when we disregard both of these proxies, the level of disagreement between the different boron isotope-based proxies is such that using them to choose one model-based reconstruction over another is not possible. The only available data record which is reliable enough 10 to allow such a comparison is the ice-core record (Bereiter et al., 2015). In Table 1, we compare the performance of the different model-based studies at reproducing this CO2 record. All reconstructions have been prescribed an "optimised time lag" to find the highest possible correlation with the ice core record. Our results show the best agreement in terms of the coefficient of determination R 2 = 0.74, the root-mean-square-error RMSE = 13.5 ppmv (both taken over the entire 800 kyr ice-core period), and zero time lag required to obtain these values. To put these numbers into perspective, the same numbers are listed for a 15 simple least-squares linear fit between the LR04 benthic d 18 O stack and the EPICA Dome C record. This statistical "model" yields approximately the same results as our reconstruction in terms of correlation and RMSE, although the resulting post-MPT glacial-interglacial amplitude is about 23 ppmv too small, indicating that the increased glacial-interglacial variability is not properly captured. After the MPT, interglacial CO2 varies between 250 -320 ppmv, and glacial values decrease to 170 -210 ppmv, with neither range showing a clear trend. The reduced glacial-interglacial CO2 difference of 48 ppmv (averaged over the entire period) that we find in our results agrees with the value of 43 ppmv observed in the boron isotope data by Chalk et al. (2017). The values for all four model-based reconstructions, both before and after the MPT, are shown in Table 1. After the MPT, we find a glacial-interglacial difference of 87 ppmv, which agrees very well with the value of 91 ppmv over the past 800 kyr from the EPICA Dome C ice core record (Bereiter et al., 2015). 5 The discussion about the nature of the Late Pleistocene glacial cycles is still ongoing. Different studies have proposed that they represent either a strongly amplified response to a weak 100 kyr forcing (Ganopolski and Calov, 2011), a response to the same 40 kyr forcing as in the Early Pleistocene, with an additional threshold mechanism that causes some insolation maxima to be "skipped", leading to 80/120 kyr cyclicity (Huybers, 2011;Tzedakis et al., 2017), or even as the result of some stochastic process with a collapse threshold, containing no true periodicity (Wunsch, 2003). Since the timing of glaciations and 10 deglaciations in our model is forced by the LR04 benthic d 18 O stack, and thus its depth-age model, no meaningful conclusions regarding this discussion can be drawn from our results.

15
The non-linear, time-dependent relation between CO2 and sea level is visualised in Fig. 8, both for the different proxy data (panel A) and the different model reconstructions (panels B-F) using scatter plots. This visualisation contains several useful features; the general shape of the curve shows the sensitivity of the world's ice sheets to changes in climate, while the spread of the data points around the curve indicates the time lag between the two. An instantaneous response would result in zero spread, while a long lag would yield a wide spread. This time lag is caused by the slow response of ice sheets to climate change, 20 the intrinsic hysteresis of global ice sheet volume under a changing climate, and other slow-responding physical processes such as englacial thermodynamics, glacial-isostatic adjustment, and oceanic carbon cycling.  The combination of ice core CO2 data (Bereiter et al., 2015) and Red Sea sea-level data (Grant et al., 2014), regarded as the two most reliable proxy reconstructions for the colder-than-present worlds of the Late Pleistocene and shown by the purple 10 data points in Panel A, displays a pattern that is very similar to the model results by de Boer et al. (2014), and by this study. ppmv. This indicates that the effects of orbital changes on the global climate, and by extension on the surface mass balance and the timing of glacial inceptions, are underestimated in these models.
The much weaker response of sea level to changes in CO2 for concentrations higher than 280 ppmv, which is displayed by all models except Stap et al. (2017), seems to be confirmed by the sea-level data by Miller et al. (2012) and the boron isotope CO2 5 data. The sea-level data by Rohling et al. (2014) show much more variation, especially during the early Pleistocene. The results by van de Wal et al. (2011) show a very clear threshold around 375 ppmv CO2, beyond which sea level rises rapidly with increasing CO2. As indicated by the colours, they only find these warm greenhouse worlds during the first half of the Miocene , which explains why none of the other models, nor the proxy data, display this behaviour. Although some of the boron isotope proxies suggest CO2 concentrations higher than this 375 ppmv threshold value, the sea level data do not 10 show the associated strong rise.

Conclusions and discussion
We have presented a new, time-continuous, self-consistent reconstruction of atmospheric CO2, ice sheet evolution and global mean sea-level of the last 3.6 Myr, based on the benthic d 18 O glacial proxy. This reconstruction was created by using a hybrid ice-sheet -climate model (Berends et al., 2018) to decouple the contributions from ice volume and deep-water temperature to 15 the benthic d 18 O record (Lisiecki and Raymo, 2005). Our sea-level reconstruction agrees well with similar model-based results (de Boer et al., 2014), the reconstruction based on Red Sea d 18 O by Grant et al. (2014) and the combined results of benthic d 18 O (Naish et al., 2009) and geological backstripping (Miller et al., 2012). Our results agree less well with the reconstruction based on Mediterranean Sea d 18 O by Rohling et al. (2014), which goes back further in time but has a lower signal-to-noise ratio, and with the reconstruction based on benthic d 18 O and Mg/Ca ratios by Elderfield et al. (2012). Although previous studies 20 (van de Wal et al., 2011;de Boer et al., 2014;Stap et al., 2017) used more simple, parameterized ice-sheet and climate models, the resulting sea-level reconstructions are very similar.
Our CO2 reconstruction agrees well with the EPICA Dome C ice core record (Bereiter et al., 2015), showing the strongest correlation of all four model-based reconstructions. Our results do not agree well with different chemical proxy-based reconstructions based on foraminiferal alkenones (Seki et al., 2010;Badger et al., 2013;Zhang et al., 2013), foraminiferal d 11 B 25 ratios (Seki et al., 2010;Martínez-Botí et al., 2015;Stap et al., 2016;Chalk et al., 2017;Dyez et al., 2018;Sosdian et al., 2018) or fossil plant stomata (Kürschner et al., 1996;Stults et al., 2011;Bai et al., 201;Hu et al., 2015). However, the strong spread between those different proxies and the large uncertainty of each individual proxy record do not allow to conclude whether the difference between those proxies, and our model reconstructed values is significant. Deleted: Willeit et al. (2019) report a threshold for glaciation that depends on both insolation and CO2. The models used by van de Wal et al. (2011), de Boer et al. (2014, and this study do not explicitly account for the effect of insolation on temperature, but only include an insolation term in the mass balance calculation (and, in this study, 40 in the interpolation in the climate matrix), which might explain this difference during glacial maxima, with both values decreasing by about 20 ppmv over the 1.4 Myr course of the Early Pleistocene. After the Mid-Pleistocene Transition (MPT), when the glacial cycles change from 40 kyr to 80/120 kyr cyclicity, their sea-level amplitude increases to 70 -120 m, with CO2 varying between 250 -320 ppmv during interglacials and 170 -210 ppmv during glacial maxima. This implies a glacial-interglacial contrast of about 45 ppmv pre MPT and 85 ppmv post MPT. The two most reliable proxy records, the sea-level reconstruction by Grant et al. (2014) and the EPICA Dome C ice core record (Bereiter et 5 al., 2015), display a relation between sea-level and CO2 for colder-than-present worlds that matches our model results. For warmer-than-present worlds, the large spread and uncertainty in available proxy data for both sea-level and CO2, as well as the much larger uncertainty in our model results during these warm periods, prevent us from drawing conclusions on the reliability of our methodology in this regard. We believe that the inverse modelling method of using benthic d 18 O, combined with coupled or hybrid ice-sheet -climate models, as a proxy for past CO2 and sea-level, can add valuable insights into the 10 evolution of the Earth system during these warm episodes.
During the warm late Pliocene, we find a large variability in CO2, with a difference of about 100 ppmv between the cold glacial excursion 3.3 Myr ago, and the warm peak around 3.2 Myr ago. The boron isotope-based data by Martínez-Botí et al. (2015), which has the highest temporal resolution and longest temporal range of all proxy records, shows a variability of about 150 15 ppmv during this period (albeit with an uncertainty of about 100 ppmv in either direction). While the large discrepancies between different proxy records are too large to draw any definitive conclusions, their findings do seem to support strong CO2 variability during these warm periods. Our results for this period also show a much larger uncertainty range than during the Pleistocene. We suspect that the relative sparsity of our current matrix in this warm regime biases the model towards large ice sheets in warm worlds. The high variability in our reconstructed CO2 during the late Pliocene can then be explained as a 20 consequence of this bias, and the requirement posed by the inverse modelling approach that the prescribed d 18 O record is reproduced. If a high (warm, little ice) d 18 O is prescribed, but little or no ice-sheet retreat occurs in the model, then this must be compensated for by an additional increase in deep-water temperature, which requires large changes in surface climate, and therefore in CO2. By adding additional GCM snapshots for worlds with > 400 ppmv CO2 and smaller-than-present ice sheets (particularly a reduced East Antarctic ice sheet), modelled ice-sheet retreat in warm climates might be increased, reducing 25 simulated CO2 variability during these periods. While this means that the modelled sea-level rise during this period will be larger, our current results are at the low end of the uncertainties of other reconstructions, implying that a higher modelled value will not immediately lead to a mismatch.
downwelling. This constitutes a strong simplification of the complex relation between atmospheric and oceanic temperatures, and one that is no longer necessary in our model, where the globally uniform temperature offset used by Bintanja and van de Wal (2008) has been replaced with the GCM snapshots used in our matrix method. Based on output from available oceanenabled GCM simulations of the LGM, a more elaborate parameterisation could be constructed, perhaps using surface temperatures over specific downwelling regions. Instead of using a global benthic d 18 O stack, it might even possible to separate 5 the records from different ocean basins.
Although the margins of error we report for our CO2 and global mean sea-level reconstructions are relatively small, these describe only the uncertainty arising from the uncertainty in the prescribed d 18 O forcing. The true uncertainty in our reconstructions depends on many other factors, including the many parameterisations included in our model (such as the matrix 10 method, and the relation between atmospheric and oceanic temperatures, and benthic d 18 O), and is definitely large than the margins of error reported here. However, the uncertainties arising from these factors an unfortunately not be evaluated at this stage.
Our results should not be interpreted as a realistic reconstruction of what the world looked like in terms of global climate, ice 15 sheet geometry, sea level and CO2, during these periods of geological history. Rather, we believe they should be viewed as scenarios, which can help to interpret an expected new ice core record. For example, one hypothesised mechanism behind the MPT is regolith erosion, which led to a change in basal conditions from mostly sliding ice to mostly non-sliding ice over North America and Eurasia during the MPT (Clark and Pollard, 1998;Willeit et al., 2019. Basal conditions in our model are constant over time, so that the MPT is entirely attributed to changes in CO2. A simulation identical to ours, but including a change in 20 basal conditions, would result in a different pre-MPT CO2 history. Comparing the new ice core record to these two different scenarios could help determine if regolith erosion was a crucial factor in the transition. Other proposed physical mechanisms, such as regolith erosion leading to a different relation between ice sheet geometry and glaciogenic dust, changes in ocean circulation leading to a different dynamic equilibrium between atmospheric and oceanic carbon, or a slow background decrease in CO2 caused by weathering or reduced volcanism, could be explored in a similar manner. We believe our modelling approach 25 is very suited for this kind of exploratory analysis.
Data availability. The reconstructed global mean sea-level and atmospheric CO2 concentration over the past 3.6 Myr are available online at doi.org/10.5281/zenodo.3793592

Appendix A 30
Here, we briefly summarise the equations governing the matrix method. A more thorough explanation of this approach can be found in the original publication by Berends et al. (2018).
Deleted: That contribution is currently derived from the modelled high-latitude annual mean surface temperature anomaly, using a simple 3,000 yr moving average and a constant scaling factor.

35
Each GCM snapshot contains data fields describing global monthly mean 2-m air temperature, global monthly mean precipitation, and surface elevation. The interpolation routine for the temperature field uses modelled CO2 and modelled "absorbed insolation" as weighting parameters to interpolate between snapshots. This approach captures the two most important physical processes through which a change in ice-sheet geometry affects the local and global temperature; the ice-5 albedo feedback, and the altitude-temperature feedback. Since the absorbed insolation changes not only through changes in albedo, but also through changes in incoming insolation, the effects of orbital forcing are also (indirectly) accounted for. The weighting factor wCO2 is calculated as: .

(A3)
This weighting field is partly smoothed with a 200 km Gaussian filter F, to account for both local and regional effects: Orography is interpolated as well, so that it can be used to downscale the temperature field from the low-resolution GCM grid to the higher-resolution ice-model grid, using a simple lapse-rate approach: ( , ) = SLT ( , ) + λ 89: ( , ) Wℎ( , ) − ℎ SLT ( , )X, The spatially variable lapse rate λ 89: ( , ) is derived from the temperature field of the GCM snapshot.
The interpolation for precipitation is based on ice thickness rather than absorbed albedo. This reflects the fact that precipitation over the ice sheet is strongly affected by altitude: Finally, the precipitation field is downscaled from the low-resolution GCM grid to the higher-resolution ice-model grid, using the precipitation model by Roe and Lindzen (2001) and Roe (2002), to correct for the steeper slopes in the ice model: 15 Author contributions. CJB, BdB, and RSWvdW designed the study. CJB created the model set-up and carried out the simulations, with support from RSWvdW. CJB drafted the paper, and all authors contributed to the final version.
Competing interests. The authors declare that they have no conflict of interest.