Reply on RC1

Reviewer: 1 R1.C1 What is missing from the paper is some deeper inductive reasoning that could take the work farther and make it more general (and less about two particular tundra locations). Personally, I found Figure 3 the most interesting result in the paper and found myself wondering why the CV (as a function of the area measured) appeared to be asymptotic to 1. It was not that I doubted the data, but I wondered if that was some physical limitation to CV, or just some limitations in the available data. The authors stated in their discussion that: However, the resolution of SWE products like GlobSnow are much larger (25km); future investigation of CVsd values at those scales have the potential to help GlobSnow 3 (Pulliainen et al., 2020). and I agree with this statement, but suggest we hardly need to wait for future investigations. I would suggest the authors could address this issue more thoughtfully in this paper using the knowledge base they already have. Let’s start (Table below) by examining some extreme depth distributions using Excel. For a completely homogeneous snow depth field, the CV approaches zero. For more realistic heterogeneous snow, and certainly most tundra snow fields, the CV rises with area because (I believe) of snow drifts. For example, in a landscape of mostly very thin snow with with a few very deep drifts I was able to produce values >4 (Case 7). This is exactly the type of situation that exists in tundra snow, particularly in the windier tundra areas (e.g. the Arctic Refuge in Alaska and in the Barrenlands of Canada) where wind scour and drifting is most extreme. I suspect CV values over 2 are often realized, for example the tundra landscape shown below (after the thin snow has melted):But the authors need not just deal with this CV issue in a theoretical framework: they should have access to the TVC lidar maps we produced in 2012. They could readily run a Monte Carlo simulation, varying the location and area examined, then plot the resultant mean depths and CVs thereby adding to the figure. Once that was done, they could move to more general application of CV to the full range of tundra snow. By the way, a quick look at Wikipedia indicates that for small samples, CV is low-biased.

area of sampling using uniform randomly generated radius and location of a circle (mask) using the Lidar dataset. We also aggregated the multiple maps from TVC in 2018 (Walker et al., 2020) to perform the same analysis. Both the mean and CV were evaluated and are shown in the updated figure 3. Multiple addition in the text were done. First, the data section was modified to add the dataset TVC13-Lidar. See modified Table 1 and Table 4 in the revised manuscript for addition of TVC13-Lidar A Lidar dataset of TVC snow depths (93 km2 at 10 m resolution) from April 2013 (Rutter et al., 2019) was also used. Monte Carlo simulations of both the μ_sd and CV_sd were performed on each snow depth map. Simulations randomly selected pixels as the center of a circular mask with a random radius. The mask was used to select all pixels within the circle so the statistical parameters (μ_sd and CV_sd) could be calculated.
This text was added in section 3.1 along with the new figure 3. Previous figure was removed.
[…] the larger lidar derived snow map from TVC in 2013 was used. Figure  We initially thought the CV would increase as spatial coverage increased. Instead, the lack of data points hid high variability in CV for small areas found from the Monte Carlo simulation of both dataset (TVC18-RPAS and TVC13-Lidar). As mentioned by the reviewer, the CV values depends on whether there is enough drift capture in the area sampled. The following was added in the discussion to address Figure 4.
As spatial coverage increased, the CV_sd parameter converged to the full area values (Figure 4). Simulations showed high variation in CV_sd (from 0.1 to 2) for areas < 10 km2. Snow accumulation varied at the meso scale (100 m to 10 km) due to topography and vegetation (Pomeroy et al., 2002) by varying wind-flow direction (Liston and Sturm, 1998). At the meso scale, variability in CV_sd was high due to topographic differences; plateau, slope and valley create favorable conditions from wind flow direction to promote snowdrift, scour and sublimation processes (Parr et al., 2020;Rutter et al., 2019). Vegetation facilitates snow holding capacities by decreasing wind speed near the ground within and downwind of shrub (Marsh et al., 2010;Sturm et al., 2001). Some areas include both extreme drifts and thin snow , resulting in high CV_sd (dark green areas in Figure 3b) which are commonly found in TVC (Walker et al., 2020). CV_sd was lower for areas without drifts (light green areas in Figure 3b). In areas > 10 km2 (Figure 4d), variation in CV_sd is reduced and yielded higher values.
Also, the following paragraph in the discussion was completely removed and modified as follow.
Convergence to higher CV_sd as spatial coverage increased matched the PMW optimized values found in this study using GP simulation (0.8 -1.0). Our analysis in Figure d) showed that CV_sd of TVC13-Lidar converged to 0.6 at 93 km2, but two in situ points from other studies at 625 km2 had higher CV_sd (0.9-1). This indicates that a CV_sd between 0.6-1.0 is desirable to represent snow depth variability in SWE retrievals for PMW SWE products at 25 km for the EASE GRID 2.0 and 625 km2 for GlobSnow 3.0 (Pulliainen et al., 2020) . For active sensors (resolution < 1 km), the high variability in CV_sd under 1 km2 due to high variation in snow depth (Figure b) can affect back scattering since active sensor at Ku band are also sensitive to volume scattering (King et al., 2018). The need for prediction of μ_sd and CV_sd based on topography could become essential at these scales not only for microwave remote sensing but also snow modelling or land data assimilation (Kim et al., 2021).
The other aspect of the paper that bears some thought, and is related to the above point, is how wind slab and depth hoar fractions must interact.
Step 1 in approaching this would be to explain in greater detail how those types of snow were identified in the snow pits in this study. I was struck by the relatively close density values reported in the study for depth hoar and wind slab (means 266 vs. 335 kg/m 3 ). The former value is typical for mildly indurated tundra depth hoar, but the latter is quite low for tundra wind slab, which can exhibit values over to 550 kg/m 3 . Wind slabs of 300 kg m 3 are often soft and hardly wind-worked at all, and in addition, many less experienced field practitioners fail to note small and newly faceted grains in wind slab of this nature. Then there is the problem of "indurated depth" hoar (Sturm et al., 2008; Derksen et al., 2009; Domine et al. 2018), snow layers that were wind slab but have metamorphosed into depth hoar. Presumably the critical aspect of differentiating these textures for microwave remote sensing is that the ornate, hollow and plate-like depth hoar grains scatter microwaves far better than the wind slab, hence subdividing the pack into those two fractions is critical. The relatively similar values of SSA (Figure 4a) for slab and hoar suggest to me the authors were dealing with a of properties rather than a truly distinct bimodal snow pack. I went back to the paper the authors referenced related to a two-component snow model they used: Saberi, N., Kelly, R., Toose, P., Roy, A., Derksen, C., 2017. Modeling the observed microwave emission from shallow multilayer Tundra Snow using DMRT-ML. Remote Sens. 9. https://doi.org/10.3390/rs9121327 and was pleased to see that a long-forgotten paper of mine Sturm, Matthew, Thomas C. Grenfell, and Donald K. Perovich. "Passive microwave measurements of tundra and taiga snow covers in Alaska, USA." Annals of Glaciology 17 (1993): 125-130. had been used in developing that model. That work showed that depth hoar volume scattering was more than 6X effective compared to windslab. It should be possible to go beyond the findings of Rutter et. al. (2019) for TVC, where the DHF was shown to stabilize at 30% for depths over 60 cm, but not why. Figure 2 in this paper shows for both study sites long tails on the distributions out to 150 cm, while the mean depth appears to be 1/3 rd of that value. In a recent paper Parr et al. (2020) defined a drift depth threshold as being approximately the mean plus 1s, so that "extra" depth is statistically likely to be transported snow. A different way to look at Figures 5 and 6 is that for the mean snow depth half the pack is depth hoar; where the pack as been scoured (drift snow removed) that fraction is higher; where the snow is drifted, that fraction is lower. Perhaps the fraction where it is lower would be the mean plus 1s... I am not sure. But some attempt to understand the processes behind the statistics (Bayesian or otherwise) could help generalize the results beyond to very specific tundra locations.
For the first part of the comment, we agree that a more detailed explanation of the DH/WS classification is necessary. We added details on the multiple layers found and how they were classified as slab and hoar. The relatively close peak of each distribution can be explained by the classification of indurated hoar as DH. Also, every layer that did not contain enough large crystals were considered WS which is more a general slab (soft to hard) rather than a wind slab with high density (> 400 kg⋅m^(-3)). The following was added in the result section 3.2.

The goal was to classify DH as large grained snow (large facets, depth hoar cups and chains), then all other snow layers above the DH as wind slab (WS)
. Some layers were more difficult to classify as they contained mixed crystals or were a transitional slab-tohoar layer (also referred to as indurated hoar) (Sturm et al., 2008). Slab that contained small faceted crystals (< 2 mm) were classified as WS. Indurated hoar, a wind slab metamorphosed into depth hoar, was classified into DH with a typical density ~ 300 kg⋅m^ (-3). Because of this reason, the peak of each distribution appeared close to each other in Figure 5 c) and d). For retrieval of snow properties using satellite remote sensing, a 2 layer radiative transfer model using WS and DH can be used to simplify much of the layer complexity found in arctic snowpacks (Rutter et al., 2019;Saberi et al., 2017).
The second part refers to the relationship between DHF and snow depth and how we could go beyond the statistical fit by investigating the process behind the statistic by leveraging Parr et al. (2020). An attempt in understanding the process from your comments was added in section 3.2. Table 4, is an important metric in Figure 6 since above this depth, the variability and the mean DHF is greatly reduced as the snowpack is dominated by wind slab for larger depth. As defined in Parr et al. (2020), the transported snow from wind accumulates at these particular locations (drift) where it was scoured or removed from wind affected area yielding lower depth with high DHF.   Table 5). In c) and d), the best fit distribution is shown in black with the kernel density estimate (KDE) of the histogram of each layer. Figure 5b: For much of tundra snow, tussocks rather than shrubs, are a control on the DHF. Also, since shrubs can be layed down under the snow (and frequently are), a relationship between depth hoar and/or wind slab and NDVI seems tenuous at best.

Parr et al. (2020) found a key threshold of μ_sd+ 1σ_sd to define snow drifts in tundra environments. This threshold of > 0.6 -0.8 m, based on data presented in
Agreed that the relation with NDVI can be tenuous but we still think it can help at regional scale as a measured of vegetation (shrub and tussock). Both vegetation can favor growth of depth hoar with a high DHF (table 2, Sturm et al., 2001) where both vegetation have a high DHF. The point we were trying to make with this figure is that DHF potentially follows vegetation and latitudes at the regional scale. It matches nicely with a recently found results from figure 5 in Royer et al. (2021). The following was added in section 3.2.
However, at the regional scale differences are evident between both regions, where mean NDVI and DHF are greater at TVC (NDVI = 0.5,DHF = 0.54) than CB (NDVI = 0.27,DHF = 0.38). This may add to the latitudinal gradient in Royer et al. (2021) where DHF follows a gradient along a northward transect of arctic sites in Québec and Nunavik. Sites at lower latitudes and with shrubs and tussocks, had higher DHF. Figure 6: My ignorance...but does the Bayesian approach really improve the model much over just using the results of Figure 5?
No it probably doesn't improved simulation other than the relation found in figure 5 (old version). A classical approach could be used as well but our approach provides uncertainties for our simulations from the variability in DHF found at both sites. Also, a Bayesian gaussian process could be implemented in current SWE retrieval framework based on Bayesian framework (Takala et al., 2011). The following was added in the discussion about the method.
The amount of scatterers (snow grain and structure) within the radiometer's footprint is adjusted via the DHF predicted from snow depth (CV_sd). The relationship found in Figure   6 used to predict DHF (Figure 7) could also be used deterministically with the mean function (ϕ_1) or a linear relation of DHF decreasing from 50% to 20%. However, the Bayesian gaussian process was used because SWE retrievals are currently implemented in a Bayesian framework (Takala et al., 2011). The following was added to specify our assumption about the effect of the sub-pixel within the sensor. We have no evidence to support our claim that the sub-pixel combined linearly but it is the assumption that we chose.
To represent the signal measured by the sensor, the mean of the simulated T_B was chosen and it was assumed that the sub-pixels effect combined linearly at this scale in the sensor. Because the simulated TB37V distribution was not exactly a normal distribution, it appeared that the mean T_B of this distribution increased when CV_sd increased ( Figure  8b) Line 335: "... while the mean depth (sd) is dependent on precipitation at a larger scale...". This is categorically NOT true for much tundra snow, wear I would contend that wind plays as strong, and sometimes stronger, role than the mean precipitation within a domain.
This statement was removed from the sentence and now reads as follow. Clearly if a domain fails to include, say drifts, the CV will be too low. Likewise, if the domain is limited to a coupled drift and scour zone it will be too high.
See addition from R1.C1 Also, this part of the conclusion was modified as follow.
Monte Carlo simulations were applied to investigate μ_sd and CV_sd as a function of spatial coverage. An increase in CV_sd matched increased spatial coverage of snow depth sampling, indicating that a higher CV_sd (0.6-0.9) is more suited to estimate snow depth variation in the 3.125 km resolution EASE-Grid 2.0. Also, simulations showed high variation in CV_sd (> 0.9) for areas < 10 km2 indicating a need for topography-based prediction of μ_sd and CV_sd at this scale.