Review of revision 1 of cp-2023-69: “Can machine-learning algorithms improve upon classical
palaeoenvironmental reconstruction models?”
Peng Sun, Philip B. Holden, and H. John B. Birks
General:
The revision consists mostly of additions to the original. This paper is in fact four papers in one: (1) a comparison of machine-learning (decision-tree based) methods and weighted averaging methods for paleo reconstruction; (2) the use of multiple ensemble methods to leverage the performance of individual methods; (3) the idea of using embedding by the GloVe method, that is, re-expressing the taxon and co-occurrence data (possibly in a lower number of dimensions) and using the re-expressed data either alone or together with the original data as predictors in the machine-learning methods; (4) showing that good cross-validatory performance in terms of RMSEP does not guarantee a reliable reconstruction. To identify unreliable or less reliable reconstructions the authors recommend the Telford/Birks significance test. This opening rephrases the opening of my review of the first version, which started: “The purpose of the paper is unclear to me; it should perhaps be more focussed.” The authors did not make the paper more focussed.
I have few or no comments on the paper regarding points (1) and (4). My reservation in the original submission on point (2) has been resolved (I now understand the authors’ claim in the rebuttal, see below). I see many issues with point (3), which is perhaps the most novel to paleo-ecology but also the least important.
GloVe is an unconstrained ordination (dimension reduction) method applied to the pairwise taxon co-occurrence table in the training set. The main text says that the GloVe scores (in MEMLMc) are appended to the taxon abundance values (and used directly in MEMLMe). On this second/third reading I missed how scores for training samples are derived. This key information is kind of ‘hidden’ in line 170 which has the issue that it uses the term assemblage data in at least two ways: (1) co-occurrence matrix (2) training samples containing taxon percentages. Clarify.
As an aside and simple analogy: a principal components analysis can be carried out on the covariance matrix, and a non-centred one on the inner-product matrix, which is very close to a co-occurrence matrix when applied to presence/absence data [similar things apply to correspondence analysis, which presumably comes even closer to a co-occurrence matrix]. This is known as R-mode PCA. From R-mode PCA, the usual sample scores can be derived by taking a linear combination (section 5.3.6 of Jongman, ter Braak & van Tongeren 1995). From this analogy it can be conjectured that analysing co-occurrence gives very little (probably nothing) extra compared to analysing the abundance matrix itself. A way to find out is described by van der Voet (1994). It would be nice (but not a prerequisite), in my view, to add such an analysis to the MS.
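The analogy can be written out as a short worked equation. This is only a sketch in my own notation (Y, U, Σ, V are not symbols from the manuscript): Y is the n × m samples-by-taxa matrix, not centred, and its singular value decomposition is assumed to exist in the usual form.

```latex
% Assumed notation: Y is the n x m (samples x taxa) data matrix, not centred.
\begin{align*}
  Y &= U \Sigma V^{\top}
      && \text{(singular value decomposition)} \\
  Y^{\top} Y &= V \Sigma^{2} V^{\top}
      && \text{(R-mode: eigenanalysis of the $m \times m$ taxon inner-product matrix)} \\
  Y V &= U \Sigma
      && \text{(sample scores as linear combinations of taxon values)}
\end{align*}
% For presence/absence data, (Y^T Y)_{jk} is the number of samples in which
% taxa j and k both occur, i.e. a co-occurrence count.
```

The last line shows why the R-mode route and the usual analysis of Y carry the same information: the sample scores follow from Y and the taxon loadings V alone.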
From a theoretical point of view, I do not think that an ordination analysis of co-occurrences can really improve paleo-reconstruction or significantly lower RMSEP. The reason is that decision-tree based methods combine the predictors themselves. Such combinations are interactions in terms of classical statistical models and have co-occurrence as a special case.
It is unclear to me from the text how the co-occurrence matrix has been calculated, as each sample contains taxon percentages. So I do not know whether the co-occurrence value of taxa j and k in a sample is calculated from the taxon percentages or from taxon presence/absence in a sample. In the latter case the maximum number of co-occurrences is the number of samples in the training set.
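To make the two readings concrete, here is a minimal sketch of both. Everything in it is an assumption for illustration: the function name `cooccurrence`, the toy samples, and in particular the abundance-weighted rule (summing the smaller of the two percentages), which is only one of several conceivable choices and is not taken from the manuscript.

```python
def cooccurrence(samples, presence_only=True):
    """Pairwise taxon co-occurrence over training samples.

    samples: list of dicts mapping taxon name -> percentage in that sample.
    presence_only=True  counts the samples in which both taxa are present.
    presence_only=False sums min(percentage_j, percentage_k) per sample
                        (a hypothetical abundance-weighted variant).
    """
    taxa = sorted({t for s in samples for t in s})
    counts = {(j, k): 0.0 for j in taxa for k in taxa if j != k}
    for s in samples:
        present = [t for t in taxa if s.get(t, 0.0) > 0.0]
        for j in present:
            for k in present:
                if j == k:
                    continue
                counts[(j, k)] += 1.0 if presence_only else min(s[j], s[k])
    return counts


# Toy training set of three samples (percentages sum to 100 within a sample).
samples = [
    {"A": 60.0, "B": 40.0},
    {"A": 10.0, "B": 30.0, "C": 60.0},
    {"B": 100.0},
]

pa = cooccurrence(samples, presence_only=True)
ab = cooccurrence(samples, presence_only=False)
print(pa[("A", "B")], ab[("A", "B")])  # 2.0 50.0
```

In the presence/absence reading no entry can exceed the number of training samples, and under either reading the matrix is symmetric by construction.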
Details:
L32 I would like this conclusion to be kept separate from the comparison, which is the main focus of the paper. I suggest adding “also” to the sentence or, in full,
“Apart from the comparison between machine learning and weighted averaging methods for paleo reconstruction we also conclude …”
L24 “embedded assemblage data” is the first occurrence of embedding. I suggest changing the sentence to “the three MEMLM approaches performed… as judged by cross-validatory prediction error in the larger training data sets”.
L29 “could fail badly” Add: “in the reconstruction”?
L33 “cross-validation” Change in line with the line 24 change.
L61-63 The text, as I read it, suggests that “data mining” and “information extraction” are used here in the meaning of “supervised” and “unsupervised” learning. I wonder whether information extraction is not a misnomer (even if usual in the ML world). What about using the new term “representation learning” for unconstrained ordination/factor analysis?
L64 I do not think that the phrase “understand and analyse semantic information” makes sense. Semantics is about meaning, so that an aim can be to ‘extract/obtain semantic information by an analysis’. Please rephrase and avoid the term semantic in an ecological context, as it is unclear what it is supposed to mean (i.e. avoid terms that sound impressive but do not carry meaning for an ecologist).
L84-86 I do not know what “dimensionally reduced (GloVe) assemble data” are, nor what “the more complex versions” are [yes, the ones using GloVe, which has not yet been introduced]. Rephrase.
Figure 1. Is it really impossible to change Raw num to Row num in the fig.?
L113 “develop the assemblage matrix” To me, the assemblage matrix is the same as the abundance matrix, which makes the sentence strange. Rephrase.
L131 It should be said explicitly that the multiple regression is applied to each of the five folds, so as to enable calculation of the cross-validatory prediction error (RMSEP) without further analyses (I missed this/did not think about it in this way in the first version). Also, add that for any down-core application, the model is recomputed using all data. And as the cross-validation has been done five times, the total number of analyses is 5 (folds) x 5 (replications of the cross-validation) + 1 (for the final model used for reconstruction) = 26. Is this the correct interpretation of what has been done?
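My reading of the L131 procedure can be sketched as a count of model fits. This is a sketch under my own assumptions, not the authors' code: the generator `repeated_kfold`, the sample size of 23 and the random seed are all hypothetical, and the model fitting itself is left as a comment.

```python
import random


def repeated_kfold(n_samples, n_folds=5, n_repeats=5, seed=0):
    """Yield (replication, fold, train_indices, test_indices)."""
    rng = random.Random(seed)
    for rep in range(n_repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)  # a fresh random partition for every replication
        folds = [idx[f::n_folds] for f in range(n_folds)]
        for f in range(n_folds):
            test = folds[f]
            train = [i for g, fold in enumerate(folds) if g != f for i in fold]
            yield rep, f, train, test


n_samples = 23  # hypothetical training-set size
cv_fits = 0
tested = {}  # replication -> indices that have served as test samples
for rep, f, train, test in repeated_kfold(n_samples):
    # Here the model would be fitted on `train` and used to predict `test`;
    # the squared prediction errors would be accumulated for RMSEP.
    tested.setdefault(rep, []).extend(test)
    cv_fits += 1

final_fits = 1  # one refit on all samples for the down-core reconstruction
total_fits = cv_fits + final_fits
print(total_fits)  # 26
```

Within each replication every sample is predicted exactly once from a model that did not see it, which is what makes the RMSEP available without further analyses.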
L137 “the stacking approach” First appearance of stacking. Might be unclear. Rephrase or explain.
L139-140 Table A2 could be supplemented with the standard errors (or percentage error, if defined explicitly) of the coefficients based on the five folds (the root mean variance across the five replications).
L145. You give the one-dimensional form of the model here (copying from my first review). Either mention this explicitly, or extend to R_i' C_j. A point that I did not find in Pennington et al. 2014 is that a co-occurrence matrix is a symmetric matrix (although they word it asymmetrically as “X_ij tabulate[s] the number of times a word j occurs in the context of word i”), so that R and C in the formula should be identical, shouldn’t they? So my questions are: is your co-occurrence matrix symmetric? What about the numerical values that you obtained? And are the sample scores then linear combinations of the R or of the C values (if different)?
L145. “least-squared fitted” -> “fitted by weighted least-squares”
L146 “except”-> “except, perhaps,” See my notes under General.
L156 Here is the place to describe how you calculated co-occurrence from percentage abundance data.
L157 Delete “The objective … functioning.” as it carries little information relevant to paleo-reconstruction.
L170 This key sentence should be rephrased (see under General).
L170-171 Move to L147.
L148. Add, for example, “It may be helpful to describe the motivation for this particular row-column model.” [Lines 148-169 describe the motivation for the row-column model; this is how Pennington et al. came up with it. Note that there are older but similar ways to motivate this model; it is particularly attractive for strictly compositional data].
L173-177. I read here: GloVe dimensions emphasise meaning. Really? In my view, there is no contrast with unconstrained ordination. Please delete.
L233-235 The infinitesimal jackknife requires a twice differentiable model (see “Extrinsic UQ Algorithms” in the uq360 0.1 documentation). Are decision-tree models twice differentiable? (I presume not.) A topic for future research is to try and validate this approach. Birks et al. (1990) used a bootstrap approach.
L240 each [five-fold] cross-validation?
L255 Specify which variance. Now I have to reread Telford and Birks to find out. They write “proportion of variance in the fossil data explained by a single reconstruction” “estimated using” “redundancy analysis”. Add this info.
L282. “This sensitivity” Make more precise.
L293 “percentage error” When I googled percentage error I obtained a formula for a single estimate compared to the true value, e.g. “Percent Error: Definition, Formula & Examples” (Statistics By Jim). You have more values. Please define this more precisely or give a reference that states explicitly and clearly what you used.
L294 “spliced abundance and embedding matrices.” Spliced? Embedding matrix is not easy to understand either.
Figure 2 and similar. “b, c & d) statistical significance” I see histograms and within it a p-value. Rephrase.
L402 encode Both?
L404. WA-PLS2 does not occur in the first part of the sentence. So why “though”? Rephrase.
L401 “fail badly” In which sense?
L401-2 shorten to: “Machine learning approaches trained with randomised environmental data yield left-skewed histograms, showing they explain little down-core variance as is natural (desired?) for randomized environmental data (Figs. 2–6)”.
L444 Add the GitHub location or (preferred) give it a Zenodo DOI, so that all is reproducible in principle.
Figure A1 I would say “y vs x” as vertical vs horizontal (ordinate vs abscissa)
Fig and legend are inconsistent in this sense.
Cajo ter Braak, Wageningen, July 1, 2024
Birks, H.J.B., Line, J.M., Juggins, S., Stevenson, A.C. & ter Braak, C.J.F. (1990) Diatoms and pH reconstruction. Philosophical Transactions of the Royal Society London, Series B, 327, 263-278. https://doi.org/10.1098/rstb.1990.0062
Jongman, R.H.G., ter Braak, C.J.F. & van Tongeren, O.F.R. (1995) Data analysis in community and landscape ecology. Cambridge University Press, Cambridge. ISBN 0-521-47574-0
van der Voet, H. (1994) Comparing the predictive accuracy of models using a simple randomization test. Chemometrics and Intelligent Laboratory Systems, 25, 313-323.