ANOVA for Linear Model Fits for Multivariate Abundance Data

anova method for class "manylm" - computes an analysis of variance table for one or more linear model fits to high-dimensional data, such as multivariate abundance data in ecology.

Usage

# S3 method for manylm
anova(object, ..., resamp="perm.resid", test="F", p.uni="none",
        nBoot=999, cor.type=object$cor.type,
        block=NULL, shrink.param=object$shrink.param, 
  studentize=TRUE, calc.rss = FALSE, tol=1.0e-10, rep.seed=FALSE, bootID=NULL )
# S3 method for anova.manylm
print(
    x, digits = max(getOption("digits") - 3, 3),
    signif.stars = getOption("show.signif.stars"),
    dig.tst = max(1, min(5, digits - 1)),
    eps.Pvalue = .Machine$double.eps, ...)

Arguments

object: object of class manylm, usually, a result of a call to manylm.
...: for the anova.manylm method, these are optional further objects of class manylm, which are usually a result of a call to manylm. for the print.anova.manylm method these are optional further arguments passed to or from other methods.
nBoot: the number of iterations in resampling. Default is 999 for P-values as fractions out of 1000.
resamp: the method of resampling used. Can be one of "perm.resid" (default), "residual", "score", "case". See Details.
test: the test to be used. Possible values are: "LR" = likelihood ratio statistic, "F" = Lawley-Hotelling trace statistic or NULL for no test.
cor.type: structure imposed on the estimated correlation matrix under the fitted model. Can be "I"(default), "shrink", or "R". See Details.
block: A factor specifying the sampling level to be resampled. Default is resampling rows.
shrink.param: shrinkage parameter to be used if cor.type="shrink". If not supplied, but needed, it will be estimated by estimated from the data by Cross Validation using the normal likelihood as in Warton (2008).
p.uni: whether to calculate univariate test statistics and their P-values, and if so, what type.
"none" = no univariate P-values (default)
"unadjusted" = A test statistic and (ordinary unadjusted) P-value is reported for each response variable.
"adjusted" = Univariate P-values are adjusted for multiple testing, using a step-down resampling procedure.
studentize: logical. Whether studentized residuals should be used to simulate the data in the resampling steps. This option is not used in case resampling.
calc.rss: logical. Whether the Residual Sum of Squares should be calculated.
tol: the sensitivity in calculations near 0.
rep.seed: logical. Whether to fix random seed in resampling data. Useful for simulation or diagnostic purposes.
bootID: an integer matrix where each row specifies bootstrap id's in each resampling run. When bootID is supplied, nBoot is set to the number of rows in bootID. Default is NULL.
x: an object of class "anova.manylm", usually, a result of a call to anova.manylm.
digits: the number of significant digits to use when printing.
signif.stars: logical. If TRUE, ‘significance stars’ are printed for each coefficient.
dig.tst: the number of digits to round the estimates of the model parameters.
eps.Pvalue: a numerical tolerance for the formatting of p values.

Details

The anova.manylm function returns a table summarising the statistical significance of a fitted manylm model, or of the differences between several nested models fitted to the same dataset. If one model is specified, sequential test statistics (and P values) are returned for that fit. If more than one object is specified, the table contains test statistics (and P values) comparing their fits.

The test statistics are determined by the argument test, and the P-values are calculated by resampling rows of the data using a method determined by the argument resamp. The four possible resampling methods are residual-permutation (Anderson and Robinson (2001)), score resampling (Wu (1986)), case and residual resampling (Davison and Hinkley (1997, chapter 6)), and involve resampling under the null hypothesis (except for case resampling). These methods ensure approximately valid inference even when the correlation between variables has been misspecified, and for case and score resampling, even when the equal variance assumption of linear models is invalid. By default, studentised residuals (r_i/sqrt(1-h_ii)) are used in residual and score resampling, although raw residuals could be used via the argument studentize=FALSE. If resamp="none", p-values cannot be calculated, however the test statistics are returned.

If you do not have a specific hypothesis of primary interest that you want to test, and are instead interested in which model terms are statistically significant, then the summary.manylm function is more appropriate.

More than one object should only be specified when the models are nested. In this case the ANOVA table has a column for the residual degrees of freedom and a column for change in degrees of freedom. It is conventional to list the models from smallest to largest, but this is up to the user.

To check model assumptions, use plot.manylm.

The anova.manylm function is designed specifically for high-dimensional data (that, is when the number of variables p is not small compared to the number of observations N). In such instances a correlation matrix is computationally intensive to estimate and is numerically unstable, so by default the test statistic is calculated assuming independence of variables (cor.type="I"). Note however that the resampling scheme used ensures that the P-values are approximately correct even when the independence assumption is not satisfied. However if it is computationally feasible for your dataset, it is recommended that you use cor.type="shrink" to account for correlation between variables, or cor.type="R" when p is small. The cor.type="R" option uses the unstructured correlation matrix (only possible when N>p), such that the standard classical multivariate test statistics are obtained. Note however that such statistics are typically numerically unstable and have low power when p is not small compared to N. The cor.type="shrink" option applies ridge regularisation (Warton 2008), shrinking the sample correlation matrix towards the identity, which improves its stability when p is not small compared to N. This provides a compromise between "R" and "I", allowing us to account for correlation between variables, while using a numerically stable test statistic that has good properties. The shrinkage parameter by default is estimated by cross-validation using the multivariate normal likelihood function, although it can be specified via shrink.param as any value between 0 and 1 (0="I" and 1="R", values closer towards 0 indicate more shrinkage towards "I"). The validation groups are chosen by random assignment and so you may observe some slight variation in the estimated shrinkage parameter in repeat analyses. See ridgeParamEst for more details.

Rather than stopping after testing for multivariate effects, it is often of interest to find out which response variables express significant effects. Univariate statistics are required to answer this question, and these are reported if requested. Setting p.uni="unadjusted" returns the resampling-based univariate ANOVA P-values as well as the multivariate P-values, whereas p.uni="adjusted" returns adjusted ANOVA P-values (that have been adjusted for multiple testing), calculated using a step-down resampling algorithm as in Westfall & Young (1993, Algorithm 2.8). This method provides strong control of family-wise error rates, and makes use of resampling (using the method controlled by resampling) to ensure inferences take into account correlation between variables.

Value

An object of class "anova.manylm". A list containing at least:

p.uni: the supplied argument.
test: the supplied argument.
cor.type: the supplied argument.
resample: the supplied argument.
nBoot: the supplied argument.
calc.rss: the supplied argument.
table: the data frame containing the anova table.
shrink.param: the supplied argument.
n.bootsdone: the number of bootstrapping iterations that were done, i.e. had no error.
n.iter.sing: the number of bootstrap iterations where the resampled design matrix was singular and could only be used partly.
one: logical. whether the anova table was calculated for one manylm object or for several manylm objects.

If p.uni="adjusted" or p.uni="unadjusted" the output list also contains:

uni.test: a table showing the test statistics of the univariate tests
uni.p: a table showing the p-values of the univariate tests

The print method for anova.manylm objects prints the output in a ‘pretty’ form.

References

Anderson, M.J. and J. Robinson (2001). Permutation tests for linear models. Australian and New Zealand Journal of Statistics 43, 75-88.

Davison, A. C. and Hinkley, D. V. (1997) Bootstrap Methods and their Application. Cambridge University Press, Cambridge.

Warton D.I. (2008). Penalized normal likelihood and ridge regularization of correlation and covariance matrices. Journal of the American Statistical Association 103, 340-349.

Warton D.I. and Hudson H.M. (2004). A MANOVA statistic is just as powerful as distance-based statistics, for multivariate abundances. Ecology 85(3), 858-874.

Westfall, P. H. and Young, S. S. (1993) Resampling-based multiple testing. John Wiley & Sons, New York.

Wu, C. F. J. (1986) Jackknife, Bootstrap and Other Resampling Methods in Regression Analysis. The Annals of Statistics 14:4, 1261-1295.

Author

Yi Wang, Ulrike Naumann and David Warton <David.Warton@unsw.edu.au>.

Examples

## Load the spider dataset:
data(spider)
spiddat <- log(spider$abund+1)
spiddat <- mvabund(spiddat)
spidx <- as.matrix(spider$x)

## Fit several multivariate linear models:
fit <- manylm( spiddat ~ spidx ) # model with all explanatory variables

## Use the default residual resampling to test the significance of this model:
## return summary of the manylm model
anova(fit)
#> Analysis of Variance Table
#> 
#> Model: manylm(formula = spiddat ~ spidx)
#> 
#> Overall test for all response variables
#> Test statistics:
#>             Res.Df Df.diff val(F) Pr(>F)   
#> (Intercept)     27                         
#> spidx           21       6  124.7  0.002 **
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> Arguments:
#>  Test statistics calculated assuming uncorrelated response (for faster computation) 
#>  P-value calculated using 999 iterations via residual (without replacement) resampling.
#> 

## intercept model
fit0 <- manylm(spiddat ~ 1)
## include soil and leaf variables
fit1 <- update(fit0, . ~ . + spidx[, c(1, 3)])
## include moss variables
fit2 <- update(fit1, . ~ . + spidx[, 4]) 

## Use (residual) resampling to test the significance of these models, 
## accounting for correlation between variables by shrinking 
## the correlation matrix to improve its stability:
anova(fit, fit0, fit1, fit2, cor.type="shrink")
#> Analysis of Variance Table
#> 
#> fit0: spiddat ~ 1
#> fit1: spiddat ~ spidx[, c(1, 3)]
#> fit2: spiddat ~ spidx[, c(1, 3)] + spidx[, 4]
#> fit: spiddat ~ spidx
#> 
#> Overall test for all response variables
#> Test statistics:
#>      Res.Df Df.diff val(F) Pr(>F)   
#> fit0     27                         
#> fit1     25       2 233.06  0.002 **
#> fit2     24       1  40.74  0.016 * 
#> fit      21       3  22.61  0.043 * 
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> Arguments:
#>  Test statistics calculated assuming correlated response via ridge regularization with ridge parameter 0.358 
#>  P-value calculated using 999 iterations via residual (without replacement) resampling.
#> 

## Use the sum of F statistics to estimate multivariate significance from 
## 4999 resamples, and also reporting univariate statistics with 
## adjusted P-values:
anova(fit, fit0, fit1, fit2, nBoot=4999, test="F", p.uni="adjust")
#> Analysis of Variance Table
#> 
#> fit0: spiddat ~ 1
#> fit1: spiddat ~ spidx[, c(1, 3)]
#> fit2: spiddat ~ spidx[, c(1, 3)] + spidx[, 4]
#> fit: spiddat ~ spidx
#> 
#> Overall test for all response variables
#> Test statistics:
#>      Res.Df Df.diff val(F) Pr(>F)    
#> fit0     27                          
#> fit1     25       2 253.06 0.0004 ***
#> fit2     24       1  44.33 0.0100 ** 
#> fit      21       3  23.49 0.0462 *  
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> Univariate Tests
#> Test statistics:
#>      Alopacce        Alopcune        Alopfabr        Arctlute        Arctperi
#>       F value Pr(>F)  F value Pr(>F)  F value Pr(>F)  F value Pr(>F)  F value
#> fit0                                                                         
#> fit1   27.921 0.0004    9.204 0.0026    39.04 0.0004    8.781 0.0026   21.853
#> fit2     1.56 0.7470    0.564 0.7470    0.312 0.7470     0.87 0.7470    1.113
#> fit     3.377 0.3046    1.257 0.7792    2.558 0.4678    0.945 0.8600    2.006
#>             Auloalbi        Pardlugu        Pardmont        Pardnigr       
#>      Pr(>F)  F value Pr(>F)  F value Pr(>F)  F value Pr(>F)  F value Pr(>F)
#> fit0                                                                       
#> fit1 0.0006   11.492 0.0014   19.024 0.0006   16.408 0.0006   21.359 0.0006
#> fit2 0.7470    3.416 0.4344    2.738 0.5290   11.386 0.0272     7.98 0.0790
#> fit  0.5930    1.819 0.5930    2.971 0.3864    0.841 0.8600     0.74 0.8600
#>      Pardpull        Trocterr        Zoraspin       
#>       F value Pr(>F)  F value Pr(>F)  F value Pr(>F)
#> fit0                                                
#> fit1   32.306 0.0004   31.906 0.0004   13.764 0.0012
#> fit2    1.343 0.7470    8.789 0.0640    4.262 0.3360
#> fit     0.903 0.8600    2.045 0.5930    4.027 0.2062
#> 
#> Arguments: with 4999 resampling iterations using residual (without replacement) resampling and uncorrelated response (for faster computation) 
#>