Summarizing Linear Model Fits for Multivariate Abundance Data
summary method for class "manylm": computes a table summarising the statistical
significance of the different multivariate terms in a linear model fitted to
high-dimensional data, such as multivariate abundance data in ecology.
Usage
# S3 method for manylm
summary(object, nBoot=999, resamp="residual",
test="F", cor.type=object$cor.type, block=NULL, shrink.param=NULL,
p.uni="none", studentize=TRUE, R2="h", show.cor = FALSE,
show.est=FALSE, show.residuals=FALSE, symbolic.cor=FALSE,
rep.seed=FALSE, tol=1.0e-6, ... )
# S3 method for summary.manylm
print(
x, digits = max(getOption("digits") - 3, 3),
signif.stars=getOption("show.signif.stars"),
dig.tst=max(1, min(5, digits - 1)),
eps.Pvalue=.Machine$double.eps, ... )
Arguments
- object
an object of class "manylm", usually the result of a call to manylm.
- nBoot
the number of bootstrap iterations; the default is nBoot=999.
- resamp
the method of resampling used. Can be one of "case" (not yet available), "residual" (default), "perm.resid", "score" or "none". See Details.
- test
the test to be used. Possible values are "LR" = likelihood ratio statistic and "F" = Lawley-Hotelling trace statistic (default, as in the Usage above).
Note that if all variables are assumed independent (cor.type="I") then "LR" is equivalent to LR-IND and "F" is the sum-of-F statistic from Warton & Hudson (2004).
- cor.type
structure imposed on the estimated correlation matrix under the fitted model. Can be "I" (default), "shrink", or "R". See Details.
- block
a factor specifying the sampling level to be resampled. Default is resampling rows.
- shrink.param
shrinkage parameter to be used if cor.type="shrink". If not supplied, but needed, it will be estimated from the data by cross-validation using the normal likelihood as in Warton (2008).
- p.uni
whether to calculate univariate test statistics and their P-values, and if so, what type:
"none" = no univariate P-values (default)
"unadjusted" = a test statistic and (ordinary, unadjusted) P-value is reported for each response variable
"adjusted" = univariate P-values are adjusted for multiple testing, using a step-down resampling procedure
- studentize
logical, whether studentized residuals or raw residuals should be used for simulation in the resampling steps. This option is not used in case resampling.
- R2
the type of R^2 (correlation coefficient) to be shown. Can be one of:
"h" = Hooper's R^2 = tr(SST^(-1)SSR)/p
"v" = vector R^2 = det(SSR)/det(SST)
"n" = none
- show.cor
logical, if TRUE, the correlation matrix of the estimated parameters is returned and printed.
- show.est
logical. Whether to show the estimated model parameters.
- show.residuals
logical. Whether to show residuals/a residual summary.
- symbolic.cor
logical. If TRUE, print the correlations in a symbolic form rather than as numbers.
- rep.seed
logical. Whether to fix the random seed when resampling data. Useful for simulation or diagnostic purposes.
- tol
the tolerance used in estimations.
- x
an object of class "summary.manylm", usually the result of a call to summary.manylm.
- digits
the number of significant digits to use when printing.
- signif.stars
logical. If TRUE, ‘significance stars’ are printed for each coefficient.
- dig.tst
the number of digits to round the estimates of the model parameters.
- eps.Pvalue
a numerical tolerance for the formatting of P-values.
- ...
for the summary.manylm method, these are additional arguments including bootID: a matrix of integers in which each row specifies the bootstrap IDs used in one resampling run. When bootID is supplied, nBoot is set to the number of rows in bootID. Default is NULL. A sketch of constructing such a matrix is given after this list.
for the print.summary.manylm method, these are optional further arguments passed to or from other methods. See print.summary.glm for more details.
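As an illustration of bootID, a matrix of resampling IDs can be built by hand before calling summary. The sketch below assumes the spiddat and fit objects created in the Examples; drawing row indices with replacement is just one possible scheme.
## Each row of bootID holds the observation IDs used in one resampling run;
## here 99 runs are generated by sampling rows of the data with replacement.
set.seed(1)
N <- nrow(spiddat)
bootID <- matrix(sample(N, size = 99 * N, replace = TRUE), nrow = 99)
summary(fit, bootID = bootID)  ## nBoot is then taken as nrow(bootID) = 99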
Details
The summary.manylm function returns a table summarising the statistical
significance of each multivariate term specified in the fitted manylm model.
For each model term, it returns a test statistic as determined by the argument
test, and a P-value calculated by resampling rows of the data using a
method determined by the argument resamp. The four available resampling methods are residual permutation (Anderson and Robinson 2001), score resampling (Wu 1986), and case and residual resampling (Davison and Hinkley 1997, chapter 6), all of which resample under the alternative hypothesis. These methods ensure approximately valid inference even when the correlation between variables has been misspecified and, for case and score resampling, even when the equal-variance assumption of linear models is invalid. By default, studentized residuals (r_i/sqrt(1-h_ii)) are used in residual and score resampling, although raw residuals can be used via the argument studentize=FALSE. If resamp="none", P-values cannot be calculated; however, the test statistics are returned.
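For example (a sketch, using the fitted object fit created in the Examples below), test statistics can be obtained without P-values, and raw residuals can be requested for the resampling:
## Test statistics only, no resampling-based P-values:
summary(fit, resamp = "none")
## Residual resampling with raw rather than studentized residuals:
summary(fit, studentize = FALSE)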
If you have a specific hypothesis of primary interest that you want to test,
then you should use the anova.manylm
function, which can resample rows
of the data under the null hypothesis and so usually achieves a better
approximation to the true significance level.
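For example, to test the effect of fallen.leaves specifically, nested manylm fits could be compared with anova.manylm, which resamples under the null hypothesis. This is a sketch only, assuming the spider objects from the Examples below and that anova() accepts nested manylm fits in the usual way:
## Null model without fallen.leaves vs. the full model:
fit0 <- manylm(spiddat ~ soil.dry + bare.sand + moss + herb.layer + reflection,
               data = spider$x)
fit1 <- manylm(spiddat ~ ., data = spider$x)
anova(fit0, fit1)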
To check model assumptions, use plot.manylm
.
The summary.manylm function is designed specifically for high-dimensional data (that is, when the number of variables p is not small compared to the number of observations N). In such instances a correlation matrix is computationally intensive to estimate and is numerically unstable, so by default the test statistic is calculated assuming independence of variables (cor.type="I"). Note however that the resampling scheme used ensures that the P-values are approximately correct even when the independence assumption is not satisfied. If it is computationally feasible for your dataset, it is nevertheless recommended that you use cor.type="shrink" to account for correlation between variables, or cor.type="R" when p is small. The cor.type="R" option uses the unstructured correlation matrix (only possible when N>p), so that the standard classical multivariate test statistics are obtained. Note however that such statistics are typically numerically unstable and have low power when p is not small compared to N. The cor.type="shrink" option applies ridge regularisation (Warton 2008), shrinking the sample correlation matrix towards the identity, which improves its stability when p is not small compared to N. This provides a compromise between "R" and "I", allowing correlation between variables to be taken into account while using a numerically stable test statistic that has good properties. The shrinkage parameter is by default estimated by cross-validation using the multivariate normal likelihood function, although it can be specified via shrink.param as any value between 0 and 1 (0="I" and 1="R"; values closer to 0 indicate more shrinkage towards "I"). The validation groups are chosen by random assignment, so you may observe some slight variation in the estimated shrinkage parameter in repeat analyses. See ridgeParamEst for more details.
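For instance, the shrinkage parameter can be estimated once and then reused, or supplied directly. This is a minimal sketch assuming ridgeParamEst's dat/X interface and its ridgeParameter return component, and the spiddat/fit objects from the Examples below; the value 0.5 is arbitrary:
## Estimate the shrinkage parameter by cross-validation, then reuse it so that
## repeat analyses give identical test statistics:
X <- model.matrix(~ ., data = spider$x)
lambda <- ridgeParamEst(dat = spiddat, X = X)$ridgeParameter
summary(fit, cor.type = "shrink", shrink.param = lambda)
## or supply a fixed value between 0 ("I") and 1 ("R"):
summary(fit, cor.type = "shrink", shrink.param = 0.5)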
Rather than stopping after testing for multivariate effects, it is often of interest to find out which response variables express significant effects. Univariate statistics are required to answer this question, and these are reported if requested. Setting p.uni="unadjusted" returns resampling-based univariate P-values for all effects as well as the multivariate P-values, whereas p.uni="adjusted" returns adjusted P-values (that have been adjusted for multiple testing), calculated using a step-down resampling algorithm as in Westfall & Young (1993, Algorithm 2.8). This method provides strong control of family-wise error rates and makes use of resampling (using the method controlled by resamp) to ensure inferences take into account correlation between variables.
A multivariate R^2 value is returned in the output, but there are many ways to define a multivariate R^2. The type of R^2 used is controlled by the R2 argument. If cor.type="I" then all variables are assumed independent, a special case in which Hooper's R^2 returns the average of all univariate R^2 values, whereas the vector R^2 returns their product.
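For instance (a sketch, again using the fit object from the Examples below), the reported R^2 can be switched between these definitions:
summary(fit, R2 = "h")  ## Hooper's R^2 (default)
summary(fit, R2 = "v")  ## vector R^2
summary(fit, R2 = "n")  ## suppress the R^2 entirely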
print.summary.manylm tries to be smart about formatting the coefficients, genVar, etc., and additionally gives ‘significance stars’ if signif.stars is TRUE.
Value
summary.manylm returns an object of class "summary.manylm", a list with components
- call
the component from object.
- terms
the terms object used.
- show.residuals
the supplied argument.
- show.est
the supplied argument.
- p.uni
the supplied argument.
- test
the supplied argument.
- cor.type
the supplied argument.
- resample
the supplied argument.
- nBoot
the supplied argument.
- rankX
the rank of the design matrix
- residuals
the model residuals
- genVar
the estimated generalised variance
- est
the estimated model coefficients
- shrink.param
the shrinkage parameter. Either the value of the argument with the same name or, if this was not supplied, the estimated shrinkage parameter.
- aliased
named logical vector showing if the original coefficients are aliased.
- df
a 3-vector giving the rank of the model, the number of residual degrees of freedom, and the number of non-aliased coefficients.
If the argument test is not NULL then the list also includes the components
- coefficients
a matrix containing the test statistics and the p-values.
- n.iter.sing
the number of iterations that were skipped due to singularity of the design matrix caused by case resampling.
If, furthermore, the design matrix is neither empty nor consists of the intercept only, the following additional components are included:
- r.squared
the calculated correlation coefficient.
- R2
a character that describes which type of correlation coefficient was calculated.
- statistic
a matrix containing the results of the overall test.
- cov.unscaled
the unscaled (dispersion = 1) estimated covariance matrix of the estimated coefficients.
If the argument show.cor is TRUE, the following additional components are returned:
- correlation
the (p*q) by (p*q) correlation matrix, with p being the number of columns of the design matrix and q being the number of response variables. Note that this matrix can be very big.
References
Anderson, M.J. and J. Robinson (2001). Permutation tests for linear models. Australian and New Zealand Journal of Statistics 43, 75-88.
Davison, A. C. and Hinkley, D. V. (1997) Bootstrap Methods and their Application. Cambridge University Press, Cambridge.
Warton D.I. (2008). Penalized normal likelihood and ridge regularization of correlation and covariance matrices. Journal of the American Statistical Association 103, 340-349.
Warton D.I. and Hudson H.M. (2004). A MANOVA statistic is just as powerful as distance-based statistics, for multivariate abundances. Ecology 85(3), 858-874.
Westfall, P. H. and Young, S. S. (1993) Resampling-based multiple testing. John Wiley & Sons, New York.
Wu, C. F. J. (1986) Jackknife, Bootstrap and Other Resampling Methods in Regression Analysis. The Annals of Statistics 14:4, 1261-1295.
Examples
data(spider)
spiddat <- log(spider$abund+1)
## Estimate the coefficients of a multivariate linear model:
fit <- manylm(spiddat~., data=spider$x)
## Summarise this multivariate fit, using score resampling and the F test
## statistic to estimate significance:
summary(fit, resamp="score", test="F")
#>
#> Test statistics:
#> F value Pr(>F)
#> (Intercept) 32.27 0.163
#> soil.dry 83.07 0.011 *
#> bare.sand 24.91 0.390
#> fallen.leaves 41.68 0.029 *
#> moss 30.43 0.204
#> herb.layer 20.98 0.174
#> reflection 23.77 0.237
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> Arguments: with 999 resampling iterations using score resampling and response assumed to be uncorrelated
#>
#> Hooper's R-squared: 0.7173
#>
#> Lawley-Hotelling trace statistic: 124.7, p-value: 0.001
#> Arguments: with 999 resampling iterations using score resampling and response assumed to be uncorrelated
#>
## Instead use residual permutation with 2000 iterations and the sum-of-F
## statistic to estimate multivariate significance, also reporting
## univariate statistics with adjusted P-values:
summary(fit, resamp="perm.resid", nBoot=2000, test="F", p.uni="adjusted")
#>
#> Test statistics:
#> F value Pr(>F)
#> (Intercept) 32.27 0.0395 *
#> soil.dry 83.07 0.0015 **
#> bare.sand 24.91 0.1009
#> fallen.leaves 41.68 0.0140 *
#> moss 30.43 0.0430 *
#> herb.layer 20.98 0.1449
#> reflection 23.77 0.1154
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
#> Univariate test statistic:
#> (Intercept) soil.dry bare.sand fallen.leaves
#> F value Pr(>F) F value Pr(>F) F value Pr(>F) F value
#> Alopacce 1.131 0.7851 6.543 0.1059 1.419 0.7631 0.487
#> Alopcune 0.700 0.7851 2.868 0.1999 1.114 0.7906 0.064
#> Alopfabr 5.872 0.1894 11.020 0.0425 7.033 0.1299 0.127
#> Arctlute 1.343 0.7851 7.648 0.0845 2.752 0.5217 3.780
#> Arctperi 5.261 0.2214 6.290 0.1059 1.075 0.7906 0.191
#> Auloalbi 0.515 0.7851 1.918 0.1999 0.079 0.9550 1.448
#> Pardlugu 10.792 0.0330 6.241 0.1059 4.262 0.3293 2.750
#> Pardmont 4.406 0.2654 8.327 0.0835 0.003 0.9550 2.370
#> Pardnigr 0.076 0.9330 8.079 0.0835 0.587 0.8396 11.088
#> Pardpull 0.946 0.7851 9.280 0.0640 0.349 0.8956 8.596
#> Trocterr 1.224 0.7851 6.502 0.1059 1.496 0.7631 2.265
#> Zoraspin 0.003 0.9570 8.357 0.0835 4.738 0.2864 8.510
#> moss herb.layer reflection
#> Pr(>F) F value Pr(>F) F value Pr(>F) F value Pr(>F)
#> Alopacce 0.9225 0.551 0.8701 2.015 0.7026 5.673 0.2079
#> Alopcune 0.9690 1.248 0.8016 0.469 0.9410 2.000 0.6642
#> Alopfabr 0.9690 0.027 0.9110 0.093 0.9750 0.081 0.9930
#> Arctlute 0.4128 0.143 0.9110 0.152 0.9750 0.347 0.9810
#> Arctperi 0.9690 2.192 0.6612 2.085 0.7026 2.179 0.6642
#> Auloalbi 0.7131 1.717 0.7081 5.406 0.2319 0.032 0.9930
#> Pardlugu 0.5622 2.355 0.6612 0.116 0.9750 2.845 0.5697
#> Pardmont 0.5742 9.967 0.0425 1.868 0.7026 0.325 0.9810
#> Pardnigr 0.0305 4.031 0.3818 1.598 0.7026 0.712 0.9165
#> Pardpull 0.0765 0.807 0.8701 1.987 0.7026 0.007 0.9930
#> Trocterr 0.5742 6.618 0.1539 2.918 0.5627 0.845 0.9095
#> Zoraspin 0.0765 0.771 0.8701 2.277 0.6847 8.726 0.0755
#>
#> Arguments: with 2000 resampling iterations using residual (without replacement) resampling and response assumed to be uncorrelated
#>
#> Hooper's R-squared: 0.7173
#>
#> Lawley-Hotelling trace statistic: 124.7, p-value: 5e-04
#>
#> Univariate test statistic:
#> Alopacce Alopcune Alopfabr Arctlute Arctperi Auloalbi Pardlugu Pardmont
#> F value 14.369 3.837 16.461 3.508 9.450 6.168 10.508 9.867
#> Pr(>F) 0.0010 0.0245 0.0010 0.0245 0.0020 0.0040 0.0015 0.0020
#> Pardnigr Pardpull Trocterr Zoraspin
#> F value 10.468 11.457 18.450 10.142
#> Pr(>F) 0.0015 0.0015 0.0010 0.0020
#>
#> Arguments: with 2000 resampling iterations using residual (without replacement) resampling and response assumed to be uncorrelated
#>
## Obtain a summary of test statistics using residual resampling, accounting
## for correlation between variables but shrinking the correlation matrix to
## improve its stability, and showing adjusted univariate P-values:
summary(fit, cor.type="shrink", p.uni="adjusted")
#>
#> Test statistics:
#> F value Pr(>F)
#> (Intercept) 32.21 0.027 *
#> soil.dry 74.97 0.001 ***
#> bare.sand 24.93 0.070 .
#> fallen.leaves 35.70 0.024 *
#> moss 27.69 0.047 *
#> herb.layer 17.79 0.183
#> reflection 23.61 0.073 .
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
#> Univariate test statistic:
#> (Intercept) soil.dry bare.sand fallen.leaves
#> F value Pr(>F) F value Pr(>F) F value Pr(>F) F value
#> Alopacce 1.131 0.766 6.543 0.087 1.419 0.749 0.487
#> Alopcune 0.700 0.766 2.868 0.174 1.114 0.773 0.064
#> Alopfabr 5.872 0.158 11.020 0.030 7.033 0.124 0.127
#> Arctlute 1.343 0.766 7.648 0.071 2.752 0.507 3.780
#> Arctperi 5.261 0.182 6.290 0.087 1.075 0.773 0.191
#> Auloalbi 0.515 0.766 1.918 0.183 0.079 0.946 1.448
#> Pardlugu 10.792 0.034 6.241 0.087 4.262 0.305 2.750
#> Pardmont 4.406 0.246 8.327 0.071 0.003 0.958 2.370
#> Pardnigr 0.076 0.921 8.079 0.071 0.587 0.852 11.088
#> Pardpull 0.946 0.766 9.280 0.053 0.349 0.891 8.596
#> Trocterr 1.224 0.766 6.502 0.087 1.496 0.742 2.265
#> Zoraspin 0.003 0.957 8.357 0.071 4.738 0.271 8.510
#> moss herb.layer reflection
#> Pr(>F) F value Pr(>F) F value Pr(>F) F value Pr(>F)
#> Alopacce 0.924 0.551 0.874 2.015 0.696 5.673 0.198
#> Alopcune 0.965 1.248 0.778 0.469 0.921 2.000 0.673
#> Alopfabr 0.965 0.027 0.917 0.093 0.981 0.081 0.980
#> Arctlute 0.424 0.143 0.917 0.152 0.981 0.347 0.973
#> Arctperi 0.965 2.192 0.661 2.085 0.696 2.179 0.673
#> Auloalbi 0.707 1.717 0.697 5.406 0.238 0.032 0.980
#> Pardlugu 0.568 2.355 0.661 0.116 0.981 2.845 0.571
#> Pardmont 0.578 9.967 0.046 1.868 0.696 0.325 0.973
#> Pardnigr 0.039 4.031 0.380 1.598 0.696 0.712 0.914
#> Pardpull 0.079 0.807 0.874 1.987 0.696 0.007 0.980
#> Trocterr 0.578 6.618 0.149 2.918 0.559 0.845 0.912
#> Zoraspin 0.079 0.771 0.874 2.277 0.672 8.726 0.067
#>
#> Arguments: with 999 resampling iterations using residual resampling and correlation matrix shrunk by parameter 0.41
#>
#> Hooper's R-squared: 0.7173
#>
#> Lawley-Hotelling trace statistic: 113.1, p-value: 0.001
#>
#> Univariate test statistic:
#> Alopacce Alopcune Alopfabr Arctlute Arctperi Auloalbi Pardlugu Pardmont
#> F value 14.369 3.837 16.461 3.508 9.450 6.168 10.508 9.867
#> Pr(>F) 0.001 0.022 0.001 0.022 0.002 0.004 0.002 0.002
#> Pardnigr Pardpull Trocterr Zoraspin
#> F value 10.468 11.457 18.450 10.142
#> Pr(>F) 0.002 0.002 0.001 0.002
#>
#> Arguments: with 999 resampling iterations using residual resampling and correlation matrix shrunk by parameter 0.41
#>