Summarizing Multivariate Generalized Linear Model Fits for Abundance Data

summary method for class "manyglm".

Usage

# S3 method for manyglm
summary(object, resamp="pit.trap", test="wald", 
        p.uni="none", nBoot=999, cor.type=object$cor.type, block=NULL,
        show.cor = FALSE, show.est=FALSE, show.residuals=FALSE,
        symbolic.cor = FALSE,
        rep.seed = FALSE, 
        show.time=FALSE, show.warning=FALSE,...) 
# S3 method for summary.manyglm
print(x, ...)

Arguments

object: an object of class manyglm, typically the result of a call to manyglm.
resamp: the method of resampling used. Can be one of "case", "perm.resid", "montecarlo" or "pit.trap" (default). See Details.
test: the test to be used. If cor.type="I", this can be one of "wald" for a Wald test, "score" for a score test or "LR" for a likelihood-ratio test; otherwise only "wald" and "score" are allowed. The default value is "wald".
p.uni: whether to calculate univariate test statistics and their P-values, and if so, what type. This can be one of the following options.
"none" = No univariate P-values (default)
"unadjusted" = A test statistic and (ordinary unadjusted) P-value are reported for each response variable.
"adjusted" = Univariate P-values are adjusted for multiple testing, using a step-down resampling procedure.
nBoot: the number of resampling iterations; default is nBoot=999.
cor.type: structure imposed on the estimated correlation matrix under the fitted model. Can be "I", "shrink", or "R"; defaults to the cor.type of object (manyglm itself defaults to "I"). See Details.
block: A factor specifying the sampling level to be resampled. Default is resampling rows.
show.cor, show.est, show.residuals: logical, if TRUE, the correlation matrix of the estimated parameters, or the estimated model parameters, or the residual summary is shown.
symbolic.cor: logical. If TRUE, the correlation is printed in symbolic form (see symnum) rather than in numerical format.
rep.seed: logical. Whether to fix random seed in resampling data. Useful for simulation or diagnostic purposes.
show.time: Whether to display timing information for the resampling procedure: "none" shows none, "all" shows all timing information and "total" shows only the overall time taken for the tests.
show.warning: logical. Whether to display warnings generated during the procedure.
...: for the summary.manyglm method, these are additional arguments including:
bootID - an integer matrix in which each row specifies the bootstrap IDs for one resampling run. When bootID is supplied, nBoot is set to the number of rows in bootID. Default is NULL.
For the print.summary.manyglm method, these are optional further arguments passed to or from other methods. See print.summary.glm for more details.
x: an object of class "summary.manyglm", usually the result of a call to summary.manyglm.
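The bootID argument can be easier to understand with a concrete matrix in hand. The sketch below builds one in base R under a stated assumption: for case resampling, each entry is taken to be a row index (1 to N) of the data, drawn with replacement. The values N = 28 (the number of sites in the spider data) and nBoot = 199 are illustrative choices, and the summary() call is left as a comment because it needs mvabund and a fitted model.

```r
## Hypothetical sketch: a bootID matrix for case resampling, ASSUMING each
## entry is a row index (1..N) of the data and each row is one resampling run.
set.seed(1)
N     <- 28    # number of rows in the data (28 sites in the spider data)
nBoot <- 199   # summary.manyglm would set nBoot to nrow(bootID)
bootID <- matrix(sample.int(N, N * nBoot, replace = TRUE),
                 nrow = nBoot, ncol = N)

## Then, for example (requires mvabund and a fitted manyglm object):
## summary(glm.spid, resamp = "case", bootID = bootID)
```

Supplying the same bootID to several calls fixes the resamples across them, which is one way to make comparisons between settings reproducible.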

Details

The summary.manyglm function returns a table summarising the statistical significance of each multivariate term specified in the fitted manyglm model (Warton 2011). For each model term, it returns a test statistic, as determined by the argument test, and a P-value calculated by resampling rows of the data using the method determined by the argument resamp. Of the four possible resampling methods, three (case resampling, residual permutation and the parametric bootstrap) are described in more detail in Davison and Hinkley (1997, chapter 6). The default (PIT-trap) is a new method (in review) which bootstraps probability integral transform residuals, and which we have found to give the most reliable Type I error rates. All methods resample under the alternative hypothesis. These methods ensure approximately valid inference even when the mean-variance relationship or the correlation between variables has been misspecified. Standardized Pearson residuals (see manyglm) are currently used in residual permutation, and where necessary, resampled response values are truncated so that they fall in the required range (e.g. counts cannot be negative). However, this can introduce bias, especially for family=binomial, so we advise extreme caution when using perm.resid for presence/absence data. If resamp="none", P-values cannot be calculated, but the test statistics are still returned.
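The PIT-trap idea can be sketched for a single Poisson response using base R's glm. This is a simplified, univariate illustration under stated assumptions (Poisson family, one covariate, simulated data), not mvabund's implementation. Randomized PIT residuals u are drawn uniformly between F(y - 1) and F(y) under the fitted model. The residuals are then bootstrapped and mapped back to the response scale through each observation's own fitted quantile function. In the multivariate case, whole rows of the residual matrix would be resampled together so that correlation between responses is preserved.

```r
## Simplified PIT-trap sketch for ONE Poisson response (assumption: not the
## mvabund code, which works on the whole multivariate response matrix).
set.seed(42)
x <- runif(30)
y <- rpois(30, exp(1 + x))
fit <- glm(y ~ x, family = poisson)   # fitted model (alternative hypothesis)
mu  <- fitted(fit)

## Randomized PIT residuals: u ~ Uniform(F(y - 1), F(y)) under the fit
u <- runif(length(y), ppois(y - 1, mu), ppois(y, mu))

## One resample: bootstrap the residuals, then map each back through the
## fitted quantile function of the observation it is assigned to
idx    <- sample.int(length(y), replace = TRUE)
y_star <- qpois(u[idx], mu)
```

Mapping the unresampled residuals back (qpois(u, mu)) recovers y exactly, which is what makes the transform a valid residual for discrete data.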

If you have a specific hypothesis of primary interest that you want to test, then you should use the anova.manyglm function, which can resample rows of the data under the null hypothesis and so usually achieves a better approximation to the true significance level.

For information on the different types of data that can be modelled using manyglm, see manyglm. To check model assumptions, use plot.manyglm.

Multivariate test statistics are constructed using one of three methods: a log-likelihood ratio statistic (test="LR"), for example as in Warton et al. (2012), a Wald statistic (test="wald") or a score statistic (test="score"). "LR" has good properties, but is only available when cor.type="I".
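Under cor.type="I", responses are treated as independent, so a multivariate likelihood-ratio statistic can be formed as the sum of per-response LR statistics (the "sum-of-LR" statistic discussed by Warton et al. 2012). The sketch below assumes Poisson responses and uses base R's glm in place of manyglm; the simulated data and dimensions are illustrative only.

```r
## Sum-of-LR sketch under independence (assumption: Poisson responses,
## base-R glm instead of manyglm, simulated data).
set.seed(7)
n <- 40; p <- 5
x <- rnorm(n)
Y <- sapply(seq_len(p), function(j) rpois(n, exp(0.5 + 0.3 * j * x)))  # n x p

## Univariate LR statistic for each response: 2 * (logLik full - logLik null),
## which for the Poisson family equals the drop in deviance
lr_j <- apply(Y, 2, function(yj) {
  full <- glm(yj ~ x, family = poisson)
  null <- glm(yj ~ 1, family = poisson)
  deviance(null) - deviance(full)
})

lr_total <- sum(lr_j)   # multivariate statistic assuming independence
```

The P-value for lr_total would still come from resampling, which is what protects inference when the independence assumption behind the statistic is wrong.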

The default Wald test statistic makes use of a generalised estimating equations (GEE) approach, estimating the covariance matrix of parameter estimates using a sandwich-type estimator that assumes the mean-variance relationship in the data is correctly specified and that there is an unknown but constant correlation across all observations. Such assumptions allow the test statistic to account for correlation between variables but to do so in a more efficient way than traditional GEE sandwich estimators (Warton 2008a). The common correlation matrix is estimated from standardized Pearson residuals, and the method specified by cor.type is used to adjust for high dimensionality.

The Wald statistic has problems for count data and presence-absence data when there are estimated means at zero (which usually means very large parameter estimates; check for this using coef). In such cases Wald statistics should not be used; score or LR statistics should be used instead.
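The failure of the Wald statistic can be seen with a single Poisson response in base R's glm (used here instead of manyglm). In this constructed example, one group contains only zeros. Its estimated mean is pushed towards zero, so the coefficient and its standard error both become huge, and the Wald z statistic is close to zero even though the groups clearly differ. The likelihood-ratio statistic, computed as the drop in deviance, is not affected in the same way.

```r
## One Poisson response; group "a" is all zeros (constructed example).
y <- c(0, 0, 0, 0, 0, 3, 5, 2, 4, 6)
g <- factor(rep(c("a", "b"), each = 5))
fit  <- glm(y ~ g, family = poisson)
fit0 <- glm(y ~ 1, family = poisson)

coef(fit)   # the estimate for "gb" is very large: a sign of a zero fitted mean

wald_z <- summary(fit)$coefficients["gb", "z value"]  # Wald statistic (tiny)
lr     <- deviance(fit0) - deviance(fit)              # LR statistic (large)
c(wald_z = wald_z, lr = lr)
```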

The summary.manyglm function is designed specifically for high-dimensional data (that is, when the number of variables p is not small compared to the number of observations N). In such cases a correlation matrix is computationally intensive to estimate and numerically unstable, so by default the test statistic is calculated assuming independence of variables (cor.type="I"). Note, however, that the resampling scheme used ensures that the P-values are approximately correct even when the independence assumption is not satisfied. If it is computationally feasible for your dataset, it is recommended that you use cor.type="shrink" to account for correlation between variables, or cor.type="R" when p is small. The cor.type="R" option uses the unstructured correlation matrix (only possible when N>p), so that the standard classical multivariate test statistics are obtained. Note, however, that such statistics are typically numerically unstable and have low power when p is not small compared to N.

The cor.type="shrink" option applies ridge regularisation (Warton 2008b), shrinking the sample correlation matrix towards the identity, which improves its stability when p is not small compared to N. This provides a compromise between "R" and "I", allowing us to account for correlation between variables while using a numerically stable test statistic that has good properties.
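Shrinking a correlation matrix towards the identity can be written as R(a) = a * R + (1 - a) * I with 0 <= a <= 1, where a plays the role of the shrinkage parameter. The sketch assumes this form of ridge shrinkage and uses random normal data as a stand-in for standardized residuals. With more variables than observations, the sample correlation matrix is singular, while the shrunk matrix has every eigenvalue at least 1 - a and so can be inverted safely. The helper shrink_cor() is hypothetical and not part of mvabund.

```r
## Ridge-type shrinkage towards the identity (assumed form R(a) = aR + (1-a)I).
set.seed(3)
N <- 10; p <- 25                      # more variables than observations
E <- matrix(rnorm(N * p), N, p)       # stand-in for standardized residuals
R <- cor(E)                           # rank at most N - 1, hence singular

shrink_cor <- function(R, a) a * R + (1 - a) * diag(nrow(R))  # hypothetical
R_shrunk <- shrink_cor(R, a = 0.6)

ev_raw    <- eigen(R,        symmetric = TRUE, only.values = TRUE)$values
ev_shrunk <- eigen(R_shrunk, symmetric = TRUE, only.values = TRUE)$values
## min(ev_raw) is numerically zero; min(ev_shrunk) is at least 1 - a = 0.4
```

Setting a = 1 gives back the unstructured matrix (as in cor.type="R") and a = 0 gives the identity (as in cor.type="I"), which is the sense in which "shrink" sits between the two.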

The shrinkage parameter is an attribute of the manyglm object. For a Wald test, the sample correlation matrix of the alternative model is used to calculate the test statistics, so object$shrink.param is used. For a score test, the sample correlation matrix of the null model is used, so the shrink.param of the null model is used instead. If cor.type="shrink" but object$shrink.param is not available (for example, if object$cor.type!="shrink"), then the shrinkage parameter is estimated in the summary test by cross-validation using the multivariate normal likelihood function (see ridgeParamEst and Warton 2008b).
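Choosing the shrinkage parameter by cross-validating a multivariate normal likelihood can be sketched in base R. The helper cv_shrink_param() is hypothetical: it is not ridgeParamEst and may differ from it in detail, for example in the fold scheme or the grid of candidate values. For each candidate a, it estimates a shrunk correlation matrix on the training folds and scores the standardized held-out rows by their normal log-likelihood, up to an additive constant.

```r
## Hypothetical K-fold cross-validation of the shrinkage parameter 'a'
## (a sketch of the idea; NOT the ridgeParamEst implementation).
cv_shrink_param <- function(E, grid = seq(0.05, 0.95, by = 0.05), K = 5) {
  N <- nrow(E); p <- ncol(E)
  folds <- sample(rep_len(seq_len(K), N))          # random fold labels
  score <- sapply(grid, function(a) {
    ll <- 0
    for (k in seq_len(K)) {
      train <- E[folds != k, , drop = FALSE]
      ## standardize held-out rows using training means and SDs
      test <- scale(E[folds == k, , drop = FALSE],
                    center = colMeans(train), scale = apply(train, 2, sd))
      Ra <- a * cor(train) + (1 - a) * diag(p)     # shrunk correlation matrix
      logdet <- as.numeric(determinant(Ra, logarithm = TRUE)$modulus)
      ## held-out multivariate normal log-likelihood, up to a constant
      ll <- ll - 0.5 * (nrow(test) * logdet +
                        sum(mahalanobis(test, rep(0, p), Ra)))
    }
    ll
  })
  grid[which.max(score)]                            # best-scoring candidate
}

## Usage on simulated residuals with p close to N
set.seed(11)
E <- matrix(rnorm(30 * 25), 30, 25)
a_hat <- cv_shrink_param(E)
```

The grid stops short of a = 1 because, with 24 training rows and 25 columns, the unshrunk training correlation matrix would be singular.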

Rather than stopping after testing for multivariate effects, it is often of interest to find out which response variables express significant effects. Univariate statistics are required to answer this question, and these are reported if requested. Setting p.uni="unadjusted" returns resampling-based univariate P-values for all effects as well as the multivariate P-values, whereas p.uni="adjusted" returns adjusted P-values (that have been adjusted for multiple testing), calculated using a step-down resampling algorithm as in Westfall & Young (1993, Algorithm 2.8). This method provides strong control of family-wise error rates, and makes use of resampling (using the method controlled by resamp) to ensure inferences take into account correlation between variables.
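A step-down resampling adjustment can be sketched given the observed univariate statistics and the matrix of the same statistics computed in each resample. The helper stepdown_maxT() is hypothetical and implements a step-down max-T adjustment in the spirit of Westfall & Young (1993). It assumes that larger statistics are more significant; mvabund's own implementation may differ in detail. Responses are ordered from most to least significant. Each response is compared against the maximum resampled statistic over itself and all less significant responses, and monotonicity of the adjusted P-values is then enforced.

```r
## Hypothetical step-down max-T adjusted P-values (larger stat = more extreme).
## stat: length-p observed statistics; stat_boot: nBoot x p resampled statistics
stepdown_maxT <- function(stat, stat_boot) {
  p   <- length(stat)
  nB  <- nrow(stat_boot)
  ord <- order(stat, decreasing = TRUE)     # most significant first
  ## q[, j] = max of resampled statistics over responses ord[j], ..., ord[p]
  q <- matrix(0, nB, p)
  q[, p] <- stat_boot[, ord[p]]
  if (p > 1) for (j in (p - 1):1) q[, j] <- pmax(q[, j + 1], stat_boot[, ord[j]])
  ## share of resamples at least as extreme, counting the observed data once
  padj <- (colSums(sweep(q, 2, stat[ord], ">=")) + 1) / (nB + 1)
  padj <- cummax(padj)                      # adjusted P-values never decrease
  out <- numeric(p)
  out[ord] <- padj                          # back to the original order
  out
}

## Usage with simulated resampled statistics
set.seed(5)
stat      <- c(5, 0.1, 2)
stat_boot <- matrix(abs(rnorm(999 * 3)), 999, 3)
p_adj     <- stepdown_maxT(stat, stat_boot)
```

Because each response is compared against a maximum taken over several responses, the adjusted P-values are never smaller than the unadjusted resampling P-values; correlation between responses is carried through the resampled statistics.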

Value

summary.manyglm returns an object of class "summary.manyglm", a list with components

call: the component from object.
terms: the terms object used.
family: the component from object.
deviance: the component from object.

aic: Akaike's An Information Criterion, minus twice the maximized log-likelihood plus twice the number of coefficients (except for negative binomial and quasipoisson family, assuming that the dispersion is known).
df.residual: the component from object.
null.deviance: the component from object.
df.null: the component from object.
devll: minus twice the maximized log-likelihood
iter: the number of iterations that were used in manyglm for the estimation of the model parameters.
p.uni: the supplied argument.
nBoot: the supplied argument.
resample: the supplied argument.
na.action: the na.action used in the manyglm object, if applicable
show.residuals: the supplied argument.
show.est: the supplied argument.
compositional: logical. Whether a test for compositional effects was performed.

test: the supplied argument.
cor.type: the supplied argument.
method: the method used in manyglm. Either "glm.fit" or "manyglm.fit"
theta.method: the method used for the estimation of the nuisance parameter theta.
manyglm.args: a list of control parameters from manyglm.
rankX: the rank of the design matrix.
covstat: the supplied argument.
deviance.resid: the deviance residuals.
est: the estimated model coefficients
s.err: the Scaled Variance
shrink.param: the shrinkage parameter. Either the value of the argument with the same name or if this was not supplied the estimated shrinkage parameter.
n.bootsdone: the number of resampling iterations that were completed, i.e. that ran without error.
coefficients: the matrix of coefficients, standard errors, z-values and p-values. Aliased coefficients are omitted.
stat.iter: if the argument stat.iter is set to TRUE the test statistics in the resampling iterations.
statj.iter: if the argument stat.iter is set to TRUE the univariate test statistics in the resampling iterations.
aliased: named logical vector showing if the original coefficients are aliased.
dispersion: either the supplied dispersion or, if none was supplied (NULL), the inferred/estimated dispersion.
df: a 3-vector giving the rank of the model, the number of residual degrees of freedom and the number of non-aliased coefficients.
overall.n.bootsdone: the number of bootstrap iterations without errors that were done in the overall test
statistic: a table containing test statistics, p values and degrees of freedom for the overall test
overall.stat.iter: if the argument stat.iter is set to TRUE the test statistics of the overall tests in the resampling iterations.
overall.statj.iter: if the argument stat.iter is set to TRUE the univariate test statistics of the overall tests in the resampling iterations.
cov.unscaled: the unscaled (dispersion = 1) estimated covariance matrix of the estimated coefficients.
cov.scaled: ditto, scaled by dispersion.
correlation: (only if the argument show.cor = TRUE.) The estimated correlations of the estimated coefficients.
symbolic.cor: (only if show.cor = TRUE.) The value of the argument symbolic.cor.

References

Warton D.I. (2011). Regularized sandwich estimators for analysis of high dimensional data using generalized estimating equations. Biometrics, 67(1), 116-123.

Warton D.I. (2008a). Which Wald statistic? Choosing a parameterisation of the Wald statistic to maximise power in k-sample generalised estimating equations. Journal of Statistical Planning and Inference, 138, 3269-3282.

Warton D.I. (2008b). Penalized normal likelihood and ridge regularization of correlation and covariance matrices. Journal of the American Statistical Association, 103, 340-349.

Warton D. I., Wright S., and Wang, Y. (2012). Distance-based multivariate analyses confound location and dispersion effects. Methods in Ecology and Evolution, 3(1), 89-101.

Davison, A. C. and Hinkley, D. V. (1997) Bootstrap Methods and their Application, Cambridge University Press, Cambridge.

Westfall, P. H. and Young, S. S. (1993) Resampling-based multiple testing. John Wiley & Sons, New York.

Wu, C. F. J. (1986) Jackknife, Bootstrap and Other Resampling Methods in Regression Analysis. The Annals of Statistics, 14(4), 1261-1295.

Author

Yi Wang, David Warton <David.Warton@unsw.edu.au> and Ulrike Naumann.

Examples

data(spider)
spiddat <- mvabund(spider$abund)

## Estimate the coefficients of a multivariate glm
glm.spid <- manyglm(spiddat[,1:3]~., data=spider$x, family="negative.binomial")

## Estimate the statistical significance of different multivariate terms in 
## the model, using the default settings of Wald test, and 999 PIT-trap resamples
summary(glm.spid, show.time=TRUE) 
#> 	Resampling run 0 finished. Time elapsed: 0.00 min ...
#> 	Resampling run 100 finished. Time elapsed: 0.00 min ...
#> 	Resampling run 200 finished. Time elapsed: 0.00 min ...
#> 	Resampling run 300 finished. Time elapsed: 0.01 min ...
#> 	Resampling run 400 finished. Time elapsed: 0.01 min ...
#> 	Resampling run 500 finished. Time elapsed: 0.01 min ...
#> 	Resampling run 600 finished. Time elapsed: 0.01 min ...
#> 	Resampling run 700 finished. Time elapsed: 0.02 min ...
#> 	Resampling run 800 finished. Time elapsed: 0.02 min ...
#> 	Resampling run 900 finished. Time elapsed: 0.02 min ...
#> Time elapsed: 0 hr 0 min 1 sec
#> 
#> Test statistics:
#>               wald value Pr(>wald)  
#> (Intercept)        2.378     0.363  
#> soil.dry           3.782     0.088 .
#> bare.sand          3.452     0.070 .
#> fallen.leaves      1.630     0.360  
#> moss               1.684     0.643  
#> herb.layer         3.966     0.048 *
#> reflection         3.641     0.077 .
#> --- 
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 
#> 
#> Test statistic:  11.29, p-value: 0.001 
#> Arguments:
#>  Test statistics calculated assuming response assumed to be uncorrelated 
#>  P-value calculated using 999 resampling iterations via pit.trap resampling (to account for correlation in testing).
#> 

## Repeat with the parametric bootstrap (Monte Carlo resampling), Wald statistics and 300 resamples
summary(glm.spid, resamp="monte.carlo", test="wald", nBoot=300) 
#> 
#> Test statistics:
#>               wald value Pr(>wald)   
#> (Intercept)        2.378   0.10963   
#> soil.dry           3.782   0.00664 **
#> bare.sand          3.452   0.00332 **
#> fallen.leaves      1.630   0.31229   
#> moss               1.684   0.38870   
#> herb.layer         3.966   0.00332 **
#> reflection         3.641   0.00332 **
#> --- 
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 
#> 
#> Test statistic:  11.29, p-value: 0.00332 
#> Arguments:
#>  Test statistics calculated assuming response assumed to be uncorrelated 
#>  P-value calculated using 300 resampling iterations via monte.carlo resampling (to account for correlation in testing).
#>