Fitting Linear Models for Multivariate Abundance Data

manylm is used to fit multivariate linear models to high-dimensional data, such as multivariate abundance data in ecology.

This is the base model-fitting function; see plot.manylm for assumption checking, and anova.manylm or summary.manylm for significance testing.

Usage

manylm(formula, data = NULL, subset = NULL, weights = NULL,
       na.action = options("na.action"), method = "qr", model = FALSE,
       x = TRUE, y = TRUE, qr = TRUE, singular.ok = TRUE, contrasts = NULL,
       offset, test = "LR", cor.type = "I", shrink.param = NULL,
       tol = 1.0e-5, ...)

Arguments

formula: an object of class "formula" (or one that can be coerced to that class): a symbolic description of the model to be fitted. The details of model specification are given under Details.
data: an optional data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model. If not found in data, the variables are taken from environment(formula), typically the environment from which manylm is called.
subset: an optional vector specifying a subset of observations to be used in the fitting process.
weights: an optional vector of weights to be used in the fitting process. Should be NULL or a numeric vector. If non-null, weighted least squares is used with the supplied weights (that is, minimizing sum(weights*e^2)); otherwise ordinary least squares is used.
na.action: a function which indicates what should happen when the data contain NAs. The default is set by the na.action setting of options, and is na.fail if that is unset. The ‘factory-fresh’ default is na.omit. Another possible value is NULL, no action. Value na.exclude can be useful.
method: the method to be used; for fitting, currently only method = "qr" is supported; method = "model.frame" returns the model frame (the same as with model = TRUE, see below).
model, x, y, qr: logicals. If TRUE the corresponding components of the fit (the model frame, the model matrix, the response, the QR decomposition) are returned.
singular.ok: logical. If FALSE (the default in S but not in R) a singular fit is an error.
contrasts: an optional list. See the contrasts.arg of model.matrix.default.
offset: this can be used to specify an a priori known component to be included in the linear predictor during fitting. This should be NULL or a numeric vector of length either one or equal to the number of cases. One or more offset terms can be included in the formula instead or as well, and if both are specified their sum is used. See model.offset.
test: choice of test statistic. Can be one of "LR" (default), the likelihood ratio statistic; "F", the Lawley-Hotelling trace statistic; or NULL, no test. This parameter is only stored by manylm, and is used as the default value of test in subsequent functions for inference.
cor.type: structure imposed on the estimated correlation matrix under the fitted model. Can be "I" (default), "shrink", or "R". See anova.manylm for details of its usage. This parameter is used as the default value of cor.type in subsequent functions for inference.
shrink.param: shrinkage parameter to be used if cor.type="shrink". This parameter will be used as the default value of shrink.param in subsequent functions for inference.
tol: the numerical tolerance used in estimation.
...: additional arguments to be passed to the low level regression fitting functions (see below).
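The weights argument performs weighted least squares, minimizing sum(weights*e^2). The sketch below checks this for a two-column response using base R's lm() as a stand-in for manylm(), an assumption made so the example runs without mvabund; the simulated data and object names are illustrative only. The coefficient matrix from lm() should match the closed-form solution (X'WX)^{-1}X'WY, with one column per response.

```r
set.seed(1)
n <- 20
x <- runif(n)
# Two simulated response variables (illustrative, not real abundances)
y <- cbind(sp1 = 1 + 2 * x + rnorm(n), sp2 = -1 + x + rnorm(n))
w <- runif(n, 0.5, 2)                 # hypothetical positive weights

# lm() used here as a stand-in for manylm(): minimizes sum(w * e^2) per column
fit_w <- lm(y ~ x, weights = w)

# Closed-form weighted least squares: beta = (X' W X)^{-1} X' W Y
X <- cbind(1, x)
W <- diag(w)
beta <- solve(t(X) %*% W %*% X, t(X) %*% W %*% y)

# Same coefficient matrix either way
all.equal(unname(coef(fit_w)), unname(beta))
```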

Details

Models for manylm are specified symbolically. For details, see the Details sections of lm and formula. If the formula includes an offset, it is evaluated and subtracted from the response.
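A minimal base-R sketch of the offset rule, with lm() standing in for manylm() and simulated data (both assumptions, so the example runs without mvabund): fitting with offset(off) in the formula gives the same coefficients as fitting the offset-adjusted response y - off, while the fitted values of the offset model have the offset added back.

```r
set.seed(2)
n <- 30
x <- rnorm(n)
off <- runif(n)                       # hypothetical known offset
y <- cbind(sp1 = off + 0.5 * x + rnorm(n), sp2 = off - x + rnorm(n))

fit_off <- lm(y ~ x + offset(off))    # offset supplied in the formula
ysub <- y - off                       # offset subtracted from every response column
fit_sub <- lm(ysub ~ x)

all.equal(coef(fit_off), coef(fit_sub))             # identical coefficient matrices
all.equal(fitted(fit_off), fitted(fit_sub) + off)   # fitted values include the offset
```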
See model.matrix for some further details. The terms in the formula will be re-ordered so that main effects come first, followed by the interactions, all second-order, all third-order and so on: to avoid this pass a terms object as the formula (see aov and demo(glm.vr) for an example).
A formula has an implied intercept term. To remove this use either y ~ x - 1 or y ~ 0 + x. See formula for more details of allowed formulae.
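Term reordering and intercept removal are handled by R's formula machinery, so they can be inspected with terms() directly. The sketch below does not call manylm, and the variable names and data frame d are hypothetical.

```r
# By default, main effects are moved ahead of interactions
attr(terms(y ~ a:b + c), "term.labels")        # "c"   "a:b"

# A terms object built with keep.order = TRUE preserves the written order;
# such an object can be passed as the formula
tt <- terms(y ~ a:b + c, keep.order = TRUE)
attr(tt, "term.labels")                        # "a:b" "c"

# Both ways of removing the implied intercept drop it from the terms
attr(terms(y ~ x - 1), "intercept")            # 0
attr(terms(y ~ 0 + x), "intercept")            # 0

# ...and give the same fit
set.seed(5)
d <- data.frame(x = rnorm(10))
d$y <- 2 * d$x + rnorm(10)
all.equal(coef(lm(y ~ x - 1, data = d)), coef(lm(y ~ 0 + x, data = d)))   # TRUE
```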
manylm calls the lower-level functions manylm.fit or manylm.wfit for the actual numerical computations. For programming only, you may consider doing likewise.
All of weights, subset and offset are evaluated in the same way as variables in formula, that is first in data and then in the environment of formula.
For details on arguments related to hypothesis testing (such as cor.type and resample) see summary.manylm or anova.manylm.
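For reference, test = "F" names the Lawley-Hotelling trace, tr(E^{-1}H), where E is the residual sums-of-squares-and-cross-products (SSCP) matrix of the full model and H is the extra SSCP explained by the term being tested. The sketch below computes the textbook version with base R and cross-checks it against summary.manova. It uses the full, unstructured residual SSCP matrix, which we assume corresponds most closely to cor.type = "R"; how anova.manylm computes the statistic under cor.type = "I" or "shrink", and how it resamples, is not reproduced here. Data are simulated.

```r
set.seed(4)
n <- 40
x <- rnorm(n)
Y <- cbind(sp1 = 0.5 * x + rnorm(n),
           sp2 = rnorm(n),
           sp3 = -0.3 * x + rnorm(n))

fit1 <- lm(Y ~ x)   # full model
fit0 <- lm(Y ~ 1)   # null model: intercept only

E <- crossprod(residuals(fit1))        # error SSCP matrix
H <- crossprod(residuals(fit0)) - E    # hypothesis SSCP matrix for x
lh <- sum(diag(solve(E, H)))           # Lawley-Hotelling trace, tr(E^{-1} H)

# Cross-check against base R's MANOVA summary (first row is the term x)
lh_manova <- summary(manova(Y ~ x), test = "Hotelling-Lawley")$stats[1, "Hotelling-Lawley"]
all.equal(lh, lh_manova)
```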

Value

manylm returns an object of class c("manylm", "mlm", "lm") for a multivariate formula response, and of class c("lm") for a univariate response.
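Whether a fit is multivariate is decided by the response on the left of the formula. Base R's lm() shows the underlying classes that a manylm object inherits from: a matrix response (here built with cbind()) gives c("mlm", "lm"), with one coefficient column per response, and a vector response gives "lm". manylm() itself is not called here, so the sketch runs without mvabund; the data frame d is hypothetical.

```r
set.seed(6)
d <- data.frame(x = rnorm(12))
d$y1 <- d$x + rnorm(12)
d$y2 <- -d$x + rnorm(12)

fit_multi <- lm(cbind(y1, y2) ~ x, data = d)   # multivariate response
class(fit_multi)                               # "mlm" "lm"
dim(coef(fit_multi))                           # 2 2: one column per response

fit_uni <- lm(y1 ~ x, data = d)                # univariate response
class(fit_uni)                                 # "lm"
```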

A manylm object is a list containing at least the following components:

coefficients: a named matrix of coefficients
residuals: the residuals matrix, that is response minus fitted values.
fitted.values: the matrix of the fitted mean values.
rank: the numeric rank of the fitted linear model.

weights: (only for weighted fits) the specified weights.
df.residual: the residual degrees of freedom.
hat.X: the hat matrix.
txX: the matrix (t(x)%*%x).
test: the test argument supplied.
cor.type: the cor.type argument supplied.
resample: the resample argument supplied.
nBoot: the nBoot argument supplied.
call: the matched call.
terms: the terms object used.

xlevels: (only where relevant) a record of the levels of the factors used in fitting.
model: if requested (model = TRUE; the default is FALSE), the model frame used.
offset: the offset used (missing if none were used).
y: if requested, the response matrix used.
x: if requested, the model matrix used.

In addition, non-null fits will have components assign and (unless not requested) qr relating to the linear fit, for use by extractor functions such as summary.manylm.
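The hat.X and txX components correspond to the standard least-squares quantities H = X(X'X)^{-1}X' and X'X. Assuming the usual unweighted definitions (a weighted fit would also involve the weights), the sketch below builds both from a small simulated model matrix with base R. It checks that H maps the response matrix onto its fitted values, that the trace of H equals the rank, and that lm.fit(), base R's low-level fitting function, gives the same fitted values.

```r
set.seed(3)
n <- 15
X <- cbind(1, runif(n), rnorm(n))       # model matrix with intercept, rank 3
Y <- matrix(rnorm(n * 4), n, 4)         # 4 simulated response variables

txX <- t(X) %*% X                       # analogue of the txX component
H <- X %*% solve(txX) %*% t(X)          # hat matrix X (X'X)^{-1} X'
B <- solve(txX, t(X) %*% Y)             # coefficient matrix, one column per response

fitted_vals <- H %*% Y                  # H projects Y onto the fitted values
all.equal(fitted_vals, X %*% B)

sum(diag(H))                            # trace of H equals the rank (3)

fit <- lm.fit(X, Y)                     # base R low-level fit for comparison
all.equal(unname(fit$fitted.values), fitted_vals)
```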

Author

Yi Wang, Ulrike Naumann and David Warton <David.Warton@unsw.edu.au>.

Examples

library(mvabund)
data(spider)
spiddat <- log(spider$abund + 1)   # log(y + 1) transform of the species abundances
lm.spider <- manylm(spiddat ~ ., data = spider$x)
lm.spider
#> 
#> Call:
#> manylm(formula = spiddat ~ ., data = spider$x)
#> 
#> Coefficients:
#>                Alopacce   Alopcune   Alopfabr   Arctlute   Arctperi   Auloalbi 
#> (Intercept)     1.231267  -1.380086   2.393197  -1.141160   2.027234  -1.103879
#> soil.dry       -0.794355   0.749110  -0.879432   0.730493  -0.594573   0.571479
#> bare.sand      -0.134612  -0.169884   0.255664   0.159457   0.089462   0.042262
#> fallen.leaves   0.099633  -0.051408   0.043434  -0.236105   0.047585  -0.228302
#> moss            0.098816  -0.211830  -0.018714  -0.042750  -0.150472  -0.231740
#> herb.layer      0.214865   0.147603   0.039480   0.050233  -0.166885   0.467675
#> reflection      0.459053   0.388217   0.046706  -0.096538   0.217172  -0.045856
#>                Pardlugu   Pardmont   Pardnigr   Pardpull   Trocterr   Zoraspin 
#> (Intercept)     3.734407  -3.407564  -0.469123  -1.859186   1.441566   0.077075
#> soil.dry       -0.761781   1.256562   1.299666   1.562122   0.891209   1.034457
#> bare.sand      -0.229075   0.009099   0.127535  -0.110245  -0.155567   0.283460
#> fallen.leaves   0.232496  -0.308228  -0.700012  -0.691232  -0.241817  -0.479958
#> moss           -0.200612   0.589305  -0.393534  -0.197432  -0.385430  -0.134735
#> herb.layer      0.050538   0.290093   0.281785   0.352392   0.291045   0.263206
#> reflection     -0.319179   0.153976  -0.239496  -0.026756  -0.199446  -0.656028
#> 

# Then use plot() for diagnostic plots, and anova() or summary() to
# evaluate the significance of different model terms.