Fill et al (2021) studied the effect of duff (leaf litter) on the post-fire ecology of wiregrass (Aristida beyrichiana) in a section of pine savanna. They sampled 99 plants in an area of 0.1 km2, recorded plant basal area and allocated each plant to one of three treatments: high duff, low duff, low duff with added pine cones. They then burnt the area and five months later, counted the number of culms on each plant. We will model numbers of culms per plant against basal area and duff treatment using each plant as the unit of analysis.
This box explores several different ways in which count data can be treated, from the “classical” OLS models where the counts are square-root transformed, to GLMs based on the poisson distribution, to several options for data that are overdispersed.
Aristida beyrichiana. © Copyright Bobby Hattaway 2011/Discover Life
The paper is here and Figure 1 has relevant photos.
Fill, J. M., Zamora, C., Baruzzi, C., Salazar-Castro, J. & Crandall, R. M. (2021). Wiregrass (Aristida beyrichiana) survival and reproduction after fire in a long-unburned pine savanna. PLoS One, 16, e0247159.
First, load the required packages (car, performance, MuMIn, statmod, lmtest, vcd, Rmisc, MASS, pscl)
Import fill data file (fill.csv)
fill <- read.csv("../data/fill.csv")
fill
Check culm distribution with histogram
plot(table(fill$culm))
Boxplot against treatment
boxplot(culm~treatment, data=fill)
Get mean and variance by year and treatment, and look for mean vs variance relationship.
Data possibly over-dispersed for poisson
summarySE(data=fill, measurevar="culm", groupvars=c("treatment"))
Data possibly over-dispersed for poisson, but not bad.
fill1.glm <- glm(culm~basalarea+treatment+basalarea*treatment, family=poisson, data=fill)
summary(fill1.glm)
Call:
glm(formula = culm ~ basalarea + treatment + basalarea * treatment,
family = poisson, data = fill)
Deviance Residuals:
Min 1Q Median 3Q Max
-5.930 -3.502 -2.471 1.056 12.218
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 2.0040614 0.0870699 23.017 < 2e-16 ***
basalarea 0.0000407 0.0003528 0.115 0.90815
treatmentLow Duff 0.2112408 0.1010590 2.090 0.03659 *
treatmentLow Duff + Pinecones -0.4384083 0.1330034 -3.296 0.00098 ***
basalarea:treatmentLow Duff 0.0015309 0.0003625 4.223 2.41e-05 ***
basalarea:treatmentLow Duff + Pinecones 0.0038881 0.0004487 8.665 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 2101.6 on 98 degrees of freedom
Residual deviance: 1599.9 on 93 degrees of freedom
AIC: 1891.3
Number of Fisher Scoring iterations: 6
AICc(fill1.glm)
[1] 1892.183
Check deviance residuals, influential values and collinearity
residualPlots(fill1.glm, type="deviance")
Test stat Pr(>|Test stat|)
basalarea 247.27 < 2.2e-16 ***
treatment
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
residuals(fill1.glm, type="deviance")
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
0.20923286 3.52437571 -1.84696896 0.55169766 9.08063242 2.43049570 -1.38139530 -3.85269153 -2.97516095 -2.99178806 -2.98053360 -3.03476511 -3.85618268 -3.03090905 4.05076046 -2.97604160
17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32
-2.99110558 4.30805157 -2.36631486 -0.94501752 3.54431734 0.89789475 -2.97441165 -1.85136763 -1.85196037 -0.94785624 0.17193562 -3.01860601 5.48494095 1.16145876 -4.34220066 -5.25716243
33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48
-4.32602610 -3.49888536 -3.49517373 -4.29319540 -2.41941660 -3.47184250 -4.38810960 -4.28248340 -4.77859577 -4.28182304 6.97214474 -4.90583052 10.66755976 4.45929577 4.44436791 12.21787559
49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64
8.94763007 4.86958756 -5.92962494 -3.40832404 -1.49205900 -4.36327765 -1.12152934 -3.46773903 -4.46042903 1.56370632 -2.43527977 -4.42660331 -3.87713567 -3.71109682 2.08390937 0.95140425
65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80
-1.98548186 -3.51762919 4.11500950 -1.98220634 -4.32655983 -3.71063761 -3.53833655 -5.30155193 -0.63014935 -2.39743624 8.58300081 -3.39841806 -3.78053165 -0.09213929 1.54731185 -3.14245871
81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96
4.99032310 -3.21638933 2.20660594 -1.49845051 7.07694236 -2.71701487 -3.15255411 -4.74932844 2.02824210 -2.47070604 -2.55251080 2.63055203 -3.32358182 -2.13220538 -2.56631805 -3.12496585
97 98 99
-1.71855808 -3.20060647 -3.50413976
influence.measures(fill1.glm)
Influence measures of
glm(formula = culm ~ basalarea + treatment + basalarea * treatment, family = poisson, data = fill) :
influencePlot(fill1.glm)
vif(lm(culm ~ basalarea+treatment+basalarea*treatment, data=fill))
there are higher-order terms (interactions) in this model
consider setting type = 'predictor'; see ?vif
GVIF Df GVIF^(1/(2*Df))
basalarea 4.832804 1 2.198364
treatment 2.439885 2 1.249805
basalarea:treatment 8.588810 2 1.711919
Check collinearity - OK
fill2.glm <- glm(culm~basalarea+treatment, family=poisson, data=fill)
lrtest(fill1.glm, fill2.glm)
Likelihood ratio test
Model 1: culm ~ basalarea + treatment + basalarea * treatment
Model 2: culm ~ basalarea + treatment
#Df LogLik Df Chisq Pr(>Chisq)
1 6 -939.64
2 4 -984.50 -2 89.733 < 2.2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Get analysis of deviance table
anova(fill1.glm, test="LRT")
Analysis of Deviance Table
Model: poisson, link: log
Response: culm
Terms added sequentially (first to last)
Df Deviance Resid. Df Resid. Dev Pr(>Chi)
NULL 98 2101.6
basalarea 1 372.99 97 1728.6 < 2.2e-16 ***
treatment 2 39.02 95 1689.6 3.371e-09 ***
basalarea:treatment 2 89.73 93 1599.8 < 2.2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
presid1 <- resid(fill1.glm, type="pearson")
ssize1 <- nrow(fill)
params1 <- length(coef(fill1.glm))
disp1 <- sum(presid1^2)/(ssize1-params1)
disp1
[1] 21.12057
Note residual deviance >> residual df and dispersion stat >> 1 so overdispersed. Might be better with negative binomial rather than poisson.
fill1.nb <- glm.nb(culm~basalarea+treatment+basalarea*treatment, data=fill)
summary(fill1.nb)
Call:
glm.nb(formula = culm ~ basalarea + treatment + basalarea * treatment,
data = fill, init.theta = 0.4727266654, link = log)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.8019 -1.1184 -0.7019 0.1692 2.0097
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.999e+00 3.564e-01 5.609 2.04e-08 ***
basalarea 7.592e-05 1.455e-03 0.052 0.9584
treatmentLow Duff -1.606e-01 4.408e-01 -0.364 0.7155
treatmentLow Duff + Pinecones -7.506e-01 5.325e-01 -1.410 0.1587
basalarea:treatmentLow Duff 3.038e-03 1.658e-03 1.833 0.0668 .
basalarea:treatmentLow Duff + Pinecones 5.557e-03 2.347e-03 2.368 0.0179 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for Negative Binomial(0.4727) family taken to be 1)
Null deviance: 132.36 on 98 degrees of freedom
Residual deviance: 110.35 on 93 degrees of freedom
AIC: 627.85
Number of Fisher Scoring iterations: 1
Theta: 0.4727
Std. Err.: 0.0722
2 x log-likelihood: -613.8500
AICc(fill1.nb)
[1] 629.0806
fill2.nb <- glm.nb(culm~basalarea+treatment, data=fill)
lrtest(fill1.nb, fill2.nb)
Likelihood ratio test
Model 1: culm ~ basalarea + treatment + basalarea * treatment
Model 2: culm ~ basalarea + treatment
#Df LogLik Df Chisq Pr(>Chisq)
1 7 -306.93
2 5 -308.54 -2 3.2376 0.1981
summary(fill2.nb)
Call:
glm.nb(formula = culm ~ basalarea + treatment, data = fill, init.theta = 0.456240813,
link = log)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.7782 -1.3744 -0.6591 0.2705 1.9715
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.6237482 0.3057921 5.310 1.10e-07 ***
basalarea 0.0030786 0.0006616 4.653 3.26e-06 ***
treatmentLow Duff 0.2203827 0.3654880 0.603 0.547
treatmentLow Duff + Pinecones 0.0282148 0.4203121 0.067 0.946
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for Negative Binomial(0.4562) family taken to be 1)
Null deviance: 128.58 on 98 degrees of freedom
Residual deviance: 110.49 on 95 degrees of freedom
AIC: 627.09
Number of Fisher Scoring iterations: 1
Theta: 0.4562
Std. Err.: 0.0691
2 x log-likelihood: -617.0870
AICc(fill2.nb)
[1] 627.7326
fill3.nb <- glm.nb(culm~basalarea, data=fill)
summary(fill3.nb)
Call:
glm.nb(formula = culm ~ basalarea, data = fill, init.theta = 0.4538768096,
link = log)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.7504 -1.3788 -0.7063 0.2963 2.1243
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.7282676 0.1855371 9.315 <2e-16 ***
basalarea 0.0031460 0.0006618 4.754 2e-06 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for Negative Binomial(0.4539) family taken to be 1)
Null deviance: 128.03 on 98 degrees of freedom
Residual deviance: 110.49 on 97 degrees of freedom
AIC: 623.54
Number of Fisher Scoring iterations: 1
Theta: 0.4539
Std. Err.: 0.0686
2 x log-likelihood: -617.5430
AICc(fill3.nb)
[1] 623.7958
lrtest(fill2.nb, fill3.nb)
Likelihood ratio test
Model 1: culm ~ basalarea + treatment
Model 2: culm ~ basalarea
#Df LogLik Df Chisq Pr(>Chisq)
1 5 -308.54
2 3 -308.77 -2 0.4557 0.7962
fill4.nb <- glm.nb(culm~treatment, data=fill)
lrtest(fill2.nb, fill4.nb)
Likelihood ratio test
Model 1: culm ~ basalarea + treatment
Model 2: culm ~ treatment
#Df LogLik Df Chisq Pr(>Chisq)
1 5 -308.54
2 4 -315.69 -1 14.296 0.0001562 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
anova(fill1.nb, test="LRT")
Warning: tests made without re-estimating 'theta'
Analysis of Deviance Table
Model: Negative Binomial(0.4727), link: log
Response: culm
Terms added sequentially (first to last)
Df Deviance Resid. Df Resid. Dev Pr(>Chi)
NULL 98 132.36
basalarea 1 18.2374 97 114.12 1.95e-05 ***
treatment 2 0.4734 95 113.64 0.7892
basalarea:treatment 2 3.2929 93 110.35 0.1927
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Check diagnostics
residualPlots(fill1.nb, type="deviance")
Test stat Pr(>|Test stat|)
basalarea 8.8463 0.002937 **
treatment
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
influence.measures(fill1.nb)
Influence measures of
glm.nb(formula = culm ~ basalarea + treatment + basalarea * treatment, data = fill, init.theta = 0.4727266654, link = log) :
influencePlot(fill1.nb)
Dispersion statistic (note add 1 to parameters because of estimating k)
nbresid <- resid(fill1.nb, type="pearson")
nbssize <- nrow(fill)
nbparams <- length(coef(fill1.nb))+1
nbdisp <- sum(nbresid^2)/(nbssize-nbparams)
nbdisp
[1] 1.0892
Dispersion statistic just over 1 so much better.
fill.qp <- glm(culm~basalarea+treatment+basalarea*treatment, family=quasipoisson, data=fill)
fill.qp$theta
NULL
summary(fill.qp)
Call:
glm(formula = culm ~ basalarea + treatment + basalarea * treatment,
family = quasipoisson, data = fill)
Deviance Residuals:
Min 1Q Median 3Q Max
-5.930 -3.502 -2.471 1.056 12.218
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.0040614 0.4001547 5.008 2.6e-06 ***
basalarea 0.0000407 0.0016212 0.025 0.9800
treatmentLow Duff 0.2112408 0.4644455 0.455 0.6503
treatmentLow Duff + Pinecones -0.4384083 0.6112549 -0.717 0.4750
basalarea:treatmentLow Duff 0.0015309 0.0016659 0.919 0.3605
basalarea:treatmentLow Duff + Pinecones 0.0038881 0.0020622 1.885 0.0625 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for quasipoisson family taken to be 21.12124)
Null deviance: 2101.6 on 98 degrees of freedom
Residual deviance: 1599.9 on 93 degrees of freedom
AIC: NA
Number of Fisher Scoring iterations: 6
anova(fill.qp, test="LRT")
Analysis of Deviance Table
Model: quasipoisson, link: log
Response: culm
Terms added sequentially (first to last)
Df Deviance Resid. Df Resid. Dev Pr(>Chi)
NULL 98 2101.6
basalarea 1 372.99 97 1728.6 2.642e-05 ***
treatment 2 39.02 95 1689.6 0.3971
basalarea:treatment 2 89.73 93 1599.8 0.1195
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
residualPlots(fill.qp, type="deviance")
Test stat Pr(>|Test stat|)
basalarea 247.27 < 2.2e-16 ***
treatment
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
influence.measures(fill.qp)
Influence measures of
glm(formula = culm ~ basalarea + treatment + basalarea * treatment, family = quasipoisson, data = fill) :
influencePlot(fill.qp)
Check predicted number of zeros from -ve binomial first observed number (ignoring treatments)
zerobs <- fill$culm == 0
propzerobs <- sum(zerobs)/nrow(fill)
propzerobs
[1] 0.2222222
Second, predicted number from -ve binomial
munb <- exp(predict(fill2.nb))
theta <- fill2.nb$theta
znb <- dnbinom(0, mu=munb, size=theta)
mean(znb)
[1] 0.2588404
fill1.zip <- zeroinfl(culm~basalarea+treatment|1, data=fill, dist="negbin")
summary(fill1.zip)
Call:
zeroinfl(formula = culm ~ basalarea + treatment | 1, data = fill, dist = "negbin")
Pearson residuals:
Min 1Q Median 3Q Max
-0.6648 -0.6423 -0.4645 0.3081 4.2149
Count model coefficients (negbin with log link):
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.6237408 0.3050987 5.322 1.03e-07 ***
basalarea 0.0030787 0.0009237 3.333 0.000859 ***
treatmentLow Duff 0.2203466 0.3709882 0.594 0.552549
treatmentLow Duff + Pinecones 0.0282734 0.4263062 0.066 0.947122
Log(theta) -0.7847333 0.1513783 -5.184 2.17e-07 ***
Zero-inflation model coefficients (binomial with logit link):
Estimate Std. Error z value Pr(>|z|)
(Intercept) -10.33 37.93 -0.272 0.785
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Theta = 0.4562
Number of iterations in BFGS optimization: 27
Log-likelihood: -308.5 on 6 Df
AIC(fill1.zip)
[1] 629.0888
fill2.zip <- zeroinfl(culm~basalarea+treatment|basalarea+treatment, data=fill, dist="negbin")
summary(fill2.zip)
Call:
zeroinfl(formula = culm ~ basalarea + treatment | basalarea + treatment, data = fill, dist = "negbin")
Pearson residuals:
Min 1Q Median 3Q Max
-0.7413 -0.6068 -0.4402 0.2948 3.8557
Count model coefficients (negbin with log link):
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.6988429 0.2759555 6.156 7.45e-10 ***
basalarea 0.0023910 0.0008009 2.985 0.00283 **
treatmentLow Duff 0.3839040 0.3505127 1.095 0.27340
treatmentLow Duff + Pinecones 0.3399303 0.4220126 0.805 0.42053
Log(theta) -0.5612528 0.1732613 -3.239 0.00120 **
Zero-inflation model coefficients (binomial with logit link):
Estimate Std. Error z value Pr(>|z|)
(Intercept) -10.41990 100.79578 -0.103 0.918
basalarea -0.03281 0.02478 -1.325 0.185
treatmentLow Duff 9.61626 100.79708 0.095 0.924
treatmentLow Duff + Pinecones 10.86463 100.79652 0.108 0.914
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Theta = 0.5705
Number of iterations in BFGS optimization: 39
Log-likelihood: -305.1 on 9 Df
AIC(fill2.zip)
[1] 628.2313
lrtest(fill1.zip, fill2.zip)
Likelihood ratio test
Model 1: culm ~ basalarea + treatment | 1
Model 2: culm ~ basalarea + treatment | basalarea + treatment
#Df LogLik Df Chisq Pr(>Chisq)
1 6 -308.54
2 9 -305.12 3 6.8576 0.07658 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
fill.lm <- lm(culm~basalarea+treatment+basalarea*treatment, data=fill)
summary(fill.lm)
Call:
lm(formula = culm ~ basalarea + treatment + basalarea * treatment,
data = fill)
Residuals:
Min 1Q Median 3Q Max
-29.845 -7.205 -4.438 2.682 64.961
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 7.4189045 3.6380246 2.039 0.0443 *
basalarea 0.0003059 0.0148651 0.021 0.9836
treatmentLow Duff -0.5594368 4.5020291 -0.124 0.9014
treatmentLow Duff + Pinecones -5.2046325 5.3840301 -0.967 0.3362
basalarea:treatmentLow Duff 0.0407534 0.0170275 2.393 0.0187 *
basalarea:treatmentLow Duff + Pinecones 0.0626570 0.0241614 2.593 0.0110 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 15.31 on 93 degrees of freedom
Multiple R-squared: 0.292, Adjusted R-squared: 0.2539
F-statistic: 7.669 on 5 and 93 DF, p-value: 4.502e-06
plot(fill.lm)
AIC(fill.lm)
[1] 829.0305
fill.lm <- lm(sqrt(culm)~basalarea+treatment+basalarea*treatment, data=fill)
summary(fill.lm)
Call:
lm(formula = sqrt(culm) ~ basalarea + treatment + basalarea *
treatment, data = fill)
Residuals:
Min 1Q Median 3Q Max
-3.0962 -1.2761 -0.7173 1.1725 5.7084
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.3126014 0.4828446 4.790 6.3e-06 ***
basalarea -0.0005668 0.0019729 -0.287 0.77452
treatmentLow Duff -0.6210459 0.5975168 -1.039 0.30132
treatmentLow Duff + Pinecones -1.3296171 0.7145774 -1.861 0.06595 .
basalarea:treatmentLow Duff 0.0057803 0.0022599 2.558 0.01215 *
basalarea:treatmentLow Duff + Pinecones 0.0101563 0.0032067 3.167 0.00208 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 2.032 on 93 degrees of freedom
Multiple R-squared: 0.2872, Adjusted R-squared: 0.2489
F-statistic: 7.495 on 5 and 93 DF, p-value: 5.999e-06
plot(fill.lm)
AIC(fill.lm)
[1] 429.1693