Chapter 13

These exercises will push you a bit to fit GLMs to various kinds of data. You’ll need to pay attention to the nature of the predictors and think about the response variables and their likely distributions.

A. Improving biological control of diamondback moths

Uefune et al. (2020) studied the use of synthetic herbivory-induced plant volatiles (HIPVs) to attract larval parasitoid wasps (Plutella xylostella) to control diamondback moths (DBM: Cotesia vestalis), a global pest of cruciferous vegetables in greenhouses growing mizuna (Japanese mustard) in Japan. They used two groups of greenhouses, the treated group having dispensers for the HIPVs as well as honeyfeeders to attract C. vestalis) and a second untreated group. In each greenhouse, a single sticky trap, replaced weekly over 6 months, was used to catch both DBMs and C. vestalis and the numbers of both counted. While greenhouse ID could have been included as a random effect in a mixed model analysis, the available data did not record individual greenhouses. We will model numbers of C. vestalis against numbers of DBM and treatment using each trap as the units of analysis. The study was done in 2006 and 2008 but we only analyze the 2008 data.

The data are available on Dryad, and you’ll want the Excel file Fig3.xls, and within it, the sheet labelled 2008. Before using it in analysis, you’ll need to tidy up the first few rows. Alternatively, we provide a simplified version, with three columns, treatment, moth, and parasitoid (uefune.csv).

What distribution would you expect the response variable parasitoid to follow?

Import the data and check the properties of the response variable.

#Import uefune data file (uefune.csv)
uefune <- read.csv("../data/uefunex.csv")
head(uefune,10)

##    treatment moth parasitoid
## 1    treated    0          0
## 2    treated    0          0
## 3    treated    0          0
## 4    treated    0          0
## 5    treated    0          0
## 6    treated    0          0
## 7    treated    0          1
## 8    treated    0          0
## 9    treated    0          0
## 10   treated    0          0

Hint: Try graphical presentation (possibly including a transformation) and look at summary statistics

What do you conclude about the response variable?

Fit a poisson glm and interpret the results

You should have decided that the data are too skewed to fit a linear model assuming a normal distribution, and a poisson is a strong candidate.

Extension question

A look at the means and standard deviations suggests that the variance was much larger than the mean, suggesting some overdispersion. Let’s check for it, and see if we should be concerned.

Hint: here’s an R code chunk to do that:

presid <- resid(uefune.glm, type=“pearson”)

ssize <- nrow(uefune1)

params <- length(coef(uefune.glm))

disp <- sum(presid^2)/(ssize-params)

disp

Conclusions?

How does your solution work?

For interest(!). you could also compare the results to the “old” way of transforming response variables closer to normality and fitting an OLS linear model

Try linear with sqrt transform

B. Better coffee-growing for small mammals

Caudill and Rice (2016) examined the effectiveness of methods for making agriculture and habitat preservation more compatible, specifically assessing whether protocols for “Biodiversity-Friendly” coffee growing resulted in positive outcomes for mammals. They compared four habitats, forest, Bird Friendly® shade, conventional shade, and sun coffee, and used a combination of Sherman traps and camera traps to count mammals and assess species richness. Each habitat was represented by multiple sites, with 23 total sites monitored. At each site, they also recorded a series of plant habitat variables, such as cover at canopy, mid- and lower-strata, and ground level, tree basal area. They were interested primarily in differences between habitats, but also the role of vegetation characteristics in influencing mammal diversity. We focus on this latter relationship.

The data were provided as an Excel file through Dryad, which needs a little tidying to use in R; two additional header rows, and some variable names quite long. Use the tidied caudill.csv file for convenience.

Potential explanatory habitat variables are

% canopy cover,
% mid-strata vegetation,
% lower strata vegetation,
% ground cover,
tree basal area (m2/ha),
tree density (number of trees/m2),
tree richness,
tree height (m), and
coffee height (m)

Caudill & Rice examined three response variables, Small Mammal “species density”, which was the number of small mammal species recorded, medium-large density, and total species density, which combined small and med-large.

Import the data file and have a look at it.

df <- read.csv("../data/caudill.csv")
head(df,10)

##    site habitat trapnights individuals specdens_small rrapnights_camera
## 1     1      BF        500           3              1                10
## 2     2      BF        500          14              4                10
## 3     3      BF        480           6              3                10
## 4     4      BF        480          13              4                10
## 5     5      BF        480          16              3                10
## 6     6      BF        500          21              5                10
## 7     7  Forest        500           6              2                10
## 8     8  Forest        500          16              3                10
## 9     9  Forest        340           2              2                10
## 10   10  Forest        480           4              1                10
##    num_images specdens_ml totspecdens canopy basalarea midstrata lowstrata
## 1           8           2           3   90.9      37.9      35.7      30.3
## 2          10           5           9   81.4      20.7      70.3      86.5
## 3           9           3           6   92.7      34.0      58.6      21.4
## 4          24           3           7   77.3      22.9      54.5      67.7
## 5           3           1           4   91.7      22.9      64.5      54.3
## 6           3           2           7   82.3      22.9      35.8      70.6
## 7           5           2           4   89.0      12.0      76.8      53.8
## 8           5           2           5   96.6      18.4      18.3      65.1
## 9          12           1           3   89.6      38.6      66.7      51.7
## 10          5           3           4   96.0      26.8      44.6      39.2
##    groundcov treecount elevation treerichness treeheight coffeeheight  X
## 1       62.2        84       723           10       20.4          2.5 NA
## 2        2.6        59       711           12       15.1          2.7 NA
## 3       53.4        80       713           13       11.7          2.8 NA
## 4        3.2        41       660            9       18.4          2.7 NA
## 5       65.9        48       641            5       12.9          2.7 NA
## 6        7.1        35       755            9       19.4          2.4 NA
## 7       28.4       171       487           NA         NA           NA NA
## 8        9.4       112       496           NA         NA           NA NA
## 9       24.3        35       650           NA         NA           NA NA
## 10      17.8       203       667           NA         NA           NA NA

#Forest habitat has no coffee trees, and low tree richness, so habitat variables not all present. Create subset of this data file
df2<-filter(df, habitat != 'Forest')

Look at the three response variables. What kind of distribution is most likely for them?

Would you analyse all three response variables? Why?

What checks would you do for the habitat predictor variables, and what do you conclude from those checks?

The authors showed some differences in mammal diversity between habitat types, but for their regression analysis they ignored habitat.

Are there any additional checks you’d do before proceeding?

What is the linear model for species density of small mammals?

Is there any concern about the number of predictors vs number of data points?

Fit this model to the data and assess its fit.

What do you conclude about habitat influences on small mammals?

Now run the analysis for large mammals

C. Why do gnus die?

This exercise expands on the example introduced in Chapter 13, where Sinclair and Arcese (1995) were interested in the causes (predation or other) of death of wildebeeste, more specifically whether predation varied with sex and health of animals. They addressed the question by examining carcasses, and they cross-classified 226 wildebeest carcasses from the Serengeti by three variables: sex (male, female), cause of death (predation, non-predation) and bone marrow type (SWF: solid white fatty; OG: opaque gelatinous; TG: translucent gelatinous; with the first indicating a healthy animal which is not undernourished).

We could look at these data in two ways; as we discussed in the chapter, it is sometimes difficult to identify a clear response variable in situations like this. At other times, one variable is a clear candidate.

The data were extracted from the original paper and used in our first edition. The data file is here.

#Import sinclair data file (sinclair.csv) 
sinclair <- read.csv("../data/sinclair.csv") 
head(sinclair,10)

##        death sex marrow gnus
## 1  Predation   F    SWF   26
## 2  Predation   F     OG   32
## 3  Predation   F     TG    8
## 4  Predation   M    SWF   14
## 5  Predation   M     OG   43
## 6  Predation   M     TG   10
## 7      Other   F    SWF    6
## 8      Other   F     OG   26
## 9      Other   F     TG   16
## 10     Other   M    SWF    7

factor(sinclair$sex)   #use these later when running glm

##  [1] F F F M M M F F F M M M
## Levels: F M

factor(sinclair$marrow)

##  [1] SWF OG  TG  SWF OG  TG  SWF OG  TG  SWF OG  TG 
## Levels: OG SWF TG

factor(sinclair$death)

##  [1] Predation Predation Predation Predation Predation Predation Other    
##  [8] Other     Other     Other     Other     Other    
## Levels: Other Predation

Hint for running analyses

If you’re working with R we’d recommend that you normally just run our small script libraries.R:

In this case, you’ll need to load two other packages, epitools and vcd

The 226 carcasses each could be placed into one of 12 cells of a 2 x 2 x 3 table, and we could model the cell count.
We could make the cause of death our focus, recording its death as predation or other, and record values for the other two values.

Write out the linear model you might use in each case

What distribution would you expect to response variable to follow in each case?

Start with situation a

What is your first step in fitting the model to these data?

What are your next steps?

What do you conclude about the lives of wildebeeste?

Now do situation b - fit logistic model with death as response

There are two kinds of data files that can be used in this case. We could work with a long file in which every gnu carcass is a record, or we could use a summary file, where one column of the file contains a cell count. This second format is what we used for fitting the log-linear model of part a, so we’ll stay with it.

To run these data with a binary response, we need to do two things:

Use a weights option, so the variable gnus in this case indicates that a record like this occurs several times, equal to the value of gnus for this record.
running a glm may require the death variable to be numerical, so we’ll define a new variable, pred with values 1 if Predation, 0 if Other. We can do that using ifelse.

Don’t do a full analysis here; just start by looking at the fit of the initial complex model, and see how your results compare to those you obtained in part a

D. Inbreeding effects on parasitism in birds

One of the consequences of inbreeding in animal populations is that it is thought to increase the susceptibility of individuals to disease. To assess this, Townsend et al. (2018) did genetic analysis of 178 crows found across California, and assessed their level of homozygosity (homozygosity by locus - HL), a measure of how inbred they were (higher levels of HL = more inbred). They also took blood samples from the same crows to check for the presence of avian malaria (Plasmodium), a common disease. They wanted to test whether higher amounts of inbreeding (higher HL) were associated with increased probability of having a Plasmodium infection.

The data file is here

#Import data file 
df <- read.csv("../data/townsend.csv")
head(df,10)

##    Plasmodium   HL
## 1           0 27.2
## 2           0 27.0
## 3           1 27.2
## 4           0 22.6
## 5           0 34.6
## 6           0 32.4
## 7           0 26.0
## 8           0 20.5
## 9           0 26.2
## 10          1 31.8

What kind of response variable do we have here?

Because we are doing a binomial response a simple scatterplot to check assumptions here is a little pointless, but just in case you do want to see it, have a crack

Fit the generalised linear model, Plasmodium = intercept + slope*HL, with a binomial response variable.

Think about how you might check how well the model fits

Examine the results from fitting the GLM and testing the null hypothesis that the slope of the response to predictor equals zero.

Determine and interpret the following:

y-intercept
slope of the regression
z value for main H₀
P-value for main H₀
Obtain the 95% confidence intervals on the model estimates

From the results above what conclusions would you draw about the relationship between incidence of avian malaria (Plasmodium) and the homozygosity level?

We can actually use this model to determine what the probability of a bird having avian malaria is if its homozygosity level is 50%

What is that probability?

Odds

We might want to something a bit more meaningful about the relationship of homozygosity level to incidence of Plasmodium and express this in terms of the odds of a bird having avian malaria. Remember that the binomial model uses a logit link function so that it models the response in terms of log odds (remember the odds are the probability of having Plasmodium divided by the probability of not having it). To compute how the odds of a bird having Plasmodium varies with homozygosity level in R, us exp(coef(townsend.glm)). You can also generate 95% confidence intervals on these odds ratios using exp(confint.default(townsend.glm))

What do you interpret from these results about the odds of a bird having Plasmodium as you increase the homozygosity level?

You might find it helpful to plot the data as well as examining the output

References

Caudill SA, Rice RA (2016) Do Bird Friendly® Coffee Criteria Benefit Mammals? Assessment of Mammal Diversity in Chiapas, Mexico. PLoS ONE 11(11): e0165662. doi:10.1371/journal.pone.0165662

Sinclair, A. R. E. & Arcese, P. (1995). Population consequences of predation-sensitive foraging: the Serengeti wildebeeste. Ecology, 76, 882-91. doi.org/10.2307/1939353

Townsend, A. K., Taff, C. C., Wheeler, S. S., Weis, A. M., Hinton, M. G., Jones, M. L., Logsdon, R. M., Reisen, W. K., Freund, D., Sehgal, R. N. M., Saberi, M., Suh, Y. H., Hurd, J., and Boyce, W. M.. 2018. Low heterozygosity is associated with vector-borne disease in crows. Ecosphere 9( 10):e02407. 10.1002/ecs2.2407

Uefune M, Abe J, Shiojiri K, Urano S, Nagasaka K, Takabayashi J. 2020 Targeting diamondback moths in greenhouses by attracting specific native parasitoids with herbivory-induced plant volatiles. R. Soc. Open Sci. 7: 201592. http://dx.doi.org/10.1098/rsos.201592

Caudill, S. Amanda, and Robert A. Rice. 2016. “Do Bird Friendly® Coffee Criteria Benefit Mammals? Assessment of Mammal Diversity in Chiapas, Mexico.” Edited by Tim A. Mousseau. PLOS ONE 11 (11): e0165662. https://doi.org/gsgchq.

Sinclair, A. R. E., and P. Arcese. 1995. “Population Consequences of Predation-Sensitive Foraging: The Serengeti Wildebeest.” Ecology 76 (3): 882–91. https://doi.org/fkr5bq.

Townsend, Andrea K., Conor C. Taff, Sarah S. Wheeler, Allison M. Weis, Mitch G. Hinton, Melissa L. Jones, Ryane M. Logsdon, et al. 2018. “Low Heterozygosity Is Associated with Vector‐borne Disease in Crows.” Ecosphere 9 (10). https://doi.org/gfnq8b.

Uefune, Masayoshi, Junichiro Abe, Kaori Shiojiri, Satoru Urano, Koukichi Nagasaka, and Junji Takabayashi. 2020. “Targeting Diamondback Moths in Greenhouses by Attracting Specific Native Parasitoids with Herbivory-Induced Plant Volatiles.” Royal Society Open Science 7 (11): 201592. https://doi.org/gsgcg9.