Chapter 10

The aim of these exercises is to start getting familiar with mixed models, and in these exercises we’ll be focused on models with multiple categorical predictors, with at least one predictor a random effect. The random effects can be crossed or be nested within other predictors, creating a nested or multilevel structure.

For each of the examples below, you should follow the sequence we’ve used previously:

What is the biological question and what are the response and predictor variables?
What distribution do you expect the response variable to follow?
Is each predictor
1. Continuous or categorical?
2. Fixed or random?
Write out the linear model corresponding to the biological question.
What are the assumptions behind the statistical model you’ll fit?
1. Are those assumptions satisfied?
Fit the model
1. How will you assess whether the model fits well?
2. Can you detect an effect of the fixed predictors?
3. How much of the variance is attributable to each random predictor?
What do you conclude (including any cautions)

Things to look out for

In fitting these models, we’ll need to make the initial checklist more complex:

Think about the relationship between predictors - are they crossed (factorial) or nested (hierarchical)? Make sure that your linear model reflects these relationships!
When there are random effects, assumptions can be more extensive. We’re usually interested in the fixed effects, and different fixed effects can have different assumptions. This is particularly the case with nested designs.

Crossed mixed model designs

Remember from Chapter 10 that these designs are typically used for two purposes.

Random effects are often used to reduce background noise in the data (e.g. randomized blocks designs), where the random effect is of little intrinsic interest. In a sense, we sacrifice degrees of freedom from the residual variance, anticipating that the variance attributed to the random effects will be large enough to make this sacrifice worthwhile. When you see these designs, it’s a good idea to look and see if the use of a random effect seems a good decision, e.g. by looking at blocking efficiency.

In other cases, the random effects are used to estimate how consistently fixed effects act, by estimating variance in their effects across, for example, an environmental spectrum (fixed predictor).

A. Leaf domatia and mites

Walter and O’Dowd (1992) were interested in testing the hypothesis that leaves of the shrub Viburnum tinus with domatia (small shelters at the juncture of veins on leaves) have more mites than leaves without domatia. Fourteen paired leaves on a shrub of V. tinus were randomly chosen and one leaf in each pair had its domatia shaved while the other remained as a control; the number of mites was recorded on each leaf (experimental unit) after two weeks.

The data file is here: walter.csv.

df <- read.csv("../data/walter.csv")
df

##    leaf pair domatia mites
## 1    a1    a  intact     9
## 2    a2    a  shaved     1
## 3    b1    b  intact     2
## 4    b2    b  shaved     1
## 5    c1    c  intact     0
## 6    c2    c  shaved     2
## 7    d1    d  intact    12
## 8    d2    d  shaved     4
## 9    e1    e  intact    15
## 10   e2    e  shaved     2
## 11   f1    f  intact     3
## 12   f2    f  shaved     1
## 13   g1    g  intact    11
## 14   g2    g  shaved     0
## 15   h1    h  intact     6
## 16   h2    h  shaved     0
## 17   i1    i  intact     7
## 18   i2    i  shaved     1
## 19   j1    j  intact     6
## 20   j2    j  shaved     0
## 21   k1    k  intact     5
## 22   k2    k  shaved     1
## 23   l1    l  intact     8
## 24   l2    l  shaved     1
## 25   m1    m  intact     3
## 26   m2    m  shaved     1
## 27   n1    n  intact     6
## 28   n2    n  shaved     0

This file has four variables:

leaf is just a code to identify individual leaves
pair identifies the pair to which that leaf belongs
domatia describes the experimental treatments, whether domatia were shaved off or left intact
mites is the number of mites recorded from that leaf

Use the sequence at the start of these exercises to make some conclusions about the role of domatia on this shrub species and the variability between leaf pairs.

Focus for now on using an OLS approach, so think about the response variable and how you might treat it.

Why do you think the authors chose to run the experiment with pairs of leaves?

Did that pairing decision improve their ability to say something about domatia?

Extra questions:

Are there alternatives to how you treated the response variable in your analysis.

Hint: there are at least a couple

Would your conclusions have changed?

B. Rainfall intensity and frequency and grassland plants

Didiano, Johnson, and Duval (2016) assessed the effects of rainfall intensity and frequency on performance of grassland plants in Ontario, attempting to understand potential effects of future climates. The area is projected to receive much more summer rainfall than at present. They established a field experiment in which plants were grown from seeds in pots subjected to one of two rainfall levels (70 and 90 mm per month) and three frequencies (3, 5, and 15 days per month). They were interested in whether responses of plants varied according to whether the plants were monocots (5 species, though only 4 grew successfully) or eudicots (10, species).

The experiment was done using shelters (see their figure S2) that prevented natural rainfall, allowing the researchers to control water received by the plants. They used 10 shelters, each of which had 6 groups of 18 pots (see their figure S1). Each group received one of the 6 frequency/amount combinations. The 18 pots in each group had one of the 15 target species, with three pots used to measure soil moisture levels during the experiment.

For each seedling, they recorded leaf number (which didn’t change much), plant height, and above- and below-ground biomass at the end of the experiment. They also calculated a ratio of above:below biomass.

The data are available from dryad; the data you’ll use are the first sheet in the Excel file, which you can import directly, or use Excel to save that first sheet as a csv file. To make things a little complicated, the data file uses only common names, while the paper uses formal scientific names. We’ve made our own version of the file with species names and monocot/eudicot added - it’s didiano.csv

Let’s keep things simple by focusing on one response variable (above.ground.biomass) and one species (Big bluestem)

didiano <- read.csv("../data/didiano.csv")
#Now create subset using just one species 
df<-subset(didiano, Species=="Big bluestem")
head(df,10)

##    Plant.ID Shelter Plot Pot      Species Amount Frequency
## 1        43       1    1  10 Big bluestem     70         3
## 2        56       1    2  26 Big bluestem     70        15
## 3        42       1    3  43 Big bluestem     70        30
## 4        18       1    4  61 Big bluestem     90         3
## 5        81       1    5  87 Big bluestem     90        15
## 6        80       1    6 102 Big bluestem     90        30
## 7       167       2    7 117 Big bluestem     70         3
## 8       155       2    8 133 Big bluestem     70        15
## 9       181       2    9 157 Big bluestem     70        30
## 10      106       2   10 160 Big bluestem     90         3
##    X1st.leaf.measurement X1st.plant.height.measurement X2nd.leaf.measurement
## 1                     21                          18.5                    15
## 2                     14                          14.0                    18
## 3                     28                          21.0                    40
## 4                      9                          18.0                    23
## 5                     28                          18.5                    32
## 6                     17                          18.0                    24
## 7                     22                          20.0                    21
## 8                     43                          20.0                    33
## 9                     20                          19.0                    34
## 10                    22                          17.0                    49
##    X2nd.plant.height.measurement Above.ground.biomass Below.ground.biomass
## 1                           51.0               3.2582              5.11740
## 2                           44.0               3.5384              5.13360
## 3                          122.5               9.9042             10.92450
## 4                           59.0               4.5037              7.07450
## 5                          103.0               7.5734             12.62010
## 6                          101.5               6.6888             18.50870
## 7                           74.0               5.4688             13.18560
## 8                           61.0               9.5101             23.75450
## 9                           58.0               4.9502              1.07738
## 10                          81.0              10.0241             26.37970
##    Above..to.below.ground.ratio    Sp_name    Type
## 1                   15.65281444 Andropogon monocot
## 2                   12.43499887 Andropogon monocot
## 3                   12.36849014 Andropogon monocot
## 4                   13.10033972 Andropogon monocot
## 5                   13.60023239 Andropogon monocot
## 6                   15.17462026 Andropogon monocot
## 7                   13.53130486 Andropogon monocot
## 8                   6.414233289 Andropogon monocot
## 9                   11.71669832 Andropogon monocot
## 10                  8.080525933 Andropogon monocot

Use the sequence at the start of these exercises to make some conclusions about the role of rainfall amount and frequency on this species and the variability between shelters.

Focus for now on using an OLS approach, so think about the response variable how you might treat it.

Now use REML instead

Why do you think the authors chose to lay out the experiment with 10 shelters?

Did that design decision improve their ability to say something about the rainfall effects?

Simplified model

The appropriate model for full RCB has Amount x Shelter and Frequency x Shelter. Didiano et al. apparently did not include these terms.

Redo the earlier analysis using their simpler model as starting point

Extension activities

For the species we’ve chosen, did rainfall have the same effect on plant height (2nd.plant.height.measurement) and below-ground biomass?
Was there a difference in how monocot and eudicot species responded? They used 14 species in their experiment, with the aim of comparing responses of monocot and eudicot species?
- The monocot species (all Poaceae) were Andropogon gerardii (Big bluestem) , Elymus canadensisa (Canada wild rye), Elymus riparius (Riverbank rye), Panicum virgatum, and Sorghastrum nutans (Indian grass), but S. nutans failed to grow.
- The eudicots were Asclepias tuberosa (butterflyweed), Desmodium canadense (Showy tick-trefoil), Eupatorium perfoliatum (Common boneset), Euphorbia corollata (flowering spurge), Geum triflorum (Prairie smoke), Oenothera biennis (Common evening primrose), Penstemon hirsutus (Hairy beardtongue), Solidago nemoralis (Gray goldenrod), Verbena stricta (Hoary vervain), and Zizia aurea (Golden alexanders), which were spread over 9 families.
- Hint: you might want to simplify this question by focusing just on main effects, particularly of rainfall amount
- If you start working through more analyses, you’ll get some experience working with missing values as well for some species! **Note that the simplified model isn’t affected to same degree by occasional missed plants

As an example, run the analysis and interpret it using the data for Zizia

C. Psychostimulant effects on fruit flies

Highfill et al. (2019) examined effects of two psychostimulants of public health concern (cocaine and methamphetamine) using Drosophila melanogaster as a “translational” model species. Translational reflected similarities in dopamine transport and the overall effects of exposure in fruitflies and humans. Their interest was in variation in susceptibility to these compounds, so they examined responses of lines of flies chosen from a large set of inbred lines (the Genetic Reference Panel). Groups of 5 flies from a line were placed into test vials and their consumption of sucrose media and sucrose + drug media recorded after 18 h. There’s a nice diagram of the test vial here. Please look at it before proceeding.

It’s not uncommon to find sex-based differences in consumption, so they used groups of males and groups of female flies for each line. Combinations were replicated 10 times.

The final complexity was that they were interested in whether drug sensitivity changed with time, so they allowed 6 hours for flies to feed, then ran another trial, followed by more food, and a third.

What is this design?

As you work your way through the analysis sequence described at the start of these exercises, there are two important questions for you to answer. This is a complex example, so start with a factor diagram (see, e.g. Figure 10.2). It will help to clarify the relationships between the different factors. Look at the data file to see how they’re named. You can download it here

library(readxl)
highfill <- read_excel("../data/pgen.1007834.s001.xlsx")
highfill <- highfill %>%
  dplyr::rename(Line = "DGRP Line") %>%
  dplyr::rename(vial = Replicate)
head(highfill,10)

## # A tibble: 10 × 6
##    Solution Sex    Line  vial Consumption Exposure
##    <chr>    <chr> <dbl> <dbl>       <dbl>    <dbl>
##  1 Cocaine  F        41     1        34.6        1
##  2 Cocaine  F        41     2        23.6        1
##  3 Cocaine  F        41     3        22.6        1
##  4 Cocaine  F        41     4        38.6        1
##  5 Cocaine  F        41     5        11.6        1
##  6 Cocaine  F        41     6        37.6        1
##  7 Cocaine  F        41     7        54.6        1
##  8 Cocaine  F        41     8        13.6        1
##  9 Cocaine  F        41     9        59.6        1
## 10 Cocaine  F        41    10        81.6        1

How many factors are there in the design?

Hint: it’s more than 3!

What model corresponds to the design?

Factorial mixed model, with 3 fixed effects (Solution, Sex, Exposure) and one random effect (Line)
Partly-nested mixed model, with vials as plots or subjects. Sex, Exposure and Line as between-plot effects and Drug Effect (the pairs of capillary tubes) as the within-plots effect
Partly-nested mixed model, with vials as plots or subjects. Sex and Line as between-plot effects and Drug Effect and Exposure as within-plots effect.

Time for some analysis

We’ll approach the analysis of this data set in a couple of ways - simple and complex. The complex way is an extension exercise for Chapter 12. The keenest of you might like to take on the challenge of specifying and running this model. Do it after you’ve finished the Chapter 12 exercises, and with reference to Box 12.5.

Ways to simplify things

We can also make the main question (variation in sensitivity to a psychostimulant) more tractable. The stimulant effect is the difference between sucrose and the drug solution, so we could calculate that for each vial.

We still have the question of what to do with the three trials (Exposure).

-   You could jump ahead to Chapter 12 and see whether sensitivity varies with time, and whether that variation varies with line

-   Use the three trials as just a way of estimating sensitivity, i.e. average them

-   Choose one of the trials as "the" endpoint, e.g. Trial 3

For this chapter’s exercises, lets stick with the two simplifications - calculate sensitivity and average it across trials, and calculate sensitivity and look at the last trial (Exposure = 3).

Start by rearranging the data so the Sucrose and Drug consumption values are columns, rather than appearing on different rows

df <-  highfill %>%
  tidyr::pivot_wider(names_from = Solution, values_from = Consumption)

## Warning: Values from `Consumption` are not uniquely identified; output will contain
## list-cols.
## • Use `values_fn = list` to suppress this warning.
## • Use `values_fn = {summary_fun}` to summarise duplicates.
## • Use the following dplyr code to identify duplicates.
##   {data} %>%
##   dplyr::group_by(Sex, Line, vial, Exposure, Solution) %>%
##   dplyr::summarise(n = dplyr::n(), .groups = "drop") %>%
##   dplyr::filter(n > 1L)

df$Sucrose <-unlist(df$Sucrose)
df$Cocaine <-unlist(df$Cocaine)
df$Pref <-df$Cocaine - df$Sucrose
df$Line <-as.factor(df$Line)
df$Sex <- as.factor(df$Sex)
df$Exposure <- as.factor(df$Exposure)
head(df,10)

## # A tibble: 10 × 7
##    Sex   Line   vial Exposure Cocaine Sucrose    Pref
##    <fct> <fct> <dbl> <fct>      <dbl>   <dbl>   <dbl>
##  1 F     41        1 1           34.6    50   -15.4  
##  2 F     41        2 1           23.6    59   -35.4  
##  3 F     41        3 1           22.6    55   -32.4  
##  4 F     41        4 1           38.6    44    -5.4  
##  5 F     41        5 1           11.6    27   -15.4  
##  6 F     41        6 1           37.6    79.8 -42.2  
##  7 F     41        7 1           54.6    84.8 -30.2  
##  8 F     41        8 1           13.6    85.8 -72.2  
##  9 F     41        9 1           59.6    65.8  -6.20 
## 10 F     41       10 1           81.6    81.8  -0.200

Analyze just one survey

Analyze mean of 3 exposure times

What do you conclude?

Do you see variation among Drosophila genetic lines?

Does that variation depend on the sex of flies?

Does your conclusion match that of the original authors in their paper?

————————————————————————

Designs with nesting

D. Piscicide effects on trout

Birceanu and Wilkie (2018) examined potential concerns around the use of 3-trifluoromethyl-4-nitrophenol (TFM). This chemical is used to control an invasive species (sea lamprey) in the Great Lakes of North America. Control is directed at larval lampreys, and to be effective, it must be applied at relatively high concentrations. There is a question of risks this may pose to other fish species. Bircenau and Wilkie focused on physiological effects on rainbow trout (Oncorhynchus mykiss).

In the data we’ll look at, they focused on sub-lethal, environmentally relevant concentrations of TFM, and they considered effects on two pathways involved in detoxification of TFM. The hypothalamic-pituitary-interrenal (HPI) axis mediates the stress response of fishes, and on liver metabolic capacity. Many organisms detoxify pollutants via the liver.

Physiological measurements were done on individual trout. Fish were exposed to TFM (or control) for 9h, then stressed for 12h, after which physiological measurements were taken.

The animal husbandry involved fish being placed in large (180 L) tanks, with 24 fish per tank. Six tanks were used, and after fish had acclimatized for 24 h, TFM was added to three of the tanks. After 9h, all tanks were flushed and fish left overnight. After this time, 6 fish were removed from each tank and assayed, then fish were stressed (by chasing them vigorously for 3 min), then 6 more assayed after 1 and 4 h, then a further 6 sampled after 24h. Their Figure 1 has a diagram of the experimental setup.

df <- pone_0200782_s003 <- read_excel("../data/pone.0200782.s003.xlsx", 
     sheet = "Liver glycogen on SPSS", col_types = c("text", 
         "text", "text", "numeric"))
# Relabel glyggen
df <- df %>%
  dplyr::rename(Glycogen = "Glycogen (umol/mg protein)") %>%
  dplyr::rename(Tank = "Tank#")

head(df,10)

## # A tibble: 10 × 4
##    Treatment Tank  Time  Glycogen
##    <chr>     <chr> <chr>    <dbl>
##  1 TFM       1     0h        406.
##  2 TFM       1     0h       1161.
##  3 TFM       1     0h        250.
##  4 TFM       2     0h        450.
##  5 TFM       2     0h       2418.
##  6 TFM       2     0h        372.
##  7 TFM       2     0h       1081.
##  8 TFM       3     0h       3119.
##  9 TFM       3     0h       3652.
## 10 TFM       3     0h       3683.

Start by going through the steps at the start to be clear about the question, the kinds of predictors, etc. Pay particular attention to experimental units, and what the treatments are.

Let’s start with a simple question: what was the situation 4h after stress? Did exposure to TFM affect the stress response at 4h?

Describe two models that you could fit to address this question

Fit both of these models, using OLS (or you can go mixed model/REML if you feel like it)

What do you conclude about TFM?

What is the variation between tanks in Glycogen at 4h?

Do you have any suggestions for alternative ways to do this experiment?

What if we include all 4 stages?

How would your analysis change if we wanted to include very short and longer term responses to stress? Have a quick think about this and then take it up again when you get to Chapter 11.

E. Effects of altered climates on freshwater fish

Cramp et al. (2014) examined two aspects of changing climates, altered levels of UVB radiation and increased temperatures. Their interest was in the capacity for these changes to alter physiology (measured as oxygen consumption, VO2) and susceptibility to parasites (measured as prevalence, i.e. whether parasites were present, and intensity, the number of parasites present). They were especially interested in whether UV and temperature acted independently or synergistically, and they investigated this question with a small, widely-distributed freshwater fish, Gambusia holbrooki.

The experiment involved wild-caught fish (from the University of Queensland campus) that were brought into the laboratory and housed in 2L tanks. There were 40 tanks, each with 15 fish. After initial acclimitization, tanks were assigned randomly to one of four combinations of temperature (18 or 25 ℃) and UVB (high and low, corresponding to a more than 10-fold difference). After 10 days, fish were challenged with a common pathogen, whitespot, which is an ectoparasitic ciliate. Fish were left with the parasites for 8 d, after which the presence or absence of parasites was recorded (prevalence), the number of parasites counted (intensity), and a measure of each fishes metabolic rate (as oxygen consumption) taken.

Follow the steps at the top of these exercises to be sure that you are clear on the design and model

Let’s get the data and have a quick look. The data are available from Dryad, and there are two worksheets we’re interested in here, VO2 and Parasite intensity.

library(readxl)
cramp_vo2 <- read_excel("../data/Cramp et al raw data.xlsx", 
     sheet = "VO2")
cramp_intens <- read_excel("../data/Cramp et al raw data.xlsx", 
     sheet = "Infection Intensity")

cramp_vo2 <- cramp_vo2 %>%
  dplyr::rename(UV = "UV Level") %>%
  dplyr::rename(VO2 = "VO2 (ml h-1)") %>%
  dplyr::rename(Temperature = 'Temperature (oC)')
cramp_intens <- cramp_intens %>%
  dplyr::rename(UV = "UV Level") %>%
  dplyr::rename(Intensity = "Whitespots/fish") %>%
  dplyr::rename(Temperature = 'Temperature (oC)')

cramp_intens$Temperature<-as.factor(cramp_intens$Temperature)
cramp_intens$Tank <-as.factor(cramp_intens$Tank)
cramp_vo2$Temperature<-as.factor(cramp_vo2$Temperature)
cramp_vo2$Tank <-as.factor(cramp_vo2$Tank)

head(cramp_vo2, 10)

## # A tibble: 10 × 5
##    UV    Temperature Tank      VO2 `Body mass (g)`
##    <chr> <fct>       <fct>   <dbl>           <dbl>
##  1 Low   18          1     0.0405           0.124 
##  2 Low   18          1     0.0386           0.108 
##  3 Low   18          1     0.0264           0.0746
##  4 Low   18          2     0.0597           0.129 
##  5 Low   18          2     0.0438           0.099 
##  6 Low   18          2     0.00737          0.113 
##  7 Low   18          3     0.0199           0.0468
##  8 Low   18          3     0.0197           0.0505
##  9 Low   18          4     0.0132           0.0502
## 10 Low   18          4     0.0255           0.0682

head(cramp_intens, 10)

## # A tibble: 10 × 4
##    UV    Temperature Tank  Intensity
##    <chr> <fct>       <fct>     <dbl>
##  1 Low   18          1            15
##  2 Low   18          1             0
##  3 Low   18          1             0
##  4 Low   18          1            12
##  5 Low   18          1            11
##  6 Low   18          1            15
##  7 Low   18          1            15
##  8 Low   18          1            14
##  9 Low   18          1            11
## 10 Low   18          2            11

Start by looking at metabolic performance of fish

What are the two main models you could fit to assess the effects of UV and Temperature (Hint: the difference is how tanks are incorporated into the model)?

What assumptions are made in fitting each model?

Use the two approaches to analyse VO2 and outline your conclusions

What are your thoughts about the effects of UV and Temperature on VO2?

And for a bit more, use the approach you’ve developed to look at parasite intensity

If you’re still feeling enthusiastic, keep these data in mind when you get to Chapter 13, and take a look at prevalence (it’s a separate sheet in the Excel file from dryad)

References

Birceanu, Oana, and Michael Patrick Wilkie. 2018. “Post-Exposure Effects of the Piscicide 3-Trifluoromethyl-4-Nitrophenol (TFM) on the Stress Response and Liver Metabolic Capacity in Rainbow Trout (Oncorhynchus Mykiss).” Plos One 13 (7). https://doi.org/gdw2tk.

Cramp, Rebecca L., Stefanie Reid, Frank Seebacher, and Craig E. Franklin. 2014. “Synergistic Interaction Between UVB Radiation and Temperature Increases Susceptibility to Parasitic Infection in a Fish.” Biology Letters 10 (9): 20140449. https://doi.org/gr9ckp.

Didiano, Teresa J., Marc T. J. Johnson, and Tim P. Duval. 2016. “Disentangling the Effects of Precipitation Amount and Frequency on the Performance of 14 Grassland Species.” Plos One 11 (9). https://doi.org/f9rkgk.

Highfill, Chad A., Brandon M. Baker, Stephenie D. Stevens, Robert R. H. Anholt, and Trudy F. C. Mackay. 2019. “Genetics of Cocaine and Methamphetamine Consumption and Preference in Drosophila Melanogaster.” PLOS Genetics 15 (5). https://doi.org/gr2pgk.

Walter, D. E., and D. J. O’Dowd. 1992. “Leaves with Domatia Have More Mites.” Ecology 73 (4): 1514–18. https://doi.org/10.2307/1940694.