The aim of these exercises is to start getting familiar with mixed models, and in these exercises we’ll be focused on models with multiple categorical predictors, with at least one predictor a random effect. The random effects can be crossed or be nested within other predictors, creating a nested or multilevel structure.
For each of the examples below, you should follow the sequence we’ve used previously:
What is the biological question and what are the response and predictor variables?
What distribution do you expect the response variable to follow?
Is each predictor
Write out the linear model corresponding to the biological question.
What are the assumptions behind the statistical model you’ll fit?
Fit the model
How will you assess whether the model fits well?
Can you detect an effect of the fixed predictors?
How much of the variance is attributable to each random predictor?
What do you conclude (including any cautions)
In fitting these models, we’ll need to make the initial checklist more complex:
Think about the relationship between predictors - are they crossed (factorial) or nested (hierarchical)? Make sure that your linear model reflects these relationships!
When there are random effects, assumptions can be more extensive. We’re usually interested in the fixed effects, and different fixed effects can have different assumptions. This is particularly the case with nested designs.
Remember from Chapter 10 that these designs are typically used for two purposes.
Random effects are often used to reduce background noise in the data (e.g. randomized blocks designs), where the random effect is of little intrinsic interest. In a sense, we sacrifice degrees of freedom from the residual variance, anticipating that the variance attributed to the random effects will be large enough to make this sacrifice worthwhile. When you see these designs, it’s a good idea to look and see if the use of a random effect seems a good decision, e.g. by looking at blocking efficiency.
In other cases, the random effects are used to estimate how consistently fixed effects act, by estimating variance in their effects across, for example, an environmental spectrum (fixed predictor).
Walter and O’Dowd (1992) were interested in testing the hypothesis that leaves of the shrub Viburnum tinus with domatia (small shelters at the juncture of veins on leaves) have more mites than leaves without domatia. Fourteen paired leaves on a shrub of V. tinus were randomly chosen and one leaf in each pair had its domatia shaved while the other remained as a control; the number of mites was recorded on each leaf (experimental unit) after two weeks.
The data file is here: walter.csv.
df <- read.csv("../data/walter.csv")
df
## leaf pair domatia mites
## 1 a1 a intact 9
## 2 a2 a shaved 1
## 3 b1 b intact 2
## 4 b2 b shaved 1
## 5 c1 c intact 0
## 6 c2 c shaved 2
## 7 d1 d intact 12
## 8 d2 d shaved 4
## 9 e1 e intact 15
## 10 e2 e shaved 2
## 11 f1 f intact 3
## 12 f2 f shaved 1
## 13 g1 g intact 11
## 14 g2 g shaved 0
## 15 h1 h intact 6
## 16 h2 h shaved 0
## 17 i1 i intact 7
## 18 i2 i shaved 1
## 19 j1 j intact 6
## 20 j2 j shaved 0
## 21 k1 k intact 5
## 22 k2 k shaved 1
## 23 l1 l intact 8
## 24 l2 l shaved 1
## 25 m1 m intact 3
## 26 m2 m shaved 1
## 27 n1 n intact 6
## 28 n2 n shaved 0
This file has four variables:
leaf is just a code to identify individual leaves
pair identifies the pair to which that leaf belongs
domatia describes the experimental treatments, whether domatia were shaved off or left intact
mites is the number of mites recorded from that leaf
Focus for now on using an OLS approach, so think about the response variable and how you might treat it.
Are there alternatives to how you treated the response variable in your analysis.
Hint: there are at least a couple
Would your conclusions have changed?
Didiano, Johnson, and Duval (2016) assessed the effects of rainfall intensity and frequency on performance of grassland plants in Ontario, attempting to understand potential effects of future climates. The area is projected to receive much more summer rainfall than at present. They established a field experiment in which plants were grown from seeds in pots subjected to one of two rainfall levels (70 and 90 mm per month) and three frequencies (3, 5, and 15 days per month). They were interested in whether responses of plants varied according to whether the plants were monocots (5 species, though only 4 grew successfully) or eudicots (10, species).
The experiment was done using shelters (see their figure S2) that prevented natural rainfall, allowing the researchers to control water received by the plants. They used 10 shelters, each of which had 6 groups of 18 pots (see their figure S1). Each group received one of the 6 frequency/amount combinations. The 18 pots in each group had one of the 15 target species, with three pots used to measure soil moisture levels during the experiment.
For each seedling, they recorded leaf number (which didn’t change much), plant height, and above- and below-ground biomass at the end of the experiment. They also calculated a ratio of above:below biomass.
The data are available from dryad; the data you’ll use are the first sheet in the Excel file, which you can import directly, or use Excel to save that first sheet as a csv file. To make things a little complicated, the data file uses only common names, while the paper uses formal scientific names. We’ve made our own version of the file with species names and monocot/eudicot added - it’s didiano.csv
Let’s keep things simple by focusing on one response variable (above.ground.biomass) and one species (Big bluestem)
didiano <- read.csv("../data/didiano.csv")
#Now create subset using just one species
df<-subset(didiano, Species=="Big bluestem")
head(df,10)
## Plant.ID Shelter Plot Pot Species Amount Frequency
## 1 43 1 1 10 Big bluestem 70 3
## 2 56 1 2 26 Big bluestem 70 15
## 3 42 1 3 43 Big bluestem 70 30
## 4 18 1 4 61 Big bluestem 90 3
## 5 81 1 5 87 Big bluestem 90 15
## 6 80 1 6 102 Big bluestem 90 30
## 7 167 2 7 117 Big bluestem 70 3
## 8 155 2 8 133 Big bluestem 70 15
## 9 181 2 9 157 Big bluestem 70 30
## 10 106 2 10 160 Big bluestem 90 3
## X1st.leaf.measurement X1st.plant.height.measurement X2nd.leaf.measurement
## 1 21 18.5 15
## 2 14 14.0 18
## 3 28 21.0 40
## 4 9 18.0 23
## 5 28 18.5 32
## 6 17 18.0 24
## 7 22 20.0 21
## 8 43 20.0 33
## 9 20 19.0 34
## 10 22 17.0 49
## X2nd.plant.height.measurement Above.ground.biomass Below.ground.biomass
## 1 51.0 3.2582 5.11740
## 2 44.0 3.5384 5.13360
## 3 122.5 9.9042 10.92450
## 4 59.0 4.5037 7.07450
## 5 103.0 7.5734 12.62010
## 6 101.5 6.6888 18.50870
## 7 74.0 5.4688 13.18560
## 8 61.0 9.5101 23.75450
## 9 58.0 4.9502 1.07738
## 10 81.0 10.0241 26.37970
## Above..to.below.ground.ratio Sp_name Type
## 1 15.65281444 Andropogon monocot
## 2 12.43499887 Andropogon monocot
## 3 12.36849014 Andropogon monocot
## 4 13.10033972 Andropogon monocot
## 5 13.60023239 Andropogon monocot
## 6 15.17462026 Andropogon monocot
## 7 13.53130486 Andropogon monocot
## 8 6.414233289 Andropogon monocot
## 9 11.71669832 Andropogon monocot
## 10 8.080525933 Andropogon monocot
Focus for now on using an OLS approach, so think about the response variable how you might treat it.
Now use REML instead
Highfill et al. (2019) examined effects of two psychostimulants of public health concern (cocaine and methamphetamine) using Drosophila melanogaster as a “translational” model species. Translational reflected similarities in dopamine transport and the overall effects of exposure in fruitflies and humans. Their interest was in variation in susceptibility to these compounds, so they examined responses of lines of flies chosen from a large set of inbred lines (the Genetic Reference Panel). Groups of 5 flies from a line were placed into test vials and their consumption of sucrose media and sucrose + drug media recorded after 18 h. There’s a nice diagram of the test vial here. Please look at it before proceeding.
It’s not uncommon to find sex-based differences in consumption, so they used groups of males and groups of female flies for each line. Combinations were replicated 10 times.
The final complexity was that they were interested in whether drug sensitivity changed with time, so they allowed 6 hours for flies to feed, then ran another trial, followed by more food, and a third.
As you work your way through the analysis sequence described at the start of these exercises, there are two important questions for you to answer. This is a complex example, so start with a factor diagram (see, e.g. Figure 10.2). It will help to clarify the relationships between the different factors. Look at the data file to see how they’re named. You can download it here
library(readxl)
highfill <- read_excel("../data/pgen.1007834.s001.xlsx")
highfill <- highfill %>%
dplyr::rename(Line = "DGRP Line") %>%
dplyr::rename(vial = Replicate)
head(highfill,10)
## # A tibble: 10 × 6
## Solution Sex Line vial Consumption Exposure
## <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 Cocaine F 41 1 34.6 1
## 2 Cocaine F 41 2 23.6 1
## 3 Cocaine F 41 3 22.6 1
## 4 Cocaine F 41 4 38.6 1
## 5 Cocaine F 41 5 11.6 1
## 6 Cocaine F 41 6 37.6 1
## 7 Cocaine F 41 7 54.6 1
## 8 Cocaine F 41 8 13.6 1
## 9 Cocaine F 41 9 59.6 1
## 10 Cocaine F 41 10 81.6 1
Hint: it’s more than 3!
Factorial mixed model, with 3 fixed effects (Solution, Sex, Exposure) and one random effect (Line)
Partly-nested mixed model, with vials as plots or subjects. Sex, Exposure and Line as between-plot effects and Drug Effect (the pairs of capillary tubes) as the within-plots effect
Partly-nested mixed model, with vials as plots or subjects. Sex and Line as between-plot effects and Drug Effect and Exposure as within-plots effect.
We’ll approach the analysis of this data set in a couple of ways - simple and complex. The complex way is an extension exercise for Chapter 12. The keenest of you might like to take on the challenge of specifying and running this model. Do it after you’ve finished the Chapter 12 exercises, and with reference to Box 12.5.
We can also make the main question (variation in sensitivity to a psychostimulant) more tractable. The stimulant effect is the difference between sucrose and the drug solution, so we could calculate that for each vial.
We still have the question of what to do with the three trials (Exposure).
- You could jump ahead to Chapter 12 and see whether sensitivity varies with time, and whether that variation varies with line
- Use the three trials as just a way of estimating sensitivity, i.e. average them
- Choose one of the trials as "the" endpoint, e.g. Trial 3
For this chapter’s exercises, lets stick with the two simplifications - calculate sensitivity and average it across trials, and calculate sensitivity and look at the last trial (Exposure = 3).
Start by rearranging the data so the Sucrose and Drug consumption values are columns, rather than appearing on different rows
df <- highfill %>%
tidyr::pivot_wider(names_from = Solution, values_from = Consumption)
## Warning: Values from `Consumption` are not uniquely identified; output will contain
## list-cols.
## • Use `values_fn = list` to suppress this warning.
## • Use `values_fn = {summary_fun}` to summarise duplicates.
## • Use the following dplyr code to identify duplicates.
## {data} %>%
## dplyr::group_by(Sex, Line, vial, Exposure, Solution) %>%
## dplyr::summarise(n = dplyr::n(), .groups = "drop") %>%
## dplyr::filter(n > 1L)
df$Sucrose <-unlist(df$Sucrose)
df$Cocaine <-unlist(df$Cocaine)
df$Pref <-df$Cocaine - df$Sucrose
df$Line <-as.factor(df$Line)
df$Sex <- as.factor(df$Sex)
df$Exposure <- as.factor(df$Exposure)
head(df,10)
## # A tibble: 10 × 7
## Sex Line vial Exposure Cocaine Sucrose Pref
## <fct> <fct> <dbl> <fct> <dbl> <dbl> <dbl>
## 1 F 41 1 1 34.6 50 -15.4
## 2 F 41 2 1 23.6 59 -35.4
## 3 F 41 3 1 22.6 55 -32.4
## 4 F 41 4 1 38.6 44 -5.4
## 5 F 41 5 1 11.6 27 -15.4
## 6 F 41 6 1 37.6 79.8 -42.2
## 7 F 41 7 1 54.6 84.8 -30.2
## 8 F 41 8 1 13.6 85.8 -72.2
## 9 F 41 9 1 59.6 65.8 -6.20
## 10 F 41 10 1 81.6 81.8 -0.200
Do you see variation among Drosophila genetic lines?
Does that variation depend on the sex of flies?
Does your conclusion match that of the original authors in their paper?
Birceanu and Wilkie (2018) examined potential concerns around the use of 3-trifluoromethyl-4-nitrophenol (TFM). This chemical is used to control an invasive species (sea lamprey) in the Great Lakes of North America. Control is directed at larval lampreys, and to be effective, it must be applied at relatively high concentrations. There is a question of risks this may pose to other fish species. Bircenau and Wilkie focused on physiological effects on rainbow trout (Oncorhynchus mykiss).
In the data we’ll look at, they focused on sub-lethal, environmentally relevant concentrations of TFM, and they considered effects on two pathways involved in detoxification of TFM. The hypothalamic-pituitary-interrenal (HPI) axis mediates the stress response of fishes, and on liver metabolic capacity. Many organisms detoxify pollutants via the liver.
Physiological measurements were done on individual trout. Fish were exposed to TFM (or control) for 9h, then stressed for 12h, after which physiological measurements were taken.
The animal husbandry involved fish being placed in large (180 L) tanks, with 24 fish per tank. Six tanks were used, and after fish had acclimatized for 24 h, TFM was added to three of the tanks. After 9h, all tanks were flushed and fish left overnight. After this time, 6 fish were removed from each tank and assayed, then fish were stressed (by chasing them vigorously for 3 min), then 6 more assayed after 1 and 4 h, then a further 6 sampled after 24h. Their Figure 1 has a diagram of the experimental setup.
df <- pone_0200782_s003 <- read_excel("../data/pone.0200782.s003.xlsx",
sheet = "Liver glycogen on SPSS", col_types = c("text",
"text", "text", "numeric"))
# Relabel glyggen
df <- df %>%
dplyr::rename(Glycogen = "Glycogen (umol/mg protein)") %>%
dplyr::rename(Tank = "Tank#")
head(df,10)
## # A tibble: 10 × 4
## Treatment Tank Time Glycogen
## <chr> <chr> <chr> <dbl>
## 1 TFM 1 0h 406.
## 2 TFM 1 0h 1161.
## 3 TFM 1 0h 250.
## 4 TFM 2 0h 450.
## 5 TFM 2 0h 2418.
## 6 TFM 2 0h 372.
## 7 TFM 2 0h 1081.
## 8 TFM 3 0h 3119.
## 9 TFM 3 0h 3652.
## 10 TFM 3 0h 3683.
Describe two models that you could fit to address this question
Fit both of these models, using OLS (or you can go mixed model/REML if you feel like it)
How would your analysis change if we wanted to include very short and longer term responses to stress? Have a quick think about this and then take it up again when you get to Chapter 11.
Cramp et al. (2014) examined two aspects of changing climates, altered levels of UVB radiation and increased temperatures. Their interest was in the capacity for these changes to alter physiology (measured as oxygen consumption, VO2) and susceptibility to parasites (measured as prevalence, i.e. whether parasites were present, and intensity, the number of parasites present). They were especially interested in whether UV and temperature acted independently or synergistically, and they investigated this question with a small, widely-distributed freshwater fish, Gambusia holbrooki.
The experiment involved wild-caught fish (from the University of Queensland campus) that were brought into the laboratory and housed in 2L tanks. There were 40 tanks, each with 15 fish. After initial acclimitization, tanks were assigned randomly to one of four combinations of temperature (18 or 25 ℃) and UVB (high and low, corresponding to a more than 10-fold difference). After 10 days, fish were challenged with a common pathogen, whitespot, which is an ectoparasitic ciliate. Fish were left with the parasites for 8 d, after which the presence or absence of parasites was recorded (prevalence), the number of parasites counted (intensity), and a measure of each fishes metabolic rate (as oxygen consumption) taken.
Let’s get the data and have a quick look. The data are available from Dryad, and there are two worksheets we’re interested in here, VO2 and Parasite intensity.
library(readxl)
cramp_vo2 <- read_excel("../data/Cramp et al raw data.xlsx",
sheet = "VO2")
cramp_intens <- read_excel("../data/Cramp et al raw data.xlsx",
sheet = "Infection Intensity")
cramp_vo2 <- cramp_vo2 %>%
dplyr::rename(UV = "UV Level") %>%
dplyr::rename(VO2 = "VO2 (ml h-1)") %>%
dplyr::rename(Temperature = 'Temperature (oC)')
cramp_intens <- cramp_intens %>%
dplyr::rename(UV = "UV Level") %>%
dplyr::rename(Intensity = "Whitespots/fish") %>%
dplyr::rename(Temperature = 'Temperature (oC)')
cramp_intens$Temperature<-as.factor(cramp_intens$Temperature)
cramp_intens$Tank <-as.factor(cramp_intens$Tank)
cramp_vo2$Temperature<-as.factor(cramp_vo2$Temperature)
cramp_vo2$Tank <-as.factor(cramp_vo2$Tank)
head(cramp_vo2, 10)
## # A tibble: 10 × 5
## UV Temperature Tank VO2 `Body mass (g)`
## <chr> <fct> <fct> <dbl> <dbl>
## 1 Low 18 1 0.0405 0.124
## 2 Low 18 1 0.0386 0.108
## 3 Low 18 1 0.0264 0.0746
## 4 Low 18 2 0.0597 0.129
## 5 Low 18 2 0.0438 0.099
## 6 Low 18 2 0.00737 0.113
## 7 Low 18 3 0.0199 0.0468
## 8 Low 18 3 0.0197 0.0505
## 9 Low 18 4 0.0132 0.0502
## 10 Low 18 4 0.0255 0.0682
head(cramp_intens, 10)
## # A tibble: 10 × 4
## UV Temperature Tank Intensity
## <chr> <fct> <fct> <dbl>
## 1 Low 18 1 15
## 2 Low 18 1 0
## 3 Low 18 1 0
## 4 Low 18 1 12
## 5 Low 18 1 11
## 6 Low 18 1 15
## 7 Low 18 1 15
## 8 Low 18 1 14
## 9 Low 18 1 11
## 10 Low 18 2 11
What assumptions are made in fitting each model?