The aim of these exercises is to make sure you’re comfortable with fitting simple (one-predictor) linear models to data and to emphasise the similarity between regression and “ANOVA”. We will complete some of the analyses we introduced in Chapter 4’s exercises.
For each of the 5 examples below, you should follow the sequence we’ve used previously:
What is the biological question and what are the response and predictor variables?
What distribution do you expect the response variable to follow?
Are the predictors continuous or categorical?
Write out the linear model corresponding to the biological question.
What are the assumptions behind the statistical model you’ll fit?
Fit the model.
How will you assess whether the model fits well?
Can you detect an effect of the predictor?
How do you measure the effect?
What, if any, next steps would you suggest?
What do you conclude (including any cautions)?
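As an illustration of the "fit the model", "assess the fit" and "measure the effect" steps for a single continuous predictor, here is a minimal Python sketch of ordinary least squares. The `x` and `y` values and the function name `fit_simple_regression` are invented for illustration; they are not data or code from any of the studies below, and your own statistics software will normally do this for you.

```python
# Hypothetical (x, y) pairs for illustration only; NOT data from any study cited here.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]


def fit_simple_regression(x, y):
    """Ordinary least squares for the model y = intercept + slope * x + error."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of squares of x and sum of cross-products of x and y
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


intercept, slope = fit_simple_regression(x, y)

# Fitted values and residuals; the residuals are what you would plot
# against the fitted values to assess whether the model fits well.
fitted = [intercept + slope * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

# r-squared: proportion of the variation in y explained by the predictor
mean_y = sum(y) / len(y)
ss_total = sum((yi - mean_y) ** 2 for yi in y)
ss_resid = sum(r ** 2 for r in residuals)
r_squared = 1 - ss_resid / ss_total

print(f"intercept = {intercept:.3f}, slope = {slope:.3f}, r^2 = {r_squared:.3f}")
```

The slope is the measure of effect here (change in the response per unit change in the predictor), and the residuals are the raw material for checking the model's assumptions.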
Le Boeuf et al. (2000) examined the foraging behaviour of northern elephant seals (Mirounga angustirostris) that breed along the west coast of Mexico and the USA. They attached platform satellite transmitter terminals (PTTs) to 22 male seals and recorded, for each seal, the distance (km) to its main feeding area offshore and the amount of time (duration in days) it spent at the feeding area.
You can get this data file here.
Peake and Quinn (1993) investigated the relationship between the number of individuals and number of species (response variables) of invertebrates living among clumps of mussels on a rocky intertidal shore and the area of those mussel clumps (predictor).
You can get this data file here.
Dai et al. (2020) examined the relationships between aboveground (AGB) and belowground (BGB) plant biomass from 80 sites across four types of grassland (temperate grassland, desert grassland, alpine meadow, meadow steppe) in a region of China. Their research questions did not distinguish response and predictor variables, so they used reduced major axis (RMA) regression to determine the slopes of the relationships between log(AGB) and log(BGB) for each grassland type and compared these to a slope of 1, which indicates an isometric relationship.
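To make the idea of an RMA slope concrete, the sketch below uses the standard result that the RMA slope is the ratio of the standard deviations of the two variables, with the sign of their correlation, which equals the least-squares slope divided by the correlation coefficient. The biomass values are invented, the function name `rma_fit` is hypothetical, and putting log(AGB) on the x-axis is an assumption made only for this illustration; none of this is the authors' code or data.

```python
import math

# Hypothetical biomass values for illustration only; NOT the Dai et al. data.
agb = [50.0, 80.0, 120.0, 200.0, 310.0]
bgb = [400.0, 700.0, 900.0, 1800.0, 2500.0]

# Natural logs are used here; the slope does not depend on the base of the log.
log_agb = [math.log(v) for v in agb]
log_bgb = [math.log(v) for v in bgb]


def rma_fit(x, y):
    """Reduced major axis fit; also returns the least-squares slope and r."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    r = sxy / math.sqrt(sxx * syy)
    # RMA slope: ratio of standard deviations, carrying the sign of r
    rma_slope = math.copysign(math.sqrt(syy / sxx), r)
    rma_intercept = mean_y - rma_slope * mean_x
    # Ordinary least-squares slope of y on x, for comparison
    ols_slope = sxy / sxx
    return rma_intercept, rma_slope, ols_slope, r


rma_intercept, rma_slope, ols_slope, r = rma_fit(log_agb, log_bgb)
print(f"RMA slope = {rma_slope:.3f}, OLS slope = {ols_slope:.3f}, r = {r:.3f}")
```

Because the correlation is never larger than 1 in absolute value, the RMA slope is always at least as steep as the least-squares slope, which matters when the question is whether a slope differs from 1.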
Vosteen, Gershenzon, and Kunert (2016) examined patterns of production of honeydew by different races of pea aphids (Acyrthosiphon pisum) and how that attracts ovipositing hoverflies (Episyrphus balteatus) to create enemy-free space for the aphids. They measured honeydew production (mg) over 24 hours by three races of aphids (representing the native hosts they were collected from: Trifolium, Pisum, Medicago races) on plants of their native hosts and also the universal host plant Vicia faba in a climate chamber. There were six combinations of aphid race and host plant (native vs universal) that Vosteen et al. (2016) treated as a single factor.
The data from this paper are available through datadryad. For this question, the data set we’ll use is for Figure 2e. The authors’ analysis is provided and uses the data set …2e…collection.txt, but we want to do a different analysis in Chapter 7, so the file with additional factors is here.
Binning, Roche, and Layton (2013) studied the effect of ectoparasitic isopods (Anilocra nemipteri) on the swimming ability of a tropical species of bream (Scolopsis bilineatus). They collected 18 unparasitized and 20 parasitized fish from Lizard Island on the Great Barrier Reef and created four treatment groups in the laboratory: eight unparasitized fish, 10 parasitized fish, 10 parasitized fish that had the parasites removed, and 10 unparasitized fish that had model parasites made of plastic added. They recorded the swimming speed (body lengths per second) and oxygen consumption (mg O2 per kg per hour) of each fish in a respirometer.
The data are available from datadryad. Within dryad, the dataset you want is “binning etal 2012 one way anova.txt”; in this dataset, SMR is Standard Metabolic Rate (O2 consumption), AS is factorial aerobic scope, and Ucrit is swimming speed.
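For a single categorical predictor with four groups, the fitted value for each fish is simply its group mean, and the F-ratio compares variation among group means with variation within groups. The sketch below works through that arithmetic in Python. The numbers are invented and balanced (four per group) purely to keep the example short, and the group labels are shorthand; this is not the Binning et al. data or analysis.

```python
# Invented swimming speeds (body lengths per second) for illustration only;
# NOT the Binning et al. data, and the group labels are shorthand.
groups = {
    "unparasitized": [4.2, 4.5, 4.1, 4.4],
    "parasitized": [3.6, 3.9, 3.5, 3.8],
    "parasite_removed": [4.0, 4.3, 4.1, 4.2],
    "model_parasite": [3.7, 3.8, 3.9, 3.6],
}

all_values = [v for values in groups.values() for v in values]
n_total = len(all_values)
n_groups = len(groups)
grand_mean = sum(all_values) / n_total

# Under the linear model, the fitted value for an observation is its group mean
group_means = {name: sum(values) / len(values) for name, values in groups.items()}

# Partition the total variation into among-group and within-group (residual) parts
ss_groups = sum(
    len(values) * (group_means[name] - grand_mean) ** 2
    for name, values in groups.items()
)
ss_resid = sum(
    (v - group_means[name]) ** 2
    for name, values in groups.items()
    for v in values
)
ss_total = sum((v - grand_mean) ** 2 for v in all_values)

df_groups = n_groups - 1
df_resid = n_total - n_groups
f_ratio = (ss_groups / df_groups) / (ss_resid / df_resid)

print(f"F({df_groups}, {df_resid}) = {f_ratio:.2f}")
```

Turning the F-ratio into a P value needs the F distribution, which is left to your statistics software. The within-group deviations used for `ss_resid` are the residuals you would examine when checking assumptions.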
You should be able to use the answer from the previous section to generate the code you’ll need.
Here is an example from Kiss et al. (2017). They were interested in the effects on human health of ingesting ragweed, an invasive weed that is now being used as a herbal supplement. They did an experiment using Wistar laboratory rats. Twenty-four rats were randomly allocated to one of three treatment groups, with eight rats in each group. Group 1 was a control group that was fed only cookie dough, group 2 rats were fed cookie dough with a low dose of ragweed, and group 3 rats were fed cookie dough with a high dose of ragweed. The total amount of feed was the same in each group and the experiment ran for 28 days, with rats fed daily. At the end of the experiment, a range of blood parameters were measured and we will focus on aspartate aminotransferase (ast).
You can access the data file here. However, it’s an Excel file not really formatted for R. You’ll need to delete some rows and redo the treatment labels to be able to use it as a data frame. Our reformatting of the file is here. Check your file against it if you have problems.
Again, you should use the same approach as for the two previous questions (although the answers will be different!).
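One way to see the similarity between regression and “ANOVA” in a design like the ragweed experiment is to code the three groups with two 0/1 indicator variables, using the control as the reference, and fit by least squares. The intercept then estimates the control mean and each coefficient estimates a difference from control. The sketch below is a minimal Python illustration: the values are invented (four per group rather than eight), and the helper `solve_linear_system` and the group labels are hypothetical names, not part of any dataset or package.

```python
# Invented response values for illustration only; NOT the Kiss et al. data.
ast_by_group = {
    "control": [60.0, 64.0, 58.0, 62.0],
    "low_dose": [66.0, 70.0, 65.0, 67.0],
    "high_dose": [75.0, 72.0, 78.0, 71.0],
}


def solve_linear_system(a, b):
    """Solve a * coef = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [aug[i][n] for i in range(n)]


# Design matrix rows: [intercept, low-dose indicator, high-dose indicator]
design = []
response = []
for name, values in ast_by_group.items():
    for v in values:
        design.append([1.0, float(name == "low_dose"), float(name == "high_dose")])
        response.append(v)

# Normal equations (X'X) coef = X'y for the least-squares estimates
n_coef = 3
xtx = [[sum(row[i] * row[j] for row in design) for j in range(n_coef)]
       for i in range(n_coef)]
xty = [sum(row[i] * yi for row, yi in zip(design, response)) for i in range(n_coef)]
coef = solve_linear_system(xtx, xty)

group_means = {name: sum(v) / len(v) for name, v in ast_by_group.items()}
print("intercept (control mean):", round(coef[0], 3))
print("low dose minus control:  ", round(coef[1], 3))
print("high dose minus control: ", round(coef[2], 3))
```

The same least-squares machinery used for a continuous predictor is doing the work here; only the coding of the predictor has changed.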