Chapter 5

The aim of these exercises is to help you get comfortable with running exploratory data analysis - taking a data set and a potential model, and evaluating whether the two are compatible. This activity will be part of most of the exercises in the later chapters. Particular things to look for are:

Does the data set have extreme values (outliers)?
- How do you identify those values?
- Are the observations legitimate or mistakes?
- Are extreme values influential?
What statistical model is proposed for this data set?
- What are the important assumptions?
- How can you evaluate whether those assumptions are satisfied?
- If they aren’t satisfied, what are your options?

Exploratory data analysis can be counter-intuitive. We emphasise its value as something to do before going ahead with fitting the model, assessing parameters, and testing hypotheses, but some assumptions can only be assessed by fitting the model. At this stage, we’re aiming to fit the model without looking at parameters of interest, confidence intervals, or P-values, but software doesn’t always make this easy. We’re trying to avoid the temptation of P-hacking or its equivalent - fitting a model that will give us desired results, rather than one that is appropriate for the particular data set. If you wanted to be super-rigorous about this step, you could write some R Markdown code where the results of model-fitting are suppressed (using the chunk option results='hide', etc.) and you just generate, e.g., residual plots!
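As a minimal sketch of that idea, the code below fits a model and draws residual plots without ever printing coefficients or P-values. It uses R's built-in `cars` data purely as a stand-in (an assumption, not one of this chapter's data sets); in an R Markdown document you could additionally set results='hide' on the chunk.

```r
# Stand-in example: the built-in `cars` data, NOT one of the chapter's data sets.
# The fitted model is stored but never printed, so no estimates or P-values appear.
fit <- lm(dist ~ speed, data = cars)

# Pull out only what the diagnostics need.
res <- residuals(fit)
fitted_vals <- fitted(fit)

# Residuals vs fitted values: look for wedge shapes (unequal variance) or curvature.
plot(fitted_vals, res, xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

# Normal Q-Q plot of the residuals.
qqnorm(res)
qqline(res)
```

To adapt this, replace the formula and `data` argument with the model you specified in Chapter 4, and avoid calling `summary(fit)` until you are satisfied with the diagnostics.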

A. Elephant seal foraging

Continue with the Le Boeuf et al. (2000) elephant seal example from Chapter 4’s exercises.

For the linear model you’ve specified, what are the assumptions?

Open the data file and check those assumptions.

df <- read.csv("../data/leboeuf.csv")
head(df,10)

##    male departwt distance FFAduration durationto durationfrom
## 1   Pop       NA      534          31         18           11
## 2   Alt      973      755          89          9            8
## 3   Pro      977     1210          77         12           18
## 4   Hal     1121       NA          NA         NA           NA
## 5   Blu       NA     1297          76         19           25
## 6   Dua      996     1487          68         18           23
## 7   Rov     1100     2073          69         29           25
## 8   Ric     1068     2181          46         21           42
## 9   Ori     1097       NA          NA         NA           NA
## 10  Jer     1199       NA          NA         NA           NA

The data also include information on departure weight. Have a look to see if that variable might also be linked to foraging duration, and whether it might also be linked to distance travelled.

Hint: here’s your chance at a scatterplot matrix
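One possible way to draw the scatterplot matrix is base R's `pairs()`. The sketch below retypes only the ten rows printed above so that it runs on its own; with the real file you would pass the corresponding columns of `df` instead. The name `seals` is just an illustrative choice.

```r
# Only the first ten rows shown above; the full leboeuf.csv contains more seals.
seals <- data.frame(
  departwt    = c(NA, 973, 977, 1121, NA, 996, 1100, 1068, 1097, 1199),
  distance    = c(534, 755, 1210, NA, 1297, 1487, 2073, 2181, NA, NA),
  FFAduration = c(31, 89, 77, NA, 76, 68, 69, 46, NA, NA)
)

# Scatterplot matrix of the three numeric variables.
# Points with a missing value in either variable are simply not drawn.
pairs(seals)

# How many seals have all three variables recorded?
n_complete <- sum(complete.cases(seals))
```

With the full data set, something like `pairs(df[, c("departwt", "distance", "FFAduration")])` should give the equivalent plot.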

B. Invertebrates in mussel clumps

Continue with the Peake and Quinn (1993) example from Chapter 4’s exercises, which relates the number of individuals and the number of species (response variables) to mussel clump area (predictor variable).

For the linear model you’ve specified, what are the assumptions?

Open the data file and check those assumptions.

df <- read.csv("../data/peakquinn.csv")
head(df,10)

##       area indiv species
## 1   516.00    18       3
## 2   469.06    60       7
## 3   462.25    57       6
## 4   938.60   100       8
## 5  1357.15    48      10
## 6  1773.66   118       9
## 7  1686.01   148      10
## 8  1786.29   214      11
## 9  3090.07   225      16
## 10 3980.12   283       9
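A rough sketch of how you might start checking these assumptions is given below. It retypes the ten rows printed above so that it is self-contained; the choice of a log10 scale for comparison and the name `mussels` are illustrative assumptions, not a recommendation from the original study.

```r
# Only the first ten rows shown above; use the full peakquinn.csv in practice.
mussels <- data.frame(
  area    = c(516.00, 469.06, 462.25, 938.60, 1357.15,
              1773.66, 1686.01, 1786.29, 3090.07, 3980.12),
  indiv   = c(18, 60, 57, 100, 48, 118, 148, 214, 225, 283),
  species = c(3, 7, 6, 8, 10, 9, 10, 11, 16, 9)
)

# Boxplots of the predictor on raw and log scales, to judge skewness and outliers.
op <- par(mfrow = c(1, 2))
boxplot(mussels$area, main = "area")
boxplot(log10(mussels$area), main = "log10(area)")
par(op)

# Fit the model without printing it, then inspect residuals against fitted values.
fit <- lm(indiv ~ area, data = mussels)
plot(fitted(fit), residuals(fit), xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
```

The same steps can be repeated with `species` as the response.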

C. Honeydew production in aphids

Continue with the Vosteen, Gershenzon, and Kunert (2016) study examining patterns of production of honeydew by different races of pea aphids (Acyrthosiphon pisum) from Chapter 4’s exercises. You should have described a model to fit to these data.

For the linear model you’ve specified, what are the assumptions?

Read in the data and check the assumptions.

df <- read.csv("../data/vosteen.csv")
head(df,10)

##    clone_plant_combination honeydew clone plant hostplant
## 1                  T_Vicia     1.08     T Vicia Universal
## 2                  T_Vicia     2.21     T Vicia Universal
## 3                  T_Vicia     2.63     T Vicia Universal
## 4                  T_Vicia     1.63     T Vicia Universal
## 5                  T_Vicia     3.51     T Vicia Universal
## 6                  T_Vicia     2.53     T Vicia Universal
## 7                  T_Vicia     2.92     T Vicia Universal
## 8                  T_Vicia     0.98     T Vicia Universal
## 9                  T_Vicia     2.39     T Vicia Universal
## 10                 T_Vicia     2.05     T Vicia Universal
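For a model with a categorical predictor, one common starting point is to compare the spread of the response across groups. The sketch below uses R's built-in `PlantGrowth` data as a stand-in (an assumption made so the code runs on its own); for the aphid data you would substitute `honeydew` for `weight` and whichever grouping variable your Chapter 4 model uses for `group`.

```r
# Stand-in example: built-in PlantGrowth data (weight by group), NOT the aphid data.
# Side-by-side boxplots: compare spread between groups and look for outliers.
boxplot(weight ~ group, data = PlantGrowth, ylab = "weight")

# Group means and standard deviations.
group_mean <- tapply(PlantGrowth$weight, PlantGrowth$group, mean)
group_sd   <- tapply(PlantGrowth$weight, PlantGrowth$group, sd)

# If the SD increases with the mean, the equal-variance assumption is doubtful.
plot(as.numeric(group_mean), as.numeric(group_sd),
     xlab = "Group mean", ylab = "Group SD")
```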

D. Parasites and fish swimming

Again, in Chapter 4’s exercises you examined the work of Binning, Roche, and Layton (2013), who studied the effect of ectoparasitic isopods on the swimming ability of a tropical species of bream. They created four treatment groups in the laboratory: 8 unparasitized fish, 10 parasitized fish, 10 parasitized fish that had the parasites removed, and 10 unparasitized fish that had model parasites made of plastic added. They recorded the swimming speed (body lengths per second) and oxygen consumption (mg O₂ per kg per hour) of each fish. The data are available from Dryad. Within Dryad, the dataset you want is “binning etal 2012 one way anova.txt”; in this dataset, SMR is standard metabolic rate (O₂ consumption), AS is factorial aerobic scope, and Ucrit is swimming speed.

In Chapter 4’s exercises, you should have described a model to fit to these data.

What are the assumptions associated with that model?

Read in the data and check the assumptions.

df <- read.csv("../data/binning.csv")
head(df,10)

##    Fish Treatment    SMR    MMR   AS Ucrit
## 1   P10         P 110.20 535.80 4.86  3.79
## 2   P12         P 140.11 471.45 3.36  3.66
## 3   P27         P 135.84 573.08 4.22  2.47
## 4   P42         P 140.44 379.69 2.70  3.65
## 5   P15         P 120.54 561.93 4.66  3.52
## 6   P23         P 108.02 375.57 3.48  3.39
## 7   P26         P 119.23 534.75 4.49  3.74
## 8   P37         P 152.73 494.95 3.24  3.37
## 9   P72         P 100.86 429.73 4.26  3.36
## 10  P75         P 134.09 434.60 3.24  3.68

E. Neuroanatomy of insectivorous mammals

Kaufman et al. (2013) examined the neuroanatomy of a recently described species of sengi, which are small insectivorous mammals also known as elephant or jumping shrews. These animals are interesting, having been originally placed within the mammalian order Insectivora, along with shrews, hedgehogs, moles, etc., but this group is now known to be polyphyletic, and sengis are more appropriately grouped with elephants, dugongs, and hyraxes. They are in the order Macroscelidea, within the Afrotheria. The Afrotheria includes another order of small insectivores, the Tenrecoidea (tenrecs and golden moles). The Laurasiatheria also includes several families of small insectivores.

Small insectivores are generally thought to have small brain mass (when adjusted for overall body mass), but there has been some question of whether sengis fit this pattern, and Kaufman and colleagues were curious whether the new species, Rhynchocyon udzungwensis, fitted with other sengi. They assembled data from 56 small insectivores: 5 sengi, 14 other afrotherian species, and 37 laurasiatherians. For each species, they calculated brain mass (in mg) and total body mass (in g).

The data are all presented in Table 1 of the paper. We’ve extracted them from the paper into the file kaufman.csv.

In the exercises for this chapter, we’ll just think about brain size relative to body size, and we’ll pick this example up again in Chapter 8.

Load the data file, and look at the relationship between brain mass and body mass.

df <- read.csv("../data/kaufman.csv")
head(df,10)

##            family        genus   species bodymass brainmass        relation
## 1  Solenodontidae    Solenodon paradoxus    672.0      4723 laurasiatherian
## 2      Tenrecidae       Tenrec ecaudatus    852.0      2588     afrotherian
## 3      Tenrecidae      Setifer   setosus    237.0      1516     afrotherian
## 4      Tenrecidae Hemicentetes  semispin    116.0       839     afrotherian
## 5      Tenrecidae     Echinops  telfairi     87.5       623     afrotherian
## 6      Tenrecidae  Oryzorictes talpoides     44.2       580     afrotherian
## 7      Tenrecidae    Microgale    cowani     15.2       420     afrotherian
## 8      Tenrecidae    Limnogale  mergulus     92.0      1150     afrotherian
## 9      Tenrecidae    Microgale   dobsoni     31.9       557     afrotherian
## 10     Tenrecidae    Microgale  talazaci     48.2       766     afrotherian
##            relation2
## 1  other insectivore
## 2  other insectivore
## 3  other insectivore
## 4  other insectivore
## 5  other insectivore
## 6  other insectivore
## 7  other insectivore
## 8  other insectivore
## 9  other insectivore
## 10 other insectivore

What kind of model are we intending to fit to these data?

Look at the relationship between the two variables. Are there any steps you’d recommend we take?
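One way to look at the relationship is to plot it on both the raw and log-log scales and compare. The sketch below retypes the ten rows printed above so that it runs without the file; the log-log view is offered as something to consider, not as the answer, and `insectivores` is just an illustrative name.

```r
# Only the first ten rows shown above; the full kaufman.csv has 56 species.
insectivores <- data.frame(
  bodymass  = c(672.0, 852.0, 237.0, 116.0, 87.5, 44.2, 15.2, 92.0, 31.9, 48.2),
  brainmass = c(4723, 2588, 1516, 839, 623, 580, 420, 1150, 557, 766)
)

# Same scatterplot on two scales: raw axes, then both axes logarithmic.
op <- par(mfrow = c(1, 2))
plot(brainmass ~ bodymass, data = insectivores, main = "Raw scale")
plot(brainmass ~ bodymass, data = insectivores, log = "xy",
     main = "Log-log scale")
par(op)
```

Compare the two panels for linearity and for how evenly the points spread along each axis.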

Note that the original researchers used a reduced major axis regression, as they considered both variables to be measured with error. See the discussion in Chapter 6 about whether to consider X random or fixed. For our purposes here, we’ll treat it as fixed.

References

Binning, Sandra A., Dominique G. Roche, and Cayne Layton. 2013. “Ectoparasites Increase Swimming Costs in a Coral Reef Fish.” Biology Letters 9 (1): 20120927. https://doi.org/gsgbsx.

Kaufman, Jason A., Gregory H. Turner, Patricia A. Holroyd, Francesco Rovero, and Ari Grossman. 2013. “Brain Volume of the Newly-Discovered Species Rhynchocyon Udzungwensis (Mammalia: Afrotheria: Macroscelidea): Implications for Encephalization in Sengis.” Edited by Andrew Iwaniuk. PLoS ONE 8 (3): e58667. https://doi.org/f4qwz3.

Le Boeuf, B. J., D. E. Crocker, D. P. Costa, S. B. Blackwell, P. M. Webb, and D. S. Houser. 2000. “Foraging Ecology of Northern Elephant Seals.” Ecological Monographs 70 (3): 353–82. https://doi.org/fj9rqc.

Peake, A. J., and G. P. Quinn. 1993. “Temporal Variation in Species-Area Curves for Invertebrates in Clumps of an Intertidal Mussel.” Ecography 16: 269–77. https://doi.org/cwzvgc.

Vosteen, Ilka, Jonathan Gershenzon, and Grit Kunert. 2016. “Hoverfly Preference for High Honeydew Amounts Creates Enemy-Free Space for Aphids Colonizing Novel Host Plants.” Edited by Kim Cuddington. Journal of Animal Ecology 85 (5): 1286–97. https://doi.org/f9csdq.