Chapter 16

The aim of these exercises is to continue making sure you’re comfortable with handling multivariate data. In this chapter’s exercises, you’ll focus on analyses based on dissimilarities/distances, including fitting linear models to these kinds of response variables.

You’ll need the packages vegan and mvabund.

For each of the examples below, you should follow the sequence we’ve used previously, as far as it’s sensible:

  1. What is the biological question and what are the response and predictor variables?

  2. What distribution do you expect the response variable to follow?

  3. Is each predictor

    1. Continuous or categorical?
    2. Fixed or random?
  4. Write out the linear model corresponding to the biological question.

  5. What are the assumptions behind the statistical model you’ll fit?

    1. Are those assumptions satisfied?
  6. Fit the model

    1. How will you assess whether the model fits well?

    2. Can you detect an effect of the fixed predictors?

    3. How much of the variance is attributable to each random predictor?

  7. What do you conclude (including any cautions)


A. Does fire history affect reptile assemblages?

Dixon et al. (2018) focused on assemblages of reptiles in forests and woodlands of southeastern Australia, with a particular interest in relationships between these assemblages and the fire history of the landscape. They identified 81 sites that varied in time since fire, from 6 months to >96 y. Rather than a continuum, three categories of time since fire were used, 0.5-2y, 6-12y, and >96y. Sites were also classified according to habitat.

Reptile assemblages were sampled with a range of methods, including visual surveys and camera traps, to give counts of 20 reptiles, whose abundance ranged from 0-1 to 0-126, depending on species.

Data are available from dryad, as Rep_abund.csv. You’ll want to focus on the columns with reptile numbers, and the one with the fire history (tsf). For convenience, we’ve extracted those data as dixonbiota.csv

Start by having a quick look at the data.

df <- read_csv("../data/dixonbiota.csv")
## Rows: 79 Columns: 22
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (2): site, tsf
## dbl (20): adup, aplat, amur, amac, aram, dcor, ecun, esax, eul, htal, ldel, ...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(df, 10)
## # A tibble: 10 × 22
##    site  tsf    adup aplat  amur  amac  aram  dcor  ecun  esax   eul  htal  ldel
##    <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
##  1 A01   long      1     0     4     2     0     1     0     0     0     1     1
##  2 A02   long      0     0     4     2     0     0     0     0     1     1     0
##  3 A04   long      0     3     0     1     0     1     0     0     0     0     0
##  4 A05   long      2     0     0     1     0     0     2     0     1     0     2
##  5 A06   long      0     1     1     4     1     0     0     1     0     0     5
##  6 A07   long      0     0     0     1     0     1     0     0     0     1     2
##  7 A08   long      3     1     0     1     0     0     2     0     1     0     2
##  8 A09   long      1     1     1     2     0     0     0     0     1     0     3
##  9 A10   long      0     2     0     0     0     0     1     2     2     0     0
## 10 B01   short     0     2     0     0     0     0     0     0     0     0     0
## # ℹ 9 more variables: lgui <dbl>, lwhi <dbl>, ppor <dbl>, pent <dbl>,
## #   pspe <dbl>, ptex <dbl>, rdie <dbl>, tnig <dbl>, vros <dbl>

The file is pretty straightforward, with tsf being the fire category, and columns 3-22 each representing one reptile taxon.

Run the analysis and provide your interpretation

How would you fit a linear model to assess fire effects?

How would you decide which groups (fire histories) are different from each other? How would the differences in dispersion affect your interpretation of permanova results?

Suppose you think of a reptile assemblage as simply the species that are present, regardless of how common they are.

What would you conclude about the infuence of time since fire now?

Extension activity

Dixon and their colleagues also recorded a range of habitat variables, which we’ve collected for you in dixonenv.csv

dixonenv <- read_csv("../data/dixonenv.csv")
## Rows: 79 Columns: 14
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (4): site, tsf, veg, aspect
## dbl (10): lcwd, shrcov, grcov, litcov, litdep, rocks, elev, warm, cold, twi
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(dixonenv, 10)
## # A tibble: 10 × 14
##    site  tsf    lcwd veg   shrcov grcov litcov litdep rocks aspect  elev  warm
##    <chr> <chr> <dbl> <chr>  <dbl> <dbl>  <dbl>  <dbl> <dbl> <chr>  <dbl> <dbl>
##  1 A01   long   9.72 SAW    0.077 0.65   0.35     9.9  4.68 N      1051.  25.0
##  2 A02   long  11.2  SAW    0.125 0.63   0.465   13.4  0    S       970.  25.6
##  3 A04   long  11.1  SAW    0.012 0.77   0.32    10.1  1.14 N      1073.  24.8
##  4 A05   long  10.8  SAW    0.059 0.345  0.636    9.2 18.9  N      1299.  22.6
##  5 A06   long  11.4  DS     0.084 0.685  0.585   23.9 24.3  W      1504.  20.9
##  6 A07   long  11.8  DS     0.144 0.9    0.665   22.1  7.66 E      1532.  20.6
##  7 A08   long  12.0  DS     0.04  0.725  0.41    10.8  0    W      1201.  23.6
##  8 A09   long  12.1  DS     0.079 0.74   0.465   17.1  2.86 E      1468.  21.2
##  9 A10   long  11.5  DS     0.057 0.575  0.635   21.2 26.1  S      1449.  21.3
## 10 B01   short  9.23 DS     0.31  0.175  0.375    8.2  1.37 W       739.  27.3
## # ℹ 2 more variables: cold <dbl>, twi <dbl>

They recorded Coarse Woody Debris, which was long-transformed, litter cover (litcov), % cover of groundcover (grcov), and % cover of shrubs (shrcov), and rock cover. In the original paper, Dixon et al. were comfortable that these predictors weren’t correlated.

MVABUND

The previous analyses showed differences in dispersion between tsf groups which may affect our permanova interpretations. An alternative is to use mvabund based on a poisson or negative binomial distribution.

B. Simple MDS & Permanova: Frogs and urbanization

Let’s return to the Hutto and Barrett (2021) example from the Chapter 15 exercises. There, we used analyses based on associations between variables to explore the relationship between frog assemblages (RDA) in ponds and the degree of urbanization surrounding. This seems a question that could just as easily be examined using distances or dissimilarities.

Return to the data and assess whether frog assemblages (using the standardized abundance scale or presence-absence) differ with urbanization.

Did your conclusions differ from those obtained using RDA?

To answer this question, you’ll need to exclude a couple of ponds that had no frogs.

df <- read_csv("../data/huttoamph.csv")
## Rows: 50 Columns: 14
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (1): type
## dbl (13): Site, ACRE, BAME, BFOW, GCAR, HCIN, HVER, LCAT, LCLA, LPAL, LSPH, ...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(df,10)
## # A tibble: 10 × 14
##     Site type   ACRE  BAME  BFOW  GCAR  HCIN  HVER  LCAT  LCLA  LPAL  LSPH  PCRU
##    <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
##  1     1 Low       9     0     5     0     0     0     1     4     0     1    10
##  2     2 High      0     0     0     0     4     2     0     4     0     2     5
##  3     3 Low       0     0     0     3     9     6     2     2     0     1    10
##  4     4 High      0     0     0     0     0     0     8     0     0     2     1
##  5     5 Open…     0     0     0     0     0     0     0     1     0     0     0
##  6     6 Open…     1     0     2     0     0     0     4     0     0     1     0
##  7     7 Open…     3     0     5     0     3     0     1     2     0     1     1
##  8     8 Open…     3     0     0     0     1     0     1     2     0     1     1
##  9     9 Open…     0     0     0     0     0     0     0     0     0     0     0
## 10    10 Open…     0     0     0     0     0     0     3     6     0     2     1
## # ℹ 1 more variable: PFER <dbl>
#ponds 9 and 40 had no frogs. Remove for distance analysis
df <- df %>% 
  filter(Site!= 9 & Site !=40)
df
## # A tibble: 48 × 14
##     Site type   ACRE  BAME  BFOW  GCAR  HCIN  HVER  LCAT  LCLA  LPAL  LSPH  PCRU
##    <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
##  1     1 Low       9     0     5     0     0     0     1     4     0     1    10
##  2     2 High      0     0     0     0     4     2     0     4     0     2     5
##  3     3 Low       0     0     0     3     9     6     2     2     0     1    10
##  4     4 High      0     0     0     0     0     0     8     0     0     2     1
##  5     5 Open…     0     0     0     0     0     0     0     1     0     0     0
##  6     6 Open…     1     0     2     0     0     0     4     0     0     1     0
##  7     7 Open…     3     0     5     0     3     0     1     2     0     1     1
##  8     8 Open…     3     0     0     0     1     0     1     2     0     1     1
##  9    10 Open…     0     0     0     0     0     0     3     6     0     2     1
## 10    11 Open…     0     1     8     0     0     0     5     1     0     2     0
## # ℹ 38 more rows
## # ℹ 1 more variable: PFER <dbl>

The explanation of the variables is in the previous chapter’s exercises.

C. Human oral microbiomes and HIV

Griffen et al. (2019) examined the oral microbiome of humans, with a focus on the effects of HIV and AntiRetroviral therapy (ART) on this microbiome. They described the microbiome of 341 patients, who fell into one of two categories: HIV-, and HIV+ with ART. They also matched their samples as far as possible for sex, along with other conditions that can influence oral microbiomes (Candida infection, current smoking). We’ll use their records for these other categories as well. There were more HIV+ than - subjects, but within these groups, approximately equal numbers of two sexes, two Candida infection status, and two smoking categories.

The bacteriome was recorded separately using 16S RNA sequencing, which covered just over 600 taxa.

The data are available from their paper, as two Excel files in the supplementary information. The metadata gives HIV status, and a raft of demographic information. We’ll just work with HIV, sex, Candida, and current smoking. A second sheet has the bacteriome. The code chunk below reads this file. It also screens the file for any bacterial taxa that were not present in this particular study, which reduces the bacterial taxa to 599.

library(readxl)
#Read in metadata. Lots of information we don't need, so a couple of iterations to get down to a few columns we want - HIV status, candida, current smoking, gender
df <- read_excel("../data/41598_2019_55703_MOESM3_ESM.xlsx", 
     range = "A1:S342", na = "NA")
df<-dplyr::select(df, HIV, candida, gender, smoking_current)
head(df)
## # A tibble: 6 × 4
##   HIV   candida gender smoking_current
##   <chr> <chr>   <chr>  <chr>          
## 1 Yes   No      Female No             
## 2 Yes   Yes     Male   No             
## 3 Yes   No      Female No             
## 4 Yes   No      Male   No             
## 5 Yes   No      Male   No             
## 6 Yes   No      Male   Yes
#df1 is the bacterial data
df1 <- read_excel("../data/41598_2019_55703_MOESM2_ESM.xlsx")
## New names:
## • `` -> `...1`
head(df1,10)
## # A tibble: 10 × 636
##      ...1 `Abiotrophia defectiva` Acetobacter pasteuria…¹ Achromobacter xyloso…²
##     <dbl>                   <dbl>                   <dbl>                  <dbl>
##  1 14102.                      14                       0                      0
##  2 14104.                      10                       0                      0
##  3 14108.                      38                       0                      0
##  4 14109.                      10                       0                      0
##  5 14112.                       0                       0                      0
##  6 14113.                       0                       0                      0
##  7 14115.                       0                       0                      0
##  8 14117.                       1                       0                      0
##  9 14121.                       0                       0                      0
## 10 14122.                       0                       0                      0
## # ℹ abbreviated names: ¹​`Acetobacter pasteurianus`,
## #   ²​`Achromobacter xylosoxidans`
## # ℹ 632 more variables: `Acidovorax cattleyae` <dbl>,
## #   `Acinetobacter baumanii` <dbl>, `Acinetobacter calcoaceticus` <dbl>,
## #   `Acinetobacter guillouiae` <dbl>, `Acinetobacter gyllenbergii` <dbl>,
## #   `Acinetobacter haemolyticus` <dbl>, `Acinetobacter JEL30` <dbl>,
## #   `Acinetobacter johnsonii` <dbl>, `Acinetobacter junii` <dbl>, …
df1 <- df1 %>%
  dplyr::select(where( ~ is.numeric(.x) && sum(.x) != 0))  #Drops any bacterial taxon not recorded in this study (i.e. where it's all zeroes)
df1 <- df1[,-1] #remove col1, which is sample ID

Use a dissimilarity based approach to assess the combined effects of HIV-ART, sex, Candida infection, and current smoking on the bacteriome

Do these factors act independently on the bacteriome?

Which effects are largest (use some graphical methods)?

First steps: Think about standardization and which distance measure you’ll use. Bray-Curtis seems common for these kind of data, and counts of different taxa vary widely.

Could you fit a simpler model to the data?

References

Dixon, Kelly M., Geoffrey J. Cary, Graeme L. Worboys, and Philip Gibbons. 2018. “The Disproportionate Importance of Long‐unburned Forests and Woodlands for Reptiles.” Ecology and Evolution 8 (22): 10952–63. https://doi.org/gfs587.
Griffen, Ann L., Zachary A. Thompson, Clifford J. Beall, Elizabeth A. Lilly, Carolina Granada, Kelly D. Treas, Kenneth R. DuBois, et al. 2019. “Significant Effect of HIV/HAART on Oral Microbiota Using Multivariate Analysis.” Scientific Reports 9 (1): 19946. https://doi.org/gsfpz3.
Hutto, David, and Kyle Barrett. 2021. “Do Urban Open Spaces Provide Refugia for Frogs in Urban Environments?” Plos One 16 (1). https://doi.org/gr2r9n.