Newman et al. (2015) compared flower characteristics of two different phenotypes (short style and long style) of the geophytic plant Nerine humilis in South Africa. They had multiple populations for each phenotype (seven or eight short style and three long style, depending on response variable) and from each population, they measured flower characteristics (e.g. tepal length, nectar volume) from between 12 and 37 flowers per population. Phenotype (two groups) was an observational fixed factor, with population a random factor nested within phenotype; we will analyze tepal length as the response variable.
We show several ways to approach this data set, ranging from the “traditional” OLS nested analysis to simplified analysis using means, to linear mixed models and REML/ML. At this stage, it’s useful to show you the ways you might encounter these analyses, but in practice you’d use one.
Nerine humilis. Sam McCarren , via iNaturalist. There’s also a good picture of the different phenotypes as Figure 1 of Newman et al.
The paper is here, and the data are at dryad, as an Excel file floral_traits nested ANOVA and we’ve used the tepal length sheet for this example.
Newman, E., Manning, J. & Anderson, B. (2015). Local adaptation: Mechanical fit between floral ecotypes of Nerine humilis (Amaryllidaceae) and pollinator communities. Evolution, 69, 2262-75.
First, load the required packages (afex, car, lattice, lme4, lmerTest, nlme, VCA, Rmisc)
Import newman data file (newman_tepal.csv)
newman_tepal <- read.csv("../data/newman_tepal.csv")
head(newman_tepal,10)
Make population a factor
newman_tepal$population <- factor(newman_tepal$population)
Calculate summary stats by groups
newman_sum <- summarySE(newman_tepal,measurevar='tepal', groupvars= c('population','phenotype'))
newman_sum
do preliminary checks (look OK)
boxplot(tepal~phenotype, data=newman_tepal)
boxplot(tepal~phenotype, data=newman_sum)
Note: Defaults to Type I SS in R - can’t change it with nested aov structures. Need to get Type III from another package (e.g. spss) - very similar to Type I and II anyway
newman.aov <- aov(tepal~phenotype+Error(population), newman_tepal)
Plot residuals based on colonies
plot(resid(newman.aov[[2]])~fitted(newman.aov[[2]]))
options(digits=3)
summary(newman.aov)
Error: population
Df Sum Sq Mean Sq F value Pr(>F)
phenotype 1 2363 2363 11.6 0.0094 **
Residuals 8 1636 205
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Error: Within
Df Sum Sq Mean Sq F value Pr(>F)
Residuals 236 4364 18.5
newman.lm1 <- lm(tepal~phenotype+phenotype/population, newman_tepal)
summary(newman.lm1)
Call:
lm(formula = tepal ~ phenotype + phenotype/population, data = newman_tepal)
Residuals:
Min 1Q Median 3Q Max
-8.852 -2.965 -0.222 2.864 11.579
Coefficients: (10 not defined because of singularities)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.22093 0.53414 50.96 < 2e-16 ***
phenotype1 2.27907 0.53414 4.27 2.9e-05 ***
phenotypeLong:population1 -0.00227 2.10354 0.00 0.99914
phenotypeShort:population1 -4.13836 1.16386 -3.56 0.00046 ***
phenotypeLong:population2 NA NA NA NA
phenotypeShort:population2 -6.07281 1.14479 -5.30 2.6e-07 ***
phenotypeLong:population3 NA NA NA NA
phenotypeShort:population3 -4.29371 1.05587 -4.07 6.5e-05 ***
phenotypeLong:population4 NA NA NA NA
phenotypeShort:population4 -3.33686 1.16386 -2.87 0.00452 **
phenotypeLong:population5 NA NA NA NA
phenotypeShort:population5 -3.48534 1.11084 -3.14 0.00192 **
phenotypeLong:population6 NA NA NA NA
phenotypeShort:population6 4.47919 1.18458 3.78 0.00020 ***
phenotypeLong:population7 NA NA NA NA
phenotypeShort:population7 NA NA NA NA
phenotypeLong:population8 0.42227 1.24567 0.34 0.73492
phenotypeShort:population8 NA NA NA NA
phenotypeLong:population9 NA NA NA NA
phenotypeShort:population9 NA NA NA NA
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 4.3 on 236 degrees of freedom
Multiple R-squared: 0.478, Adjusted R-squared: 0.458
F-statistic: 24 on 9 and 236 DF, p-value: <2e-16
anova(newman.lm1)
Analysis of Variance Table
Response: tepal
Df Sum Sq Mean Sq F value Pr(>F)
phenotype 1 2363 2363 127.8 <2e-16 ***
phenotype:population 8 1636 205 11.1 3e-13 ***
Residuals 236 4364 18
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Anova(newman.lm1, type="II")
Note: model has aliased coefficients
sums of squares computed by model comparison
Anova Table (Type II tests)
Response: tepal
Sum Sq Df F value Pr(>F)
phenotype 2363 1 127.8 <2e-16 ***
phenotype:population 1636 8 11.1 3e-13 ***
Residuals 4364 236
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
1-pf(11.060, 8, 236, lower.tail = TRUE, log.p = FALSE)
[1] 3.02e-13
This output is treating populations as a fixed effect, so to show comparability with the initial nested effect, we need to recalculate the test for phenotype, using populations as the denominator.
f <- 127.8/11.1
f
[1] 11.5
1-pf(f, 1, 8, lower.tail = TRUE, log.p = FALSE)
[1] 0.00946
newman.aov2 <- aov(tepal~phenotype, newman_sum)
summary(newman.aov2)
Df Sum Sq Mean Sq F value Pr(>F)
phenotype 1 101.9 101.9 10.8 0.011 *
Residuals 8 75.6 9.4
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Analysis using means won’t be exactly the same because sample sizes vary among populations.
newman.vca <- anovaMM(tepal~phenotype/(population), newman_tepal)
Convert variable phenotype from "character" to "factor"!
newman.vca
ANOVA-Type Estimation of Mixed Model:
--------------------------------------
[Fixed Effects]
int phenotypeLong phenotypeShort
22.54 6.96 0.00
[Variance Components]
Name DF SS MS VC %Total SD CV[%]
1 total 67.087549 26.148709 100 5.113581 20.692506
2 phenotype:population 8 1636.057375 204.507172 7.657816 29.285635 2.767276 11.197999
3 error 236 4363.850882 18.490894 18.490894 70.714365 4.300104 17.400708
Mean: 24.7 (N = 246)
Experimental Design: unbalanced | Method: ANOVA
VCAinference(newman.vca, alpha=0.05, VarVC=TRUE, ci.method="satterthwaite")
Inference from Mixed Model Fit
------------------------------
> VCA Result:
-------------
[Fixed Effects]
int phenotypeLong phenotypeShort
22.54 6.96 0.00
[Variance Components]
Name DF SS MS VC %Total SD CV[%] Var(VC)
1 total 67.0875 26.1487 100 5.1136 20.6925
2 phenotype:population 8 1636.0574 204.5072 7.6578 29.2856 2.7673 11.198 18.5087
3 error 236 4363.8509 18.4909 18.4909 70.7144 4.3001 17.4007 2.8976
Mean: 24.7 (N = 246)
Experimental Design: unbalanced | Method: ANOVA
> VC:
-----
Estimate DF CI LCL CI UCL One-Sided LCL One-Sided UCL
total 26.15 67.09 19.15 37.9 20.12 35.6
phenotype:population 7.66 6.62 3.29 33.5 3.75 25.9
error 18.49 236.00 15.56 22.3 15.99 21.7
> SD:
-----
Estimate DF CI LCL CI UCL One-Sided LCL One-Sided UCL
total 5.11 67.09 4.38 6.15 4.49 5.97
phenotype:population 2.77 6.62 1.81 5.79 1.94 5.09
error 4.30 236.00 3.94 4.73 4.00 4.65
> CV[%]:
--------
Estimate DF CI LCL CI UCL One-Sided LCL One-Sided UCL
total 20.7 67.09 17.71 24.9 18.15 24.2
phenotype:population 11.2 6.62 7.34 23.4 7.84 20.6
error 17.4 236.00 15.96 19.1 16.18 18.8
95% Confidence Level
Satterthwaite methodology used for computing CIs
newman.reml <- lmer(tepal~phenotype + (1|population), newman_tepal)
summary(newman.reml)
Linear mixed model fit by REML. t-tests use Satterthwaite's method ['lmerModLmerTest']
Formula: tepal ~ phenotype + (1 | population)
Data: newman_tepal
REML criterion at convergence: 1435
Scaled residuals:
Min 1Q Median 3Q Max
-2.0493 -0.6346 -0.0382 0.6484 2.8575
Random effects:
Groups Name Variance Std.Dev.
population (Intercept) 8.45 2.91
Residual 18.50 4.30
Number of obs: 246, groups: population, 10
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 26.02 1.05 7.80 24.82 1e-08 ***
phenotype1 3.48 1.05 7.80 3.32 0.011 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr)
phenotype1 0.398
newman.ml <- lmer(tepal~phenotype + (1|population), newman_tepal, REML=F)
summary(newman.ml)
Linear mixed model fit by maximum likelihood . t-tests use Satterthwaite's method ['lmerModLmerTest']
Formula: tepal ~ phenotype + (1 | population)
Data: newman_tepal
AIC BIC logLik deviance df.resid
1446 1460 -719 1438 242
Scaled residuals:
Min 1Q Median 3Q Max
-2.0468 -0.6475 -0.0484 0.6600 2.8986
Random effects:
Groups Name Variance Std.Dev.
population (Intercept) 6.56 2.56
Residual 18.50 4.30
Number of obs: 246, groups: population, 10
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 26.019 0.935 9.692 27.84 1.4e-10 ***
phenotype1 3.480 0.935 9.692 3.72 0.0042 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr)
phenotype1 0.398
CI on variance components (remembering to square CIs from lmer with are in SD units)
newman.ci <- confint.merMod(newman.reml)
Computing profile confidence intervals ...
newman.vc <- (newman.ci)^2
print(newman.vc)
2.5 % 97.5 %
.sig01 2.58 19.9
.sigma 15.53 22.3
(Intercept) 575.54 786.6
phenotype1 2.11 30.4
Test fixed effect using likelihood ratio tests
newman.lme <- lmer(tepal~phenotype + (1|population), newman_tepal, REML=F)
newman.lme1 <- lmer(tepal~(1|population), newman_tepal, REML=F)
anova(newman.lme,newman.lme1)
Data: newman_tepal
Models:
newman.lme1: tepal ~ (1 | population)
newman.lme: tepal ~ phenotype + (1 | population)
npar AIC BIC logLik deviance Chisq Df Pr(>Chisq)
newman.lme1 3 1453 1464 -724 1447
newman.lme 4 1446 1460 -719 1438 8.65 1 0.0033 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
summary(newman.reml)
Linear mixed model fit by REML. t-tests use Satterthwaite's method ['lmerModLmerTest']
Formula: tepal ~ phenotype + (1 | population)
Data: newman_tepal
REML criterion at convergence: 1435
Scaled residuals:
Min 1Q Median 3Q Max
-2.0493 -0.6346 -0.0382 0.6484 2.8575
Random effects:
Groups Name Variance Std.Dev.
population (Intercept) 8.45 2.91
Residual 18.50 4.30
Number of obs: 246, groups: population, 10
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 26.02 1.05 7.80 24.82 1e-08 ***
phenotype1 3.48 1.05 7.80 3.32 0.011 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr)
phenotype1 0.398
Get more reliable F-test than original anova for fixed effect based on Satterthwaite’s (default) and Kenward-Roger’s adjustment to df
anova(newman.reml)
Type III Analysis of Variance Table with Satterthwaite's method
Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
phenotype 204 204 1 7.8 11 0.011 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
anova(newman.reml, ddf="Kenward-Roger")
Type III Analysis of Variance Table with Kenward-Roger's method
Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
phenotype 204 204 1 7.97 11 0.011 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1