As part of a phylogenetic study on the adaptation of cetaceans to an aquatic lifestyle, McGowen et al (2020) counted the number of genes in each cell of three cross-classified variables: sensory (e.g. sight, vision etc.) vs nonsensory genes, ingroup (whales, dolphins etc.) vs outgroup (e.g. non-aquatic like hippopotamuses) lineages from their phylogenetic tree, and whether the genes were under positive selection (PSG) or not. The research question was primarily about the association of whether genes were under positive selection or not with both phylogenetic group (ingroup vs outgroup) and sensory role (sensory vs non-sensory) and. We will focus just on the analysis of vision genes.
The paper is here
McGowen, M. R., Tsagkogeorga, G., Williamson, J., Morin, P. A. & Rossiter, A. S. J. (2020). Positive selection and inactivation in the vision and hearing genes of cetaceans. Mol Biol Evol, 37, 2069-83.
First, load the required packages (epitools, vcd, DescTools)
Import mcgowen data file (mcgowen.csv) and select vision genes only
mcgowen <- read.csv("../data/mcgowen.csv")
mcgvision<-mcgowen[mcgowen$group %in% c("vision","control"),]
head(mcgvision,10)
, , genetype = nonsensory
lineage
psg in out
nonpsg 21 23
psg 146 144
, , genetype = sensory
lineage
psg in out
nonpsg 30 19
psg 66 77
mcgvision1.glm <- glm(count~psg*lineage*genetype, data=mcgvision, family=poisson)
mcgvision2.glm <- glm(count~psg+lineage+genetype+psg*lineage+psg*genetype+lineage*genetype, data=mcgvision, family=poisson)
lrtest(mcgvision1.glm,mcgvision2.glm)
Likelihood ratio test
Model 1: count ~ psg * lineage * genetype
Model 2: count ~ psg + lineage + genetype + psg * lineage + psg * genetype +
lineage * genetype
#Df LogLik Df Chisq Pr(>Chisq)
1 8 -22.9
2 7 -24.1 -1 2.36 0.12
No evidence for 3 way interaction - ORs for each 2x2 consistent across 3rd variable
BreslowDayTest(mcgvision.tab, correct = FALSE)
Breslow-Day test on Homogeneity of Odds Ratios
data: mcgvision.tab
X-squared = 2.35, df = 1, p-value = 0.13
# First PSG*lineage (these two are independent)
mcgvision3.glm <- glm(count~psg+lineage+genetype+psg*genetype+lineage*genetype, data=mcgvision, family=poisson)
lrtest(mcgvision2.glm,mcgvision3.glm)
Likelihood ratio test
Model 1: count ~ psg + lineage + genetype + psg * lineage + psg * genetype +
lineage * genetype
Model 2: count ~ psg + lineage + genetype + psg * genetype + lineage *
genetype
#Df LogLik Df Chisq Pr(>Chisq)
1 7 -24.1
2 6 -24.6 -1 1.09 0.3
# Second, PSG * genetype (these two are associated)
mcgvision4.glm <- glm(count~psg+lineage+genetype+psg*lineage+lineage*genetype, data=mcgvision, family=poisson)
lrtest(mcgvision2.glm,mcgvision4.glm)
Likelihood ratio test
Model 1: count ~ psg + lineage + genetype + psg * lineage + psg * genetype +
lineage * genetype
Model 2: count ~ psg + lineage + genetype + psg * lineage + lineage *
genetype
#Df LogLik Df Chisq Pr(>Chisq)
1 7 -24.1
2 6 -30.2 -1 12.4 0.00043 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
residuals(mcgvision4.glm,type=c("pearson"))
9 10 11 12 13 14 15 16
-1.29411 -0.40852 2.63848 0.93711 0.98118 0.30974 -2.00046 -0.71050
# Third lineage*genetype (these two are independent)
mcgvision5.glm <- glm(count~psg+lineage+genetype+psg*lineage+psg*genetype, data=mcgvision, family=poisson)
lrtest(mcgvision2.glm,mcgvision5.glm)
Likelihood ratio test
Model 1: count ~ psg + lineage + genetype + psg * lineage + psg * genetype +
lineage * genetype
Model 2: count ~ psg + lineage + genetype + psg * lineage + psg * genetype
#Df LogLik Df Chisq Pr(>Chisq)
1 7 -24.1
2 6 -24.1 -1 0.03 0.87
# Try complete independence model
mcgvision6.glm <- glm(count~psg+lineage+genetype, data=mcgvision, family=poisson)
AIC(mcgvision1.glm,mcgvision2.glm,mcgvision3.glm,mcgvision4.glm,mcgvision5.glm, mcgvision6.glm)
Overall
mcgvision_simp.tab <- xtabs(count~psg + genetype, data=mcgvision, drop.unused.levels=TRUE)
mcgvision_simp.tab
genetype
psg nonsensory sensory
nonpsg 44 49
psg 290 143
loddsratio(mcgvision_simp.tab, log = FALSE)
odds ratios for psg and genetype
[1] 0.44279
Separately
mcgvision1 <- subset(mcgvision,subset = lineage %in% 'in')
mcgvision1
mcgvision1.tab <- xtabs(count~psg + genetype, data=mcgvision1, drop.unused.levels=TRUE)
loddsratio(mcgvision1.tab, log = FALSE)
odds ratios for psg and genetype
[1] 0.31644
mcgvision2 <- subset(mcgvision,subset = lineage %in% 'out')
mcgvision2
mcgvision2.tab <- xtabs(count~psg + genetype, data=mcgvision2, drop.unused.levels=TRUE)
loddsratio(mcgvision2.tab, log = FALSE)
odds ratios for psg and genetype
[1] 0.6473