As part of a phylogenetic study on the adaptation of cetaceans to an aquatic lifestyle, McGowen et al (2020) counted the number of genes in each cell of three cross-classified variables: sensory (e.g. sight, vision etc.) vs nonsensory genes, ingroup (whales, dolphins etc.) vs outgroup (e.g. non-aquatic like hippopotamuses) lineages from their phylogenetic tree, and whether the genes were under positive selection (PSG) or not. The research question was primarily about the association of whether genes were under positive selection or not with both phylogenetic group (ingroup vs outgroup) and sensory role (sensory vs non-sensory) and. We will focus just on the analysis of vision genes.

The paper is here

McGowen, M. R., Tsagkogeorga, G., Williamson, J., Morin, P. A. & Rossiter, A. S. J. (2020). Positive selection and inactivation in the vision and hearing genes of cetaceans. Mol Biol Evol, 37, 2069-83.

Preliminaries

First, load the required packages (epitools, vcd, DescTools)

Import mcgowen data file (mcgowen.csv) and select vision genes only

mcgowen <- read.csv("../data/mcgowen.csv")
mcgvision<-mcgowen[mcgowen$group %in% c("vision","control"),]
head(mcgvision,10)
, , genetype = nonsensory

        lineage
psg       in out
  nonpsg  21  23
  psg    146 144

, , genetype = sensory

        lineage
psg       in out
  nonpsg  30  19
  psg     66  77

Glm test for 3 way interaction

mcgvision1.glm <- glm(count~psg*lineage*genetype, data=mcgvision, family=poisson)
mcgvision2.glm <- glm(count~psg+lineage+genetype+psg*lineage+psg*genetype+lineage*genetype, data=mcgvision, family=poisson)
lrtest(mcgvision1.glm,mcgvision2.glm)
Likelihood ratio test

Model 1: count ~ psg * lineage * genetype
Model 2: count ~ psg + lineage + genetype + psg * lineage + psg * genetype + 
    lineage * genetype
  #Df LogLik Df Chisq Pr(>Chisq)
1   8  -22.9                    
2   7  -24.1 -1  2.36       0.12

No evidence for 3 way interaction - ORs for each 2x2 consistent across 3rd variable

Confirm with Breslow-Day test

BreslowDayTest(mcgvision.tab, correct = FALSE)

    Breslow-Day test on Homogeneity of Odds Ratios

data:  mcgvision.tab
X-squared = 2.35, df = 1, p-value = 0.13

Test each 2 way interaction to see if conditional independence applies

# First PSG*lineage (these two are independent)
mcgvision3.glm <- glm(count~psg+lineage+genetype+psg*genetype+lineage*genetype, data=mcgvision, family=poisson)
lrtest(mcgvision2.glm,mcgvision3.glm)
Likelihood ratio test

Model 1: count ~ psg + lineage + genetype + psg * lineage + psg * genetype + 
    lineage * genetype
Model 2: count ~ psg + lineage + genetype + psg * genetype + lineage * 
    genetype
  #Df LogLik Df Chisq Pr(>Chisq)
1   7  -24.1                    
2   6  -24.6 -1  1.09        0.3
# Second, PSG * genetype (these two are associated)
mcgvision4.glm <- glm(count~psg+lineage+genetype+psg*lineage+lineage*genetype, data=mcgvision, family=poisson)
lrtest(mcgvision2.glm,mcgvision4.glm)
Likelihood ratio test

Model 1: count ~ psg + lineage + genetype + psg * lineage + psg * genetype + 
    lineage * genetype
Model 2: count ~ psg + lineage + genetype + psg * lineage + lineage * 
    genetype
  #Df LogLik Df Chisq Pr(>Chisq)    
1   7  -24.1                        
2   6  -30.2 -1  12.4    0.00043 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
residuals(mcgvision4.glm,type=c("pearson"))
       9       10       11       12       13       14       15       16 
-1.29411 -0.40852  2.63848  0.93711  0.98118  0.30974 -2.00046 -0.71050 
# Third lineage*genetype (these two are independent)
mcgvision5.glm <- glm(count~psg+lineage+genetype+psg*lineage+psg*genetype, data=mcgvision, family=poisson)
lrtest(mcgvision2.glm,mcgvision5.glm)
Likelihood ratio test

Model 1: count ~ psg + lineage + genetype + psg * lineage + psg * genetype + 
    lineage * genetype
Model 2: count ~ psg + lineage + genetype + psg * lineage + psg * genetype
  #Df LogLik Df Chisq Pr(>Chisq)
1   7  -24.1                    
2   6  -24.1 -1  0.03       0.87
# Try complete independence model
mcgvision6.glm <- glm(count~psg+lineage+genetype, data=mcgvision, family=poisson)

compare model fits

AIC(mcgvision1.glm,mcgvision2.glm,mcgvision3.glm,mcgvision4.glm,mcgvision5.glm, mcgvision6.glm)

Estimate ORs for PSG x genetype for each lineage and overall

Overall

mcgvision_simp.tab <- xtabs(count~psg + genetype, data=mcgvision, drop.unused.levels=TRUE)
mcgvision_simp.tab
        genetype
psg      nonsensory sensory
  nonpsg         44      49
  psg           290     143
loddsratio(mcgvision_simp.tab, log = FALSE)
 odds ratios for psg and genetype 

[1] 0.44279

Separately

mcgvision1 <- subset(mcgvision,subset = lineage %in% 'in')
mcgvision1
mcgvision1.tab <- xtabs(count~psg + genetype, data=mcgvision1, drop.unused.levels=TRUE)
loddsratio(mcgvision1.tab, log = FALSE)
 odds ratios for psg and genetype 

[1] 0.31644
mcgvision2 <- subset(mcgvision,subset = lineage %in% 'out')
mcgvision2
mcgvision2.tab <- xtabs(count~psg + genetype, data=mcgvision2, drop.unused.levels=TRUE)
loddsratio(mcgvision2.tab, log = FALSE)
 odds ratios for psg and genetype 

[1] 0.6473
LS0tCnRpdGxlOiAiUUsgQm94IDEzLjciCgpvdXRwdXQ6IGh0bWxfbm90ZWJvb2sKLS0tCgpgYGB7ciBzZXR1cCwgaW5jbHVkZT1GQUxTRX0Ka25pdHI6Om9wdHNfY2h1bmskc2V0KGVjaG8gPSBUUlVFKQpgYGAKCkFzIHBhcnQgb2YgYSBwaHlsb2dlbmV0aWMgc3R1ZHkgb24gdGhlIGFkYXB0YXRpb24gb2YgY2V0YWNlYW5zIHRvIGFuIGFxdWF0aWMgbGlmZXN0eWxlLCBNY0dvd2VuIGV0IGFsICgyMDIwKSBjb3VudGVkIHRoZSBudW1iZXIgb2YgZ2VuZXMgaW4gZWFjaCBjZWxsIG9mIHRocmVlIGNyb3NzLWNsYXNzaWZpZWQgdmFyaWFibGVzOiBzZW5zb3J5IChlLmcuIHNpZ2h0LCB2aXNpb24gZXRjLikgdnMgbm9uc2Vuc29yeSBnZW5lcywgaW5ncm91cCAod2hhbGVzLCBkb2xwaGlucyBldGMuKSB2cyBvdXRncm91cCAoZS5nLiBub24tYXF1YXRpYyBsaWtlIGhpcHBvcG90YW11c2VzKSBsaW5lYWdlcyBmcm9tIHRoZWlyIHBoeWxvZ2VuZXRpYyB0cmVlLCBhbmQgd2hldGhlciB0aGUgZ2VuZXMgd2VyZSB1bmRlciBwb3NpdGl2ZSBzZWxlY3Rpb24gKFBTRykgb3Igbm90LiBUaGUgcmVzZWFyY2ggcXVlc3Rpb24gd2FzIHByaW1hcmlseSBhYm91dCB0aGUgYXNzb2NpYXRpb24gb2Ygd2hldGhlciBnZW5lcyB3ZXJlIHVuZGVyIHBvc2l0aXZlIHNlbGVjdGlvbiBvciBub3Qgd2l0aCBib3RoIHBoeWxvZ2VuZXRpYyBncm91cCAoaW5ncm91cCB2cyBvdXRncm91cCkgYW5kIHNlbnNvcnkgcm9sZSAoc2Vuc29yeSB2cyBub24tc2Vuc29yeSkgYW5kLiBXZSB3aWxsIGZvY3VzIGp1c3Qgb24gdGhlIGFuYWx5c2lzIG9mIHZpc2lvbiBnZW5lcy4KClRoZSBwYXBlciBpcyBbaGVyZV0oaHR0cHM6Ly9kb2kub3JnLzEwLjEwOTMvbW9sYmV2L21zYWEwNzApCgpNY0dvd2VuLCBNLiBSLiwgVHNhZ2tvZ2VvcmdhLCBHLiwgV2lsbGlhbXNvbiwgSi4sIE1vcmluLCBQLiBBLiAmIFJvc3NpdGVyLCBBLiBTLiBKLiAoMjAyMCkuIFBvc2l0aXZlIHNlbGVjdGlvbiBhbmQgaW5hY3RpdmF0aW9uIGluIHRoZSB2aXNpb24gYW5kIGhlYXJpbmcgZ2VuZXMgb2YgY2V0YWNlYW5zLiAqTW9sIEJpb2wgRXZvbCosIDM3LCAyMDY5LTgzLgoKIyMjIFByZWxpbWluYXJpZXMKCkZpcnN0LCBsb2FkIHRoZSByZXF1aXJlZCBwYWNrYWdlcyAoZXBpdG9vbHMsIHZjZCwgRGVzY1Rvb2xzKQoKYGBge3IgaW5jbHVkZT1GQUxTRSwgcmVzdWx0cz0naGlkZSd9CmxpYnJhcnkoZXBpdG9vbHMpCmxpYnJhcnkodmNkKQpsaWJyYXJ5KERlc2NUb29scykKYGBgCgpJbXBvcnQgbWNnb3dlbiBkYXRhIGZpbGUgKFttY2dvd2VuLmNzdl0oLi4vZGF0YS9tY2dvd2VuLmNzdikpIGFuZCBzZWxlY3QgdmlzaW9uIGdlbmVzIG9ubHkKCmBgYHtyfQptY2dvd2VuIDwtIHJlYWQuY3N2KCIuLi9kYXRhL21jZ293ZW4uY3N2IikKbWNndmlzaW9uPC1tY2dvd2VuW21jZ293ZW4kZ3JvdXAgJWluJSBjKCJ2aXNpb24iLCJjb250cm9sIiksXQpoZWFkKG1jZ3Zpc2lvbiwxMCkKYGBgCgpgYGB7ciBlY2hvPUZBTFNFfQptY2d2aXNpb24udGFiPC14dGFicyhjb3VudH5wc2cgKyBsaW5lYWdlICsgZ2VuZXR5cGUsIGRhdGE9bWNndmlzaW9uKQptY2d2aXNpb24udGFiCmBgYAoKIyMjIEdsbSB0ZXN0IGZvciAzIHdheSBpbnRlcmFjdGlvbgoKYGBge3J9Cm1jZ3Zpc2lvbjEuZ2xtIDwtIGdsbShjb3VudH5wc2cqbGluZWFnZSpnZW5ldHlwZSwgZGF0YT1tY2d2aXNpb24sIGZhbWlseT1wb2lzc29uKQptY2d2aXNpb24yLmdsbSA8LSBnbG0oY291bnR+cHNnK2xpbmVhZ2UrZ2VuZXR5cGUrcHNnKmxpbmVhZ2UrcHNnKmdlbmV0eXBlK2xpbmVhZ2UqZ2VuZXR5cGUsIGRhdGE9bWNndmlzaW9uLCBmYW1pbHk9cG9pc3NvbikKbHJ0ZXN0KG1jZ3Zpc2lvbjEuZ2xtLG1jZ3Zpc2lvbjIuZ2xtKQpgYGAKCk5vIGV2aWRlbmNlIGZvciAzIHdheSBpbnRlcmFjdGlvbiAtIE9ScyBmb3IgZWFjaCAyeDIgY29uc2lzdGVudCBhY3Jvc3MgM3JkIHZhcmlhYmxlCgojIyMjIENvbmZpcm0gd2l0aCBCcmVzbG93LURheSB0ZXN0CgpgYGB7cn0KQnJlc2xvd0RheVRlc3QobWNndmlzaW9uLnRhYiwgY29ycmVjdCA9IEZBTFNFKQpgYGAKCiMjIyBUZXN0IGVhY2ggMiB3YXkgaW50ZXJhY3Rpb24gdG8gc2VlIGlmIGNvbmRpdGlvbmFsIGluZGVwZW5kZW5jZSBhcHBsaWVzCgpgYGB7cn0KIyBGaXJzdCBQU0cqbGluZWFnZSAodGhlc2UgdHdvIGFyZSBpbmRlcGVuZGVudCkKbWNndmlzaW9uMy5nbG0gPC0gZ2xtKGNvdW50fnBzZytsaW5lYWdlK2dlbmV0eXBlK3BzZypnZW5ldHlwZStsaW5lYWdlKmdlbmV0eXBlLCBkYXRhPW1jZ3Zpc2lvbiwgZmFtaWx5PXBvaXNzb24pCmxydGVzdChtY2d2aXNpb24yLmdsbSxtY2d2aXNpb24zLmdsbSkKIyBTZWNvbmQsIFBTRyAqIGdlbmV0eXBlICh0aGVzZSB0d28gYXJlIGFzc29jaWF0ZWQpCm1jZ3Zpc2lvbjQuZ2xtIDwtIGdsbShjb3VudH5wc2crbGluZWFnZStnZW5ldHlwZStwc2cqbGluZWFnZStsaW5lYWdlKmdlbmV0eXBlLCBkYXRhPW1jZ3Zpc2lvbiwgZmFtaWx5PXBvaXNzb24pCmxydGVzdChtY2d2aXNpb24yLmdsbSxtY2d2aXNpb240LmdsbSkKcmVzaWR1YWxzKG1jZ3Zpc2lvbjQuZ2xtLHR5cGU9YygicGVhcnNvbiIpKQojIFRoaXJkIGxpbmVhZ2UqZ2VuZXR5cGUgKHRoZXNlIHR3byBhcmUgaW5kZXBlbmRlbnQpCm1jZ3Zpc2lvbjUuZ2xtIDwtIGdsbShjb3VudH5wc2crbGluZWFnZStnZW5ldHlwZStwc2cqbGluZWFnZStwc2cqZ2VuZXR5cGUsIGRhdGE9bWNndmlzaW9uLCBmYW1pbHk9cG9pc3NvbikKbHJ0ZXN0KG1jZ3Zpc2lvbjIuZ2xtLG1jZ3Zpc2lvbjUuZ2xtKQojIFRyeSBjb21wbGV0ZSBpbmRlcGVuZGVuY2UgbW9kZWwKbWNndmlzaW9uNi5nbG0gPC0gZ2xtKGNvdW50fnBzZytsaW5lYWdlK2dlbmV0eXBlLCBkYXRhPW1jZ3Zpc2lvbiwgZmFtaWx5PXBvaXNzb24pCmBgYAoKIyMjIGNvbXBhcmUgbW9kZWwgZml0cwoKYGBge3J9CkFJQyhtY2d2aXNpb24xLmdsbSxtY2d2aXNpb24yLmdsbSxtY2d2aXNpb24zLmdsbSxtY2d2aXNpb240LmdsbSxtY2d2aXNpb241LmdsbSwgbWNndmlzaW9uNi5nbG0pCmBgYAoKIyMjIEVzdGltYXRlIE9ScyBmb3IgUFNHIHggZ2VuZXR5cGUgZm9yIGVhY2ggbGluZWFnZSBhbmQgb3ZlcmFsbAoKT3ZlcmFsbAoKYGBge3J9Cm1jZ3Zpc2lvbl9zaW1wLnRhYiA8LSB4dGFicyhjb3VudH5wc2cgKyBnZW5ldHlwZSwgZGF0YT1tY2d2aXNpb24sIGRyb3AudW51c2VkLmxldmVscz1UUlVFKQptY2d2aXNpb25fc2ltcC50YWIKbG9kZHNyYXRpbyhtY2d2aXNpb25fc2ltcC50YWIsIGxvZyA9IEZBTFNFKQoKYGBgCgpTZXBhcmF0ZWx5CgpgYGB7cn0KbWNndmlzaW9uMSA8LSBzdWJzZXQobWNndmlzaW9uLHN1YnNldCA9IGxpbmVhZ2UgJWluJSAnaW4nKQptY2d2aXNpb24xCm1jZ3Zpc2lvbjEudGFiIDwtIHh0YWJzKGNvdW50fnBzZyArIGdlbmV0eXBlLCBkYXRhPW1jZ3Zpc2lvbjEsIGRyb3AudW51c2VkLmxldmVscz1UUlVFKQpsb2Rkc3JhdGlvKG1jZ3Zpc2lvbjEudGFiLCBsb2cgPSBGQUxTRSkKbWNndmlzaW9uMiA8LSBzdWJzZXQobWNndmlzaW9uLHN1YnNldCA9IGxpbmVhZ2UgJWluJSAnb3V0JykKbWNndmlzaW9uMgptY2d2aXNpb24yLnRhYiA8LSB4dGFicyhjb3VudH5wc2cgKyBnZW5ldHlwZSwgZGF0YT1tY2d2aXNpb24yLCBkcm9wLnVudXNlZC5sZXZlbHM9VFJVRSkKbG9kZHNyYXRpbyhtY2d2aXNpb24yLnRhYiwgbG9nID0gRkFMU0UpCmBgYAo=