Loyn (1987) selected 56 forest patches in southeastern Victoria, Australia, and related the abundance of forest birds in each patch to six predictor variables: patch area (ha), distance to nearest patch (km), distance to nearest larger patch (km), grazing stock (1 to 5 indicating light to heavy), altitude (m) and years since isolation (years). The aim was to develop a predictive model relating bird abundance to these predictors and to assess which predictors were most important for bird abundance.
Tawny Frogmouth, Podargus strigoides. Mick Keough, CC BY 4.0
Eastern Yellow Robin, Eopsaltria australis. Emmet Keough, CC BY 4.0
This data set was used in the first edition and is available here
First, load the required packages (car, lm.beta, hier.part)
Import loyn data file (loyn.csv)
loyn <- read.csv("../data/loyn.csv")
head(loyn,10)
NA
We will use the scatterplotMatrix function from the car package
scatterplotMatrix(~abund+area+dist+graze+alt+yearisol, data=loyn, cex=0.33, regLine=FALSE, diagonal=list(method='boxplot'))
abund looks symmetrical but two unusual observations for area resulting in non-linear relationships with abund and one outlier for distance to nearest patch
scatterplotMatrix(~abund+log10(area)+log10(dist)+graze+alt+yearisol, data=loyn, cex=0.33, regLine=FALSE, diagonal=list(method='boxplot'))
Plots look much better. Now transform the variables in loyn
loyn$logarea <- log10(loyn$area)
loyn$logdist <- log10(loyn$dist)
Use correlations and VIF
scatterplotMatrix(~logarea+logdist+graze+alt+yearisol, data=loyn, cex=0.33, regLine=FALSE, diagonal=list(method='boxplot'))
cor(loyn[,c('logarea','logdist','graze','alt','yearisol')])
logarea logdist graze alt yearisol
logarea 1.000 0.3022 -0.559 0.275 0.2784
logdist 0.302 1.0000 -0.143 -0.219 -0.0196
graze -0.559 -0.1426 1.000 -0.407 -0.6356
alt 0.275 -0.2190 -0.407 1.000 0.2327
yearisol 0.278 -0.0196 -0.636 0.233 1.0000
vif(lm(abund~ logarea+logdist+graze+alt+yearisol, data=loyn))
logarea logdist graze alt yearisol
1.62 1.27 2.52 1.36 1.74
Collinearity OK so now fit linear model relating response to predictor
loyn.lm <- lm(abund~ logarea+logdist+graze+alt+yearisol, data=loyn)
plot(loyn.lm)
augment(loyn.lm)
Get parameter estimates, their confidence intervals, and tests
tidy(loyn.lm, conf.int=TRUE)
Use lm.beta function from lm.beta package
lm.beta.loyn <- lm.beta(loyn.lm)
summary(loyn.lm)
Call:
lm(formula = abund ~ logarea + logdist + graze + alt + yearisol,
data = loyn)
Residuals:
Min 1Q Median 3Q Max
-15.344 -3.055 0.575 2.390 15.235
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -131.8474 88.6398 -1.49 0.143
logarea 7.2953 1.3360 5.46 1.5e-06 ***
logdist -1.3032 2.3189 -0.56 0.577
graze -1.6757 0.9211 -1.82 0.075 .
alt 0.0214 0.0229 0.94 0.353
yearisol 0.0766 0.0439 1.74 0.087 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 6.33 on 50 degrees of freedom
Multiple R-squared: 0.684, Adjusted R-squared: 0.653
F-statistic: 21.7 on 5 and 50 DF, p-value: 1.78e-11
tidy(lm.beta.loyn, conf.int=TRUE)
Show the partial regression (added-variable) plots
avPlots(loyn.lm, ask=F)
We need two dataframes that are subsets of loyn containing the response variable and the predictors
loyn_abund<-loyn$abund
loyn_pred<-subset(loyn, select = c("logarea","logdist","alt","yearisol","graze"))
hier.part(loyn_abund, loyn_pred, family="gaussian", gof="Rsqu")
$gfs
[1] 0.0000 0.5477 0.0161 0.1489 0.2534 0.4658 0.5580 0.5836 0.6435 0.6527 0.1957 0.2720 0.4667 0.3297 0.4798 0.4739 0.5853 0.6480 0.6610 0.6629 0.6611 0.6732 0.3707 0.4846 0.4758 0.4887 0.6634 0.6651 0.6788 0.6823 0.4961
[32] 0.6843
$IJ
$I.perc
$params
$params$full.model
[1] "y ~ logarea + logdist + alt + yearisol + graze"
$params$family
[1] "gaussian"
$params$link
[1] "default"
$params$gof
[1] "Rsqu"