Hawaiian amakihi point transect data.
In this problem, we illustrate fitting multiple covariate distance
sampling (MCDS) models to point transect data using a bird survey from
Hawaii: data on an abundant species, the Hawaii amakihi (Hemignathus
virens) is used. This practical is makes use of the
Distance
R package described by Miller et al. (2019)
duplicating the analysis in Marques et al. (2007). For basic information regarding
analysis of point transect data, consult the point
transect example
Region.Label Area Sample.Label Effort object distance Month OBs
1 792 0 1 1 1 40 0 TJS
2 792 0 1 1 2 60 0 TJS
3 792 0 1 1 3 45 0 TJS
Sp MAS HAS Study.Area
1 COAM 50 1 Kana
2 COAM 50 1 Kana
3 COAM 50 1 Kana
These data include:
Region.Label
- survey dates (month and year, e.g. 792
is July 1992) which are used as ‘strata’Area
- study area size (not used, set to 0) will only
produce density estimates, not abundanceSample.Label
- point transect identifier (41
transects)Effort
- survey effort (1 for all points because each
point was visited once)distance
- radial distance of detection from observer
(meters)month
-OBs
- initials of the observerSp
- species code (COAM)MAS
- minutes after sunriseHAS
- hour after sunriseStudy.Area
- name of the study area (Kana)Note that the Area
column is always zero, hence,
detection functions can be fitted to the data, but bird abundance cannot
be estimated. The covariates to be considered for possible inclusion
into the detection function are OBs
, MAS
and
HAS
.
It is important to gain an understanding of the data prior to fitting detection functions. With this in mind, preliminary analysis of distance sampling data involves:
We begin by assessing the distribution of distances to decide on a truncation distance (Figure 1).
hist(amakihi$distance, main="Radial distances", xlab="Distance (m)")
To see if there are differences in the distribution of distances
recorded by the different observers and in each hour after sunrise,
boxplots can be used. Note how the ~
symbol is used to
define the discrete groupings (i.e. observer and hour) (Figure
2).
boxplot(amakihi$distance~amakihi$OBs, xlab="Observer", ylab="Distance (m)")
boxplot(amakihi$distance~amakihi$HAS, xlab="Hour", ylab="Distance (m)")
The components of the boxplot are:
For minutes after sunrise (a continuous variable), we create a
scatterplot of MAS (on the \(x\)-axis)
against distances (on the \(y\)-axis).
The plotting symbol (or character) is selected with the argument
pch
(Figure 3)
scatter.smooth(amakihi$MAS, amakihi$distance, family = "gaussian", pch='.', lpars=list(lwd=3),
xlab="Minutes after sunrise",ylab="Distance (m)")
Clearly room for right truncation from this figure of the radial
distance distribution. Subsequent detection function fitting will use
the truncation
argument in ds()
to exclude the
largest 15% of the detection distances.
You may also want to think about potential collinearity (linear
relationship) between the covariates - if collinear variables are
included in the detection function, they will be explaining some of the
same variation in the distances and this will reduce their importance as
a potential covariate. How might you investigate the relationship
between HAS
and MAS
?
From these plots, infer whether any of the covariates will be useful in explaining the distribution of detection distances.
We would like to treat OBs
and HAS
as
factor variables as in the original analysis; OBs
is, by
default, treated as a factor variable because it consists of characters
rather than numbers. HAS
, on the other hand, consists of
numbers and so by default would be treated as a continuous variable
(i.e. non-factor). That is fine if we want the effect of
HAS
to be monotonic (i.e. detectability either increases or
decreases as a function of HAS
). If we want
HAS
to have a non-linear effect on detectability, then we
need to indicate to R
to treat it as a factor as shown
below.
amakihi$HAS <- factor(amakihi$HAS)
One other, more subtle adjustment, is a transformation of the
continuous covariate MAS
. We are considering three possible
covariates in our detection function: OBs
, HAS
and MAS
. The first two variables, OBs
and
HAS
, are both factor variables, and so, essentially, we can
think of them as taking on values between 1 and 3 in the case of
OBS
, and 1 to 6 in the case of HAS
. However,
MAS
can take on values from -18 (detections before sunrise)
to >300 and the disparity in scales of measure between
MAS
and the other candidate covariates can lead to
difficulties in the performance of the optimizer fitting the detection
functions in R. The solution to the difficulty is to scale
MAS
such that it is on a scale (approx. 1 to 5) comparable
with the other covariates.
Scaling the continuous covariate MAS
makes optimisation
to estiamte parameters of the detection function simpler.
V1
Min. :-2.08724
1st Qu.:-0.82597
Median :-0.05081
Mean : 0.00000
3rd Qu.: 0.76376
Max. : 2.18270
NA's :2
With three potential covariates, there are 8 possible models for the detection function:
Even without considering covariates there are also several possible key function/adjustment term combinations available: if all key function/covariate combinations are considered the number of potential models is large. Note that covariates are not allowed if a uniform key function is chosen and if covariate terms are included, adjustment terms are not allowed. Even with these restrictions, it is not best practice to take a scatter gun approach to detection function model fitting. Buckland et al. (2015) considered 13 combinations of key function/covariates. Here, we look at a subset of these.
Fit a hazard rate model with no covariates or adjustment terms and make a note of the AIC. Note, that 15% of the largest distances are truncated - you may have decided on a different truncation distance.
conversion.factor <- convert_units("meter", NULL, "hectare")
amak.hr <- ds(amakihi, transect="point", key="hr", truncation="15%",
adjustment=NULL, convert_units = conversion.factor)
Now fit a hazard rate model with OBs
as a covariate in
the detection function and make a note of the AIC. Has the AIC reduced
by including a covariate?
amak.hr.obs <- ds(amakihi, transect="point", key="hr", formula=~OBs,
truncation="15%", convert_units = conversion.factor)
Fit a hazard rate model with OBs
and MAS
in
the detection function:
amak.hr.obs.mas <- ds(amakihi, transect="point", key="hr", formula=~OBs+MAS,
truncation="15%", convert_units = conversion.factor)
Try fitting other possible formula and decide which model is best in
terms of AIC. To quickly compare AIC values from different models, use
the AIC
command as follows (note only models with the same
truncation distance can be compared):
AIC(amak.hr, amak.hr.obs, amak.hr.obs.mas)
df AIC
amak.hr 2 11400.47
amak.hr.obs 4 11368.20
amak.hr.obs.mas 5 11365.96
Another useful function is summarize_ds_models
- this
has the advantage of ordering the models by AIC (smallest to
largest).
knitr::kable(summarize_ds_models(amak.hr, amak.hr.obs, amak.hr.obs.mas), digits=3,
caption="Model selection table for Hawaiian amakihi.")
Model | Key function | Formula | C-vM p-value | \(\hat{P_a}\) | se(\(\hat{P_a}\)) | \(\Delta\)AIC | |
---|---|---|---|---|---|---|---|
3 | Hazard-rate | ~OBs + MAS | 0.170 | 0.302 | 0.022 | 0.000 | |
2 | Hazard-rate | ~OBs | 0.116 | 0.297 | 0.022 | 2.249 | |
1 | Hazard-rate | ~1 | 0.144 | 0.308 | 0.022 | 34.516 |
Examine the shape of the preferred detection function (including covariates observer and minutes after sunrise) (Figure 4).
plot(amak.hr.obs.mas, pdf=TRUE, main="Hazard rate with observer and minutes after sunrise.")
There were three observers involved in the survey. One observer made ~80% of the detections, with a second observer responsible for a further 15% and the third observer 5%.