Incorporating covariates in the detection function

Hawaiian amakihi point transect data.

Eric Rexstad (CREEM, Univ of St Andrews)

In this problem, we illustrate fitting multiple covariate distance sampling (MCDS) models to point transect data using a bird survey from Hawaii: data on an abundant species, the Hawaii amakihi (Hemignathus virens) is used. This practical is makes use of the Distance R package described by Miller et al. (2019) duplicating the analysis in Marques et al. (2007). For basic information regarding analysis of point transect data, consult the point transect example

head(amakihi, n=3)
  Region.Label Area Sample.Label Effort object distance Month OBs
1          792    0            1      1      1       40     0 TJS
2          792    0            1      1      2       60     0 TJS
3          792    0            1      1      3       45     0 TJS
    Sp MAS HAS Study.Area
1 COAM  50   1       Kana
2 COAM  50   1       Kana
3 COAM  50   1       Kana

These data include:

Note that the Area column is always zero, hence, detection functions can be fitted to the data, but bird abundance cannot be estimated. The covariates to be considered for possible inclusion into the detection function are OBs, MAS and HAS.

Exploratory data analysis

It is important to gain an understanding of the data prior to fitting detection functions. With this in mind, preliminary analysis of distance sampling data involves:

We begin by assessing the distribution of distances to decide on a truncation distance (Figure 1).

hist(amakihi$distance, main="Radial distances", xlab="Distance (m)")
Distribution of radial distances of amakihi

Figure 1: Distribution of radial distances of amakihi

To see if there are differences in the distribution of distances recorded by the different observers and in each hour after sunrise, boxplots can be used. Note how the ~ symbol is used to define the discrete groupings (i.e. observer and hour) (Figure 2).

boxplot(amakihi$distance~amakihi$OBs, xlab="Observer", ylab="Distance (m)")
boxplot(amakihi$distance~amakihi$HAS, xlab="Hour", ylab="Distance (m)")
Visual assessment of effect of observer and hour since sunrise upon detection.Visual assessment of effect of observer and hour since sunrise upon detection.

Figure 2: Visual assessment of effect of observer and hour since sunrise upon detection.

The components of the boxplot are:

For minutes after sunrise (a continuous variable), we create a scatterplot of MAS (on the \(x\)-axis) against distances (on the \(y\)-axis). The plotting symbol (or character) is selected with the argument pch (Figure 3)

scatter.smooth(amakihi$MAS, amakihi$distance, family = "gaussian", pch='.', lpars=list(lwd=3),
               xlab="Minutes after sunrise",ylab="Distance (m)")
Visualisation of detectability as function of minutes since sunrise.

Figure 3: Visualisation of detectability as function of minutes since sunrise.

Clearly room for right truncation from this figure of the radial distance distribution. Subsequent detection function fitting will use the truncation argument in ds() to exclude the largest 15% of the detection distances.

You may also want to think about potential collinearity (linear relationship) between the covariates - if collinear variables are included in the detection function, they will be explaining some of the same variation in the distances and this will reduce their importance as a potential covariate. How might you investigate the relationship between HAS and MAS?

From these plots, infer whether any of the covariates will be useful in explaining the distribution of detection distances.

Adjusting the raw covariates

We would like to treat OBs and HAS as factor variables as in the original analysis; OBs is, by default, treated as a factor variable because it consists of characters rather than numbers. HAS, on the other hand, consists of numbers and so by default would be treated as a continuous variable (i.e. non-factor). That is fine if we want the effect of HAS to be monotonic (i.e. detectability either increases or decreases as a function of HAS). If we want HAS to have a non-linear effect on detectability, then we need to indicate to R to treat it as a factor as shown below.

amakihi$HAS <- factor(amakihi$HAS)

One other, more subtle adjustment, is a transformation of the continuous covariate MAS. We are considering three possible covariates in our detection function: OBs, HAS and MAS. The first two variables, OBs and HAS, are both factor variables, and so, essentially, we can think of them as taking on values between 1 and 3 in the case of OBS, and 1 to 6 in the case of HAS. However, MAS can take on values from -18 (detections before sunrise) to >300 and the disparity in scales of measure between MAS and the other candidate covariates can lead to difficulties in the performance of the optimizer fitting the detection functions in R. The solution to the difficulty is to scale MAS such that it is on a scale (approx. 1 to 5) comparable with the other covariates.

Scaling the continuous covariate MAS makes optimisation to estiamte parameters of the detection function simpler.

amakihi$MAS <- scale(amakihi$MAS)
 Min.   :-2.08724  
 1st Qu.:-0.82597  
 Median :-0.05081  
 Mean   : 0.00000  
 3rd Qu.: 0.76376  
 Max.   : 2.18270  
 NA's   :2         

Candidate models

With three potential covariates, there are 8 possible models for the detection function:

Even without considering covariates there are also several possible key function/adjustment term combinations available: if all key function/covariate combinations are considered the number of potential models is large. Note that covariates are not allowed if a uniform key function is chosen and if covariate terms are included, adjustment terms are not allowed. Even with these restrictions, it is not best practice to take a scatter gun approach to detection function model fitting. Buckland et al. (2015) considered 13 combinations of key function/covariates. Here, we look at a subset of these.

Fit a hazard rate model with no covariates or adjustment terms and make a note of the AIC. Note, that 15% of the largest distances are truncated - you may have decided on a different truncation distance.

conversion.factor <- convert_units("meter", NULL, "hectare") <- ds(amakihi, transect="point", key="hr", truncation="15%",
              adjustment=NULL, convert_units = conversion.factor)

Now fit a hazard rate model with OBs as a covariate in the detection function and make a note of the AIC. Has the AIC reduced by including a covariate? <- ds(amakihi, transect="point", key="hr", formula=~OBs,
                  truncation="15%", convert_units = conversion.factor)

Fit a hazard rate model with OBs and MAS in the detection function: <- ds(amakihi, transect="point", key="hr", formula=~OBs+MAS,
                      truncation="15%", convert_units = conversion.factor)

Try fitting other possible formula and decide which model is best in terms of AIC. To quickly compare AIC values from different models, use the AIC command as follows (note only models with the same truncation distance can be compared):

                df      AIC          2 11400.47      4 11368.20  5 11365.96

Another useful function is summarize_ds_models - this has the advantage of ordering the models by AIC (smallest to largest).

knitr::kable(summarize_ds_models(,,, digits=3,
             caption="Model selection table for Hawaiian amakihi.")
Table 1: Model selection table for Hawaiian amakihi.
Model Key function Formula C-vM p-value \(\hat{P_a}\) se(\(\hat{P_a}\)) \(\Delta\)AIC
3 Hazard-rate ~OBs + MAS 0.170 0.302 0.022 0.000
2 Hazard-rate ~OBs 0.116 0.297 0.022 2.249
1 Hazard-rate ~1 0.144 0.308 0.022 34.516

Examine the shape of the preferred detection function (including covariates observer and minutes after sunrise) (Figure 4).

plot(, pdf=TRUE, main="Hazard rate with observer and minutes after sunrise.")
PDF of best fitting model, including effects of observer and minutes after sunrise.

Figure 4: PDF of best fitting model, including effects of observer and minutes after sunrise.

Comments about the chosen model

There were three observers involved in the survey. One observer made ~80% of the detections, with a second observer responsible for a further 15% and the third observer 5%.

Buckland, S., Rexstad, E., Marques, T., & Oedekoven, C. (2015). Distance sampling: Methods and applications. Springer.
Marques, T. A., Thomas, L., Fancy, S. G., & Buckland, S. T. (2007). Improving estimates of bird density using multiple covariate distance sampling. The Auk, 124, 1229–1243.[1229:IEOBDU]2.0.CO;2
Miller, D. L., Rexstad, E., Thomas, L., Marshall, L., & Laake, J. L. (2019). Distance sampling in r. Journal of Statistical Software, 89(1), 1–28.