Covariate modeling with rare species

Use of covariate to model detectability in multi-species surveys.

Eric Rexstad (CREEM, Univ of St Andrews)


Sometimes the focal species of a distance sampling survey is quite rare. So rare that it is difficult to accumulate sufficient detections to fit a detection function for the species in question. Likewise, it is also common for other species to be detected during the survey for the focal species. Could the detections of the other species be useful in estimating a detection function for the focal species?

One approach might be to consider the species to serve as “strata” and proceed to analyse the data as if they were from a stratified survey. See the example for stratified survey analysis. However, if a pooled detection function (one that combines data from multiple species) is fitted, it would be dubious to apply this pooled detection function to data at a lower level of aggregation (species level). Applying the pooled detection function would lead to a biased estimate of abundance for the rare species.

Instead of treating species as strata, an alternative form of analysis is to treat species as a covariate in the modelling of the detection function (Marques & Buckland, 2003). The principle is that the general key function is shared across species, but the scale parameter \((\sigma)\) differs between species. In this way, the detections of all species is shared, such that the estimation of the detection function for the rare species is bolstered by information from other species; yet the rare species receives its own unique detection function such that bias is not induced in the abundance estimation for that species.

To demonstrate such an analysis, the Montrave songbird study conducted by Buckland (2006) is used. The species covariate approach to analysis of the snapshot point count version of his survey is described in the book by Buckland et al. (2015, sec. The Distance R package (Miller, Rexstad, Thomas, Marshall, & Laake, 2019) is used to analyse the line transect survey Buckland conducted. Results are compared with estimates presented by Buckland (2006).

The data are available online at a website that serves as a companion to Buckland et al. (2015). The data set can be read into R directly from the URL.

birds <- read.csv("")
# Add object column (required for dht2)
birds$object <- NA
birds$object[!$distance)] <- 1:sum(!$distance))

Data preparation

Only one slight modification to the data needs to be conducted before they can be analysed. Buckland (2006) made two transits of the transects, the line transect effort needs to be modified to reflect the multiple visits.

birds$Effort <- birds$Effort * birds$repeats   # two visits
convunit <- convert_units("meter", "kilometer", "hectare")

Detections by species

In Buckland’s (2006) line transect survey, three of the four songbird species (c-chaffinch, g-great tit, r-robin, w-winter wren) were detected in sufficient quantities that sample size is not an issue. However, the great tit was only detected 32 times, making the support for this species open to question.

Table 1: Number of detections by species for Montrave line transect survey.
Var1 Freq
c 73
g 32
r 82
w 156

As mentioned in the Background, we could fit a pooled detection function across species and with species as a stratification criterion produce species-specific density estimates using the pooled detection function in conjunction with species-specific encounter rates. However that would be using the wrong detection function for every species. We take the alternative analysis route and incorporate species into the detection function.

Covariate in detection function

Inclusion of species as a covariate in the detection function is simple using the formula= argument in ds(). Note the species names are coded as letters, R will automatically treat a variable containing letters as a factor covariate. If numbers were used in coding species, as.factor would need to be employed.

all.birds <- ds(data = birds,
                key="hn", convert_units = convunit,
                formula=~species, truncation = 95)

The CvM goodness of fit test indicates this model adequately fits the data, W=0.401, P=0.072.

Visualising the detection functions for each species

The shape of the species-specific detection functions can be seen by using the plotting function provided below.

plot(all.birds, showpoints=FALSE, main="Montrave line transects\nspecies as covariate")
add.df.covar.line(all.birds, data=data.frame(species="c"), 
                  lwd=3, lty=1, col="blue")
add.df.covar.line(all.birds, data=data.frame(species="g"), 
                  lwd=3, lty=1, col="darkgreen")
add.df.covar.line(all.birds, data=data.frame(species="r"), 
                  lwd=3, lty=1, col="brown")
add.df.covar.line(all.birds, data=data.frame(species="w"), 
                  lwd=3, lty=1, col="salmon")
legend("topright", legend=c("chaffinch", "great tit", "robin", "winter wren"),
       lwd=3, lty=1, col=c("blue", "darkgreen", "brown", "salmon"))
Species-specific detection functions.

Figure 1: Species-specific detection functions.

Species-specific density estimates

Density estimates for each species can be produced by using the dht2 function that contains the argument strat_formula used to specific the levels of stratum-specific estimates requested. The stratification argument ensures the correct measures of precision are associated with the species-specific density estimates. The value object indicates this analysis is a form of post-stratification, rather than geographic stratification criterion that could have been know prior to the gathering of the data.

bird.ests <- dht2(ddf=all.birds, flatfile=birds,
                  strat_formula = ~species, convert_units = convunit,
                  stratification = "object") 
Table 2: Species-specific density estimates using detection function with species as covariate.
species n Density Density_CV LCI UCI
c 73 0.641 0.170 0.456 0.900
g 32 0.251 0.242 0.156 0.406
r 80 0.769 0.149 0.572 1.032
w 155 1.149 0.113 0.918 1.437

Compare with published estimates

The density estimates for chaffinch and great tits match those reported by Buckland (S. T. Buckland, 2006) almost exactly. The congruence between estimates produced by this analysis and those reported by Buckland are less good for the robins and winter wrens.

Reproduction of Table 2 of Buckland (2006).

Figure 2: Reproduction of Table 2 of Buckland (2006).


As described by Buckland (S. T. Buckland, 2006), there was some reason to believe evasive movement took place on the part of robins and winter wrens. Conceivably, this could be accommodated by using a hazard rate key function for those two species. This would lead to a more complex analysis in which the data set was divided into a chaffinch/great tit data set, with a half normal key and species covariate detection function model. The other portion of the data set would contain robins/winter wrens modelled using a hazard rate key function and species covariate.

Indeed, the goodness of fit for this more complex analysis (not shown) leads to better fit of the “two model” approach:

Table 3: Goodness of fit comparison for single model compared with HN/HR split.
analysis CvM.W P.value
Single analysis 0.401 0.072
HN key 0.068 0.762
HR key 0.222 0.228
Buckland, S. T. (2006). Point transect surveys for songbirds: Robust methodologies. The Auk, 123(2), 345–345.[345:psfsrm];2
Buckland, S., Rexstad, E., Marques, T., & Oedekoven, C. (2015). Distance sampling: Methods and applications. Retrieved from
Marques, F. F. C., & Buckland, S. T. (2003). Incorporating covariates into standard line transect analyses. Biometrics, 59, 924–935.
Miller, D. L., Rexstad, E., Thomas, L., Marshall, L., & Laake, J. L. (2019). Distance sampling in r. Journal of Statistical Software, 89(1), 1–28.