Lure point transect surveys from Buckland et al. (2015)

Scottish crossbill surveys using call lures and a supplemental set of trials to establish detection probability function. This example also demonstrates the use of bootstrapping methods to compute precision of the abundance estimate.

Eric Rexstad (CREEM, Univ of St Andrews)

Table of Contents

Building detection function with supplemental surveys

This practical is based on the lure point transect case study in (Buckland, Rexstad, Marques, & Oedekoven, 2015), a simplified version of the analysis in (Buckland, Summers, Borchers, & Thomas, 2006; Summers & Buckland, 2010). Generalised linear models (GLMs) are used to model the response of Scottish crossbills to a lure in order to estimate their probability of response and hence estimate the density and abundance. To provide a measure of precision for the abundance estimate, 95% confidence intervals are obtained by bootstrapping.

The Scottish crossbill (Loxia scotica) is Britain’s only endemic bird species. A point transect study was conducted to obtain the number of birds within each point after responding to an audible lure. The probability of responding to the lure was estimated by recording the response of previously detected birds to the lure at different distances (Summers & Buckland, 2010).

Steps of this exercise

  1. Fit a GLM
  2. Obtain predicted values from GLM
  3. Calculate abundance
  4. Using for loop to bootstrap abundance.

The lure trials

The data provided in the response trials are:

The trials data are in file lure-trials.csv. Import the data and check that it has been read correctly. Files needed for this exercise can be found on this Github repository.

xbill <- read.csv(file="lure-trials.csv", header=TRUE)

head(xbill, n=2)

  No. day time habitat dist behavcode numbirds response
1   1  47    9       1  150         1        1        0
2   2  47   11       1  150         1        2        0

Summarising the data

Examine how many birds did, or did not, respond to the lure:


  0   1 
 62 113 

There are six potential covariates that might affect the probability that a bird responds to the lure (day, time, dist, numbirds, habitat, and behavcode): the latter two are factor type variables and so we need to treat them as factors in models:

xbill$habitat <- factor(xbill$habitat)
xbill$behavcode <- factor(xbill$behavcode)

To look at the response in each factor level create a two-way table, for example:

table(xbill$response, xbill$habitat)

     1  2
  0  6 56
  1 25 88

# Divide plot window into 4
boxplot(xbill$dist~xbill$response, xlab="Response", ylab="Distance (m)")
boxplot(xbill$numbirds~xbill$response, xlab="Response", ylab="Flock size")
boxplot(xbill$day~xbill$response, xlab="Response", ylab="Days from 1 Jan")
boxplot(xbill$time~xbill$response, xlab="Response", ylab="Hour of day")

Qualitatively, these boxplots suggest little difference in crossbill behaviour in response to the lure attributable to flock size or time of day. There may be more influence of distance and date upon response to lure. Models will be fitted to make stronger inference.

Fitting a GLM

Building a model to explain the probability of response in terms of the potential covariates. The dependent variable, response, can only take two values (0 and 1) and so rather than fit a linear regression model to these data we fit a GLM. The glm function allows us to specify a distribution for the dependent variable in the model with the family argument. In effect, this performs a logistic regression, with success (animal responding) modelled as a function of explanatory covariates.

We can include all the covariates in a model as follows.

model1 <- glm(response~dist+numbirds+day+time+habitat+behavcode, family=binomial,

As usual, the summary function can be used to display details of the model object.


glm(formula = response ~ dist + numbirds + day + time + habitat + 
    behavcode, family = binomial, data = xbill)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.9526  -0.8708   0.5563   0.8168   2.0009  

             Estimate Std. Error z value Pr(>|z|)    
(Intercept)  2.641601   1.301148   2.030   0.0423 *  
dist        -0.010086   0.002323  -4.342 1.41e-05 ***
numbirds     0.076882   0.081172   0.947   0.3436    
day         -0.008597   0.007405  -1.161   0.2456    
time        -0.020251   0.113258  -0.179   0.8581    
habitat2    -0.263713   0.574532  -0.459   0.6462    
behavcode2   0.070903   0.495256   0.143   0.8862    
behavcode3   0.962426   0.614541   1.566   0.1173    
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 219.23  on 166  degrees of freedom
Residual deviance: 169.10  on 159  degrees of freedom
  (8 observations deleted due to missingness)
AIC: 185.1

Number of Fisher Scoring iterations: 5

We see that only dist has a coefficient that is significantly different from zero. Experiment with dropping non-significant terms. Using a backwards stepping procedure, consistent with the visual inspection of the boxplots above, other covariates remain non-significant, resulting in a simple model:

model2 <- glm(response~dist, family=binomial, data=xbill)


glm(formula = response ~ dist, family = binomial, data = xbill)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.9521  -0.8066   0.5957   0.7618   1.9355  

            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  2.16370    0.33944   6.374 1.84e-10 ***
dist        -0.01049    0.00210  -4.993 5.94e-07 ***
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 227.52  on 174  degrees of freedom
Residual deviance: 179.44  on 173  degrees of freedom
AIC: 183.44

Number of Fisher Scoring iterations: 5

(Degrees of freedom change a little between models because some covariates in the first model have missing values and so these observations are excluded.)


Having fitted a model, we now want to see how the predicted probability of response changes with distance (similar to a detection function model). We assume that the maximum distance that a crossbill will respond to a lure is 850m. Here we create a ‘prediction’ data frame that has one column called dist and this ranges from 0 to 850 (in unit intervals). (Prediction data needs to contain objects with the same names as the explanatory variables in the fitted model.)

w <- 850
preddata <- data.frame(dist=0:w)
phat <- predict.glm(model2, newdata=preddata, type="response")

Now we have the estimated probabilities, we can overlay this onto a plot of the observed responses (black circles).

ggplot(data=xbill, aes(x=dist, y=response)) + geom_point(shape=1) +
  geom_smooth(method="glm", method.args=list(family="binomial")) +
  labs(title="Logistic regress fit to trial survey data") +
  labs(x="Distance from lure (m)", y="Pr(responding)")   

Estimating abundance

This section is technical and require understanding of abundance estimation with point transects.

Abundance is obtained from

\[\hat N = \frac {n \cdot A}{\hat{P}_a \cdot a}\] where

First we calculate \(P_a\). To do this we need to specify the function \(\pi(r),r \le w\), which represents the probability density function (pdf) of distances of animals from the point. Assuming crossbills are equally likely at all distances from the point, the pdf is triangular:

pi.r <- preddata$dist/sum(preddata$dist)
preddata$pi <- pi.r
ggplot(data=preddata, aes(x=dist, y=pi.r)) + geom_point(size=0.6) +
  labs(title="Assumed abundance of birds") +
  labs(x="Distance from lure (m)", y=expression(pi[r])) 

Then we multiply \(\pi(r)\) with the probability of response and (numerically) integrate from 0 to \(w\).

Pa <- sum(phat * pi.r) 

[1] 0.09709381

Assuming that the probability of response is a function of distance only (as in model2), then \(P_a\) is just less than 10% (i.e. <10% of birds within 850m of a point are detected).

Analysis of main survey data

Read in data from the point transect survey. These data consist of:

Note that detection distances are unknown in the main survey: instead, we have used the trials data to estimate the detection function, and hence the proportion of birds within 850m that are detected.

detections <- read.csv("mainsurveydetections.csv", header=TRUE)
n <- sum(detections$nscottish)

We now calculate the number of points (\(k\)) in the main survey, and hence, the total covered area within 850m of a point, converting from m\(^2\) to km\(^2\). Note that pi is a reserved word to represent \(\pi\) (i.e. 3.141593).

k <- length(detections$point)
# Covered area (km2)
a <- k * pi * (w/1000)^2
# Size of the study region (km2)
A <- 3505.8  

We can now estimate the size of the population as:

Nscot <- (n*A)/(Pa*a)

[1] 10008

Measure of precision in abundance estimate

We can calculate \(1-\alpha\) confidence interval for true abundance by bootstrapping both trials and points. The steps involved are

  1. randomly generate (with replacement) a new set of response data,
  2. estimate \(P_a\) for the new data,
  3. generate (with replacement) a new set of point transect data,
  4. estimate abundance,
  5. repeat steps 1-4 many times to build a distribution of abundances and
  6. use the \(\alpha/2\) and \(1-\alpha/2\) percentiles of the distribution as confidence interval bounds.

The following code does this:

# Initialise parameters
# Number of bootstraps 
nboot <- 999
# Number of trials
m <- length(xbill$dist)
# Create empty vectors to store new sample
bdistances <- vector(length=m)
bresponse <- vector(length=m)
# Create empty vector to store bootstrap abundances
bNscot <- vector(length=nboot)
# Create prediction data (w is truncation distance defined earlier)
pred <- data.frame(bdistances=0:w)
# A loop for the bootstraps
for (i in 1:nboot) {
  # Bootstrap trials
  # Generate index of sample 
  btindex <- sample(1:m, size=m, replace=TRUE)
  for (j in 1:m) {
    bdistances[j] <- xbill$dist[btindex[j]]
    bresponse[j] <- xbill$response[btindex[j]]
  # Fit GLM 
  bmodel <- glm(bresponse ~ bdistances, family=binomial)
  # Predict probability of response
  bphat <- predict.glm(bmodel, newdata=pred, type="response")
  # Calculate Pa
  bPa <- sum(bphat * pi.r)
  # Bootstrap points
  rindex <- sample(1:k, k, replace=TRUE)
  n <- sum(detections$nscottish[rindex])
  # Calculate abundance
  bNscot[i] <- (n*A)/(bPa*a)
} # End of bootstrap loop

Having obtained a distribution of abundances, the \(\alpha/2\) and \(1-\alpha/2\) percentiles can be obtained:

alpha <- 0.05
bounds <- c(alpha/2, 1-(alpha/2))
plot.this <-
ggplot(data=plot.this, aes(bNscot)) + 
  geom_histogram(fill="white", colour="black") + 
  labs(title="Distribution of abundance estimates") +
  labs(x=expression(hat(N)), y="Count") +
  geom_vline(xintercept=quantile(bNscot, probs=bounds), 
             size=1.5, linetype="dotted")
Distribution of abundance estimates from bootstrap.

Figure 2: Distribution of abundance estimates from bootstrap.

round(quantile(bNscot, probs=bounds))

 2.5% 97.5% 
 5751 17920 

Buckland, S. T., Rexstad, E. A., Marques, T. A., & Oedekoven, C. S. (2015). Distance sampling: Methods and applications. Retrieved from

Buckland, S. T., Summers, R. W., Borchers, D. L., & Thomas, L. (2006). Point transect sampling with traps or lures. Journal of Applied Ecology, 43(2), 377–384.

Summers, R. W., & Buckland, S. T. (2010). A first survey of the global population size and distribution of the scottish crossbill loxia scotica. Bird Conservation International, 21(02), 186–198.