Scottish crossbill surveys using call lures and a supplemental set of trials to establish detection probability function. This example also demonstrates the use of bootstrapping methods to compute precision of the abundance estimate.

This practical is based on the lure point transect case study in (Buckland, Rexstad, Marques, & Oedekoven, 2015), a simplified version of the analysis in (Buckland, Summers, Borchers, & Thomas, 2006; Summers & Buckland, 2010). Generalised linear models (GLMs) are used to model the response of Scottish crossbills to a lure in order to estimate their probability of response and hence estimate the density and abundance. To provide a measure of precision for the abundance estimate, 95% confidence intervals are obtained by bootstrapping.

The Scottish crossbill *(Loxia scotica)* is Britain’s only endemic bird species. A point transect study was conducted to obtain the number of birds within each point after responding to an audible lure. The probability of responding to the lure was estimated by recording the response of previously detected birds to the lure at different distances (Summers & Buckland, 2010).

- Fit a GLM
- Obtain predicted values from GLM
- Calculate abundance
- Using
`for`

loop to bootstrap abundance.

The data provided in the response trials are:

- No. - trial number
- day - days from 1st January
- time - hour of the day
- habitat - habitat type (1=plantation, 2= native pinewood)
- dist - distance of the bird when the lure was played (m)
- behavcode - behaviour code (1=perching and feeding, 2= giving excitement calls, 3=singing)
- numbirds - flock size
- response - response of bird to lure (0=no response, 1=response).

The trials data are in file `lure-trials.csv`

. Import the data and check that it has been read correctly. Files needed for this exercise can be found on this Github repository.

```
xbill <- read.csv(file="lure-trials.csv", header=TRUE)
```

```
head(xbill, n=2)
```

```
No. day time habitat dist behavcode numbirds response
1 1 47 9 1 150 1 1 0
2 2 47 11 1 150 1 2 0
```

Examine how many birds did, or did not, respond to the lure:

```
table(xbill$response)
```

```
0 1
62 113
```

There are six potential covariates that might affect the probability that a bird responds to the lure (`day`

, `time`

, `dist`

, `numbirds`

, `habitat`

, and `behavcode`

): the latter two are factor type variables and so we need to treat them as factors in models:

```
xbill$habitat <- factor(xbill$habitat)
xbill$behavcode <- factor(xbill$behavcode)
```

To look at the response in each factor level create a two-way table, for example:

```
table(xbill$response, xbill$habitat)
```

```
1 2
0 6 56
1 25 88
```

```
# Divide plot window into 4
par(mfrow=c(2,2))
boxplot(xbill$dist~xbill$response, xlab="Response", ylab="Distance (m)")
boxplot(xbill$numbirds~xbill$response, xlab="Response", ylab="Flock size")
boxplot(xbill$day~xbill$response, xlab="Response", ylab="Days from 1 Jan")
boxplot(xbill$time~xbill$response, xlab="Response", ylab="Hour of day")
par(mfrow=c(1,1))
```

Qualitatively, these boxplots suggest little difference in crossbill behaviour in response to the lure attributable to flock size or time of day. There may be more influence of distance and date upon response to lure. Models will be fitted to make stronger inference.

Building a model to explain the probability of response in terms of the potential covariates. The dependent variable, `response`

, can only take two values (0 and 1) and so rather than fit a linear regression model to these data we fit a GLM. The `glm`

function allows us to specify a distribution for the dependent variable in the model with the `family`

argument. In effect, this performs a logistic regression, with *success* (animal responding) modelled as a function of explanatory covariates.

We can include all the covariates in a model as follows.

```
model1 <- glm(response~dist+numbirds+day+time+habitat+behavcode, family=binomial,
data=xbill)
```

As usual, the `summary`

function can be used to display details of the model object.

```
summary(model1)
```

```
Call:
glm(formula = response ~ dist + numbirds + day + time + habitat +
behavcode, family = binomial, data = xbill)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.9526 -0.8708 0.5563 0.8168 2.0009
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 2.641601 1.301148 2.030 0.0423 *
dist -0.010086 0.002323 -4.342 1.41e-05 ***
numbirds 0.076882 0.081172 0.947 0.3436
day -0.008597 0.007405 -1.161 0.2456
time -0.020251 0.113258 -0.179 0.8581
habitat2 -0.263713 0.574532 -0.459 0.6462
behavcode2 0.070903 0.495256 0.143 0.8862
behavcode3 0.962426 0.614541 1.566 0.1173
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 219.23 on 166 degrees of freedom
Residual deviance: 169.10 on 159 degrees of freedom
(8 observations deleted due to missingness)
AIC: 185.1
Number of Fisher Scoring iterations: 5
```

We see that only `dist`

has a coefficient that is significantly different from zero. Experiment with dropping non-significant terms. Using a backwards stepping procedure, consistent with the visual inspection of the boxplots above, other covariates remain non-significant, resulting in a simple model:

```
model2 <- glm(response~dist, family=binomial, data=xbill)
```

```
summary(model2)
```

```
Call:
glm(formula = response ~ dist, family = binomial, data = xbill)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.9521 -0.8066 0.5957 0.7618 1.9355
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 2.16370 0.33944 6.374 1.84e-10 ***
dist -0.01049 0.00210 -4.993 5.94e-07 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 227.52 on 174 degrees of freedom
Residual deviance: 179.44 on 173 degrees of freedom
AIC: 183.44
Number of Fisher Scoring iterations: 5
```

(Degrees of freedom change a little between models because some covariates in the first model have missing values and so these observations are excluded.)

Having fitted a model, we now want to see how the predicted probability of response changes with distance (similar to a detection function model). We assume that the maximum distance that a crossbill will respond to a lure is 850m. Here we create a ‘prediction’ data frame that has one column called `dist`

and this ranges from 0 to 850 (in unit intervals). (Prediction data needs to contain objects with the same names as the explanatory variables in the fitted model.)

```
w <- 850
preddata <- data.frame(dist=0:w)
phat <- predict.glm(model2, newdata=preddata, type="response")
```

Now we have the estimated probabilities, we can overlay this onto a plot of the observed responses (black circles).

```
ggplot(data=xbill, aes(x=dist, y=response)) + geom_point(shape=1) +
geom_smooth(method="glm", method.args=list(family="binomial")) +
labs(title="Logistic regress fit to trial survey data") +
labs(x="Distance from lure (m)", y="Pr(responding)")
```

This section is technical and require understanding of abundance estimation with point transects.

Abundance is obtained from

\[\hat N = \frac {n \cdot A}{\hat{P}_a \cdot a}\] where

- \(n\) is the number of detections
- \(A\) is the area of the study region (i.e. 3505.8 km\(^2\))
- \(P_a\) is the probability of response (or detection) in the covered area, and
- \(a\) is the area of the covered region (i.e. \(a=k\pi w^2\) where \(k\) is the number of points)

First we calculate \(P_a\). To do this we need to specify the function \(\pi(r),r \le w\), which represents the probability density function (pdf) of distances of animals from the point. Assuming crossbills are equally likely at all distances from the point, the pdf is triangular:

```
pi.r <- preddata$dist/sum(preddata$dist)
preddata$pi <- pi.r
ggplot(data=preddata, aes(x=dist, y=pi.r)) + geom_point(size=0.6) +
labs(title="Assumed abundance of birds") +
labs(x="Distance from lure (m)", y=expression(pi[r]))
```

Then we multiply \(\pi(r)\) with the probability of response and (numerically) integrate from 0 to \(w\).

```
Pa <- sum(phat * pi.r)
print(Pa)
```

```
[1] 0.09709381
```

Assuming that the probability of response is a function of distance only (as in `model2`

), then \(P_a\) is just less than 10% (i.e. <10% of birds within 850m of a point are detected).

Read in data from the point transect survey. These data consist of:

- point - point transect identifier
- nscottish - the number of Scottish crossbills detected at the point

Note that detection distances are unknown in the main survey: instead, we have used the trials data to estimate the detection function, and hence the proportion of birds within 850m that are detected.

```
detections <- read.csv("mainsurveydetections.csv", header=TRUE)
n <- sum(detections$nscottish)
```

We now calculate the number of points (\(k\)) in the main survey, and hence, the total covered area within 850m of a point, converting from m\(^2\) to km\(^2\). Note that `pi`

is a reserved word to represent \(\pi\) (i.e. 3.141593).

```
k <- length(detections$point)
# Covered area (km2)
a <- k * pi * (w/1000)^2
# Size of the study region (km2)
A <- 3505.8
```

We can now estimate the size of the population as:

```
Nscot <- (n*A)/(Pa*a)
```

```
[1] 10008
```

We can calculate \(1-\alpha\) confidence interval for true abundance by bootstrapping both trials and points. The steps involved are

- randomly generate (with replacement) a new set of response data,
- estimate \(P_a\) for the new data,
- generate (with replacement) a new set of point transect data,
- estimate abundance,
- repeat steps 1-4 many times to build a distribution of abundances and
- use the \(\alpha/2\) and \(1-\alpha/2\) percentiles of the distribution as confidence interval bounds.

The following code does this:

```
# Initialise parameters
# Number of bootstraps
nboot <- 999
# Number of trials
m <- length(xbill$dist)
# Create empty vectors to store new sample
bdistances <- vector(length=m)
bresponse <- vector(length=m)
# Create empty vector to store bootstrap abundances
bNscot <- vector(length=nboot)
# Create prediction data (w is truncation distance defined earlier)
pred <- data.frame(bdistances=0:w)
# A loop for the bootstraps
for (i in 1:nboot) {
# Bootstrap trials
# Generate index of sample
btindex <- sample(1:m, size=m, replace=TRUE)
for (j in 1:m) {
bdistances[j] <- xbill$dist[btindex[j]]
bresponse[j] <- xbill$response[btindex[j]]
}
# Fit GLM
bmodel <- glm(bresponse ~ bdistances, family=binomial)
# Predict probability of response
bphat <- predict.glm(bmodel, newdata=pred, type="response")
# Calculate Pa
bPa <- sum(bphat * pi.r)
# Bootstrap points
rindex <- sample(1:k, k, replace=TRUE)
n <- sum(detections$nscottish[rindex])
# Calculate abundance
bNscot[i] <- (n*A)/(bPa*a)
} # End of bootstrap loop
```

Having obtained a distribution of abundances, the \(\alpha/2\) and \(1-\alpha/2\) percentiles can be obtained:

```
alpha <- 0.05
bounds <- c(alpha/2, 1-(alpha/2))
plot.this <- as.data.frame(bNscot)
ggplot(data=plot.this, aes(bNscot)) +
geom_histogram(fill="white", colour="black") +
labs(title="Distribution of abundance estimates") +
labs(x=expression(hat(N)), y="Count") +
geom_vline(xintercept=quantile(bNscot, probs=bounds),
size=1.5, linetype="dotted")
```

```
round(quantile(bNscot, probs=bounds))
```

```
2.5% 97.5%
5751 17920
```

Buckland, S. T., Rexstad, E. A., Marques, T. A., & Oedekoven, C. S. (2015). *Distance sampling: Methods and applications*. Retrieved from https://www.springer.com/gb/book/9783319192185

Buckland, S. T., Summers, R. W., Borchers, D. L., & Thomas, L. (2006). Point transect sampling with traps or lures. *Journal of Applied Ecology*, *43*(2), 377–384. https://doi.org/10.1111/j.1365-2664.2006.01135.x

Summers, R. W., & Buckland, S. T. (2010). A first survey of the global population size and distribution of the scottish crossbill *loxia scotica*. *Bird Conservation International*, *21*(02), 186–198. https://doi.org/10.1017/s0959270909990323