Example analysis with Ivory Coast Maxwell’s duiker.
A distance sampling approach to the analysis of camera trapping data offers the potential advantage that individual animal identification is not required. However, accurate animal-to-camera detection distances are required, which means the cameras must be calibrated prior to the survey with images of objects taken at known distances. See Howe, Buckland, Després-Einspenner, & Kühl (2017) for a description of the field work and data analysis. Here we present an analysis of data from Howe et al. (2017) using the R package `Distance` (Miller, Rexstad, Thomas, Marshall, & Laake, 2019).
Heat- and motion-sensitive camera traps detect only moving animals within the range of the sensor and the field of view of the camera. Animals are therefore unavailable for detection by camera traps when they are stationary, and when they are above (e.g., semi-arboreal species) or below (e.g., semi-fossorial species) the range of the sensor or the camera, regardless of their distance from the camera in two dimensions. This temporally limited availability for detection must be accounted for to avoid negative bias in estimated densities. When data are abundant, researchers may choose to include only data from times when 100% of the population can be assumed to be active within the vertical range of camera traps (Howe et al., 2017). However, for rarely detected species or surveys with lower effort, it might be necessary to include most or all observations of distance. In these situations, survey duration (\(T_k\)) might be 12 or 24 hours per day, and it becomes necessary to estimate the proportion of time included in \(T_k\) when animals were available for detection. Methods for estimating this proportion directly from camera trap data have been described (Rowcliffe, Kays, Kranstauber, Carbone, & Jansen, 2014), and it can be included in analyses to estimate density (Bessone et al., 2020), for example as another multiplier, potentially with an associated standard error.
Times of independent camera triggering events for the period 28 June to 21 September 2014 at 23 cameras are recorded in a file described in the data repository of Howe, Buckland, Després-Einspenner, & Kühl (2018).
trigger.events <- read.table(file="https://datadryad.org/stash/downloads/file_stream/73223",
                             header=TRUE)
The format of the `trigger.events` data frame is adjusted to create a `datetime` field for use in the `activity` package (Rowcliffe, 2021).
trigger.events$date <- paste("2014",
                             sprintf("%02i", trigger.events$month),
                             sprintf("%02i", trigger.events$day),
                             sep="/")
trigger.events$time <- paste(sprintf("%02i", trigger.events$hour),
                             sprintf("%02i", trigger.events$minute),
                             sep=":")
trigger.events$datetime <- paste(trigger.events$date, trigger.events$time)
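A quick check of the constructed field (purely illustrative):

head(trigger.events$datetime)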
We will employ two functions from the `activity` package. First, convert the time of day of each camera triggering event into the fraction of the 24hr cycle when the event took place, measured in radians. In other words, an event occurring at midday is recorded as \(\pi\) and an event occurring at midnight is recorded as \(2\pi\).
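This conversion can be done with the `gettime()` function; a minimal sketch, assuming the `datetime` field constructed above and its "%Y/%m/%d %H:%M" format:

library(activity)
# convert event times to radian time of day (0 to 2*pi)
trigger.events$rtime <- gettime(trigger.events$datetime,
                                format = "%Y/%m/%d %H:%M",
                                scale = "radian")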
With the radian conversion of the camera triggering times, the distribution of triggering event times is smoothed, using a kernel smoother, by the function `fitact`. The function estimates the proportion of time (in a 24hr day) animals were active. In addition, the triggering time data can be resampled to provide a measure of uncertainty in the point estimate of the activity proportion.
act_result <- fitact(trigger.events$rtime, sample="data", reps=100)
A plot of the histogram of triggering times, along with the fitted smooth, is produced by applying `plot` to the object returned by `fitact`.
plot(act_result)
The value computed by the smooth through the activity histogram can be extracted from the object created by `fitact`, by reaching into the slot called `act`. The uncertainty around the point estimate is derived from the resampling that takes place within `fitact`. The slot displays the point estimate, standard error and confidence interval bounds.
print(act_result@act)
act se lcl.2.5% ucl.97.5%
0.33463831 0.02133658 0.29386939 0.37740705
The output above would be used to adjust density estimates for temporal activity if the cameras were in operation 24 hours per day. However, in this study, cameras were only active for 11.5 hours per day (06:30 to 18:00).
We use the temporal availability information to create a multiplier. Our multiplier must be defined as

> proportion of the camera operation time animals were available to be detected
This is not equivalent to the value produced by the `fitact` function; that value is the proportion of the 24hr day animals were available to be detected. The availability multiplier must therefore be adjusted for the daily camera operation period. Uncertainty in this proportion is also carried through our computations.
The point estimate and standard error are pulled from the `fitact` object, adjusted for the daily camera operation time, and placed into a data frame named `creation` within a named list, in the fashion shown.
camera.operation.per.day <- 11.5
prop.camera.time <- camera.operation.per.day / 24
avail <- list(creation=data.frame(rate = act_result@act[1]/prop.camera.time,
                                  SE = act_result@act[2]/prop.camera.time))
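With the values above, this multiplier evaluates to roughly \(0.335 / 0.479 \approx 0.70\); that is, animals are estimated to have been available for detection during about 70% of the 11.5-hour daily camera operation period.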
A more robust way of incorporating uncertainty in the temporal availability estimate will be described later.
Detection distances for the full daytime data set are also available in Howe et al. (2018) and are read in the code chunk below:
DuikerCameraTraps <- read.csv(file="https://datadryad.org/stash/downloads/file_stream/73221",
                              header=TRUE, sep="\t")
DuikerCameraTraps$Area <- DuikerCameraTraps$Area / (1000*1000)
DuikerCameraTraps$object <- NA
DuikerCameraTraps$object[!is.na(DuikerCameraTraps$distance)] <- 1:sum(!is.na(DuikerCameraTraps$distance))
The data file records the study area size in square meters; the second line above converts this to square kilometers. The remaining lines create an `object` field that uniquely identifies each observation.
A quick summary of the data set: the total number of detections, and the number recorded at each camera station.
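The total below is presumably the number of records with a non-missing distance; a minimal sketch of that count:

sum(!is.na(DuikerCameraTraps$distance))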
[1] 11180
table(DuikerCameraTraps$Sample.Label)
A1 A2 A3 A4 B1 B2 B3 B4 C1 C2 C3 C4 C5 C6
388 66 988 420 3 1951 73 208 52 195 767 153 41 2682
D3 D4 D5 E3 E4 E5 E6
342 193 524 518 1 375 1241
Note: three sampling stations (B1, C5, E4) had no detections. The one record for each of those stations has distance recorded as `NA`, but the record is important because it contains effort information.
A quick examination of the distribution of detection distances shows the bespoke cutpoints: distance bins are narrow out to 8m, then increase in width to the maximum detection distance of 21m.
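A sketch of such a histogram; the exact cutpoints used in the original figure are an assumption, chosen to match the description above:

# histogram of detection distances with uneven bin widths
hist(DuikerCameraTraps$distance,
     breaks = c(seq(0, 8, 1), 10, 12, 15, 21),
     xlab = "Distance (m)", main = "Detection distances")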
As described by Howe et al. (2017):

> a paucity of observations between 1 and 2 m but not between 2 and 3 m, so we left-truncated at 2 m. Fitted detection functions and probability density functions were heavy-tailed when distances >15 m were included, so we right truncated at 15 m.
The conversion factor must be included both in the call to `ds()` and the call to `bootdht()`.
Candidate models considered here differ from the candidate set presented in Howe et al. (2017). This set includes uniform key models with one, two and three cosine adjustments; half-normal key models with zero, one and two cosine adjustments; and hazard-rate key models with zero and one simple polynomial adjustments. The maximum number of parameters in any model in the candidate set is three.
library(Distance)
trunc.list <- list(left=2, right=15)
mybreaks <- c(seq(2,8,1), 10, 12, 15)
conversion <- convert_units("meter", NULL, "square kilometer")
uni1 <- ds(DuikerCameraTraps, transect = "point", key="unif", adjustment = "cos",
           nadj=1, convert_units = conversion,
           cutpoints = mybreaks, truncation = trunc.list)
uni2 <- ds(DuikerCameraTraps, transect = "point", key="unif", adjustment = "cos",
           nadj=2, convert_units = conversion,
           cutpoints = mybreaks, truncation = trunc.list)
uni3 <- ds(DuikerCameraTraps, transect = "point", key="unif", adjustment = "cos",
           nadj=3, convert_units = conversion,
           cutpoints = mybreaks, truncation = trunc.list)
hn0 <- ds(DuikerCameraTraps, transect = "point", key="hn", adjustment = NULL,
          convert_units = conversion, cutpoints = mybreaks, truncation = trunc.list)
hn1 <- ds(DuikerCameraTraps, transect = "point", key="hn", adjustment = "cos",
          nadj=1, convert_units = conversion,
          cutpoints = mybreaks, truncation = trunc.list)
hn2 <- ds(DuikerCameraTraps, transect = "point", key="hn", adjustment = "cos",
          nadj=2, convert_units = conversion,
          cutpoints = mybreaks, truncation = trunc.list)
hr0 <- ds(DuikerCameraTraps, transect = "point", key="hr", adjustment = NULL,
          convert_units = conversion, cutpoints = mybreaks, truncation = trunc.list)
hr1 <- ds(DuikerCameraTraps, transect = "point", key="hr", adjustment = "poly",
          nadj=1, convert_units = conversion,
          cutpoints = mybreaks, truncation = trunc.list)
We do not present the density estimates produced from the fitted detection function models because a) we have not chosen a preferred model and b) the density estimates have not been adjusted for viewing angle and temporal availability.
Overdispersion causes AIC to select overly-complex models, so analysts should specify the number and order of adjustment terms manually when fitting distance sampling models to data from camera traps, rather than allowing automated selection using AIC. Howe, Buckland, Després-Einspenner, & Kühl (2019) describe two methods for performing model selection of distance sampling models in the face of overdispersion. Here we provide R functions to perform the first of these methods. The first method of Howe et al. (2019) employs a two-step process. First, an overdispersion factor \((\hat{c})\) is computed for each key function family from the most complex model in each family. The \(\hat{c}\) is derived from the \(\chi^2\) goodness of fit test statistic divided by its degrees of freedom. This results in an adjusted AIC score for each model in the key function family:
\[QAIC = -2\left\{\frac{\log\mathcal{L}(\hat{\theta})}{\hat{c}}\right\} + 2K\]
Code to perform this QAIC computation is found in the function `QAIC` in the `Distance` package, and produces the following results.
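A sketch of the calls; `QAIC` is assumed to compute \(\hat{c}\) from the most complex model in each family when no `chat` is supplied:

# QAIC within each key function family
QAIC(uni1, uni2, uni3)
QAIC(hn0, hn1, hn2)
QAIC(hr0, hr1)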
Tables of QAIC values for each key function family are shown below (code for the `kable()` calls suppressed for easier reading of results).
|      | df | QAIC     |
|------|---:|---------:|
| uni1 |  1 | 2825.316 |
| uni2 |  2 | 2826.822 |
| uni3 |  3 | 2823.921 |

|      | df | QAIC     |
|------|---:|---------:|
| hn0  |  1 | 2296.204 |
| hn1  |  2 | 2292.252 |
| hn2  |  3 | 2294.096 |

|      | df | QAIC     |
|------|---:|---------:|
| hr0  |  2 | 2465.119 |
| hr1  |  3 | 2466.395 |
From this first pass of model selection based on QAIC values, the preferred model with the uniform key function has three cosine adjustment terms, the preferred half-normal model has one cosine adjustment term, and the preferred hazard-rate model has no adjustment terms.
The second step of model selection ranks the models by their \(\hat{c}\) values.
library(kableExtra)  # for kable_paper() and row_spec()
chats <- chi2_select(uni3, hn1, hr0)$criteria
modnames <- unlist(lapply(list(uni3, hn1, hr0), function(x) x$ddf$name.message))
results <- data.frame(modnames, chats)
results.sort <- results[order(results$chats),]
knitr::kable(results.sort, digits=2, row.names = FALSE,
             caption="Compare with Table S5 of Howe et al. (2018)") %>%
  kable_paper(full_width = FALSE) %>%
  row_spec(1, bold=TRUE, background = "#4da6ff")
| modnames                                             | chats |
|------------------------------------------------------|------:|
| uniform key function with cosine(1,2,3) adjustments  | 15.63 |
| half-normal key function with cosine(2) adjustments  | 16.54 |
| hazard-rate key function                             | 17.21 |
For this data set, the model chosen by this algorithm that adjusts for overdispersion is the same model (uniform key with three cosine adjustments) as would have been chosen by conventional model selection; but again, not the model selected by Howe et al. (2017) because of the differing candidate model sets.
As a check of the detection function vis-a-vis Howe et al. (2017), the paper reports the effective detection radius (\(\rho\)) to be 9.4m for the peak activity data set. Howe et al. (2017) employed a different candidate model set, resulting in the unadjusted hazard rate model as the preferred model. Here we present the estimated effective detection radius from the selected uniform key function with three cosine adjustment terms.
The effective detection radius can be derived from \(\hat{P_a}\), as reported by the function `ds`, as

\[\hat{\rho} = \sqrt{\hat{P_a} \cdot w^2}\]
\(\hat{P_a}\) is estimated to be 0.329, resulting in an estimate of \(\hat{\rho}\) of 7.457.
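A sketch of this computation, assuming \(\hat{P_a}\) can be extracted via `summary(uni3)$ds$average.p` and taking \(w\) as the truncated width (15 - 2 = 13m), which reproduces the value reported above:

p_hat <- summary(uni3)$ds$average.p  # assumed location of P_a in the model summary
w <- 15 - 2                          # truncated width: right minus left truncation (m)
rho_hat <- sqrt(p_hat * w^2)         # effective detection radius, about 7.46m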
The camera traps do not view the entire area around them, as would be the case with simple point transect sampling. The portion of the area sampled needs to be incorporated in the estimation of abundance. The data file contains a column `multiplier` that represents the proportion of the circle sampled. Howe et al. (2017) note the camera angle of view (AOV) of 42\(^{\circ}\); the proportion of the circle viewed is this value over 360\(^{\circ}\).

An argument to `dht2` is `sample_fraction`, an obvious place to include this quantity. We also add the multiplier for temporal availability described in the section on temporal availability. The `dht2` function will produce analytical measures of precision with this call.
viewangle <- 42 # degrees
samfrac <- viewangle / 360
peak.uni.dens <- dht2(uni3, flatfile=DuikerCameraTraps, strat_formula = ~1,
                      sample_fraction = samfrac, er_est = "P2", multipliers = avail,
                      convert_units = conversion)
print(peak.uni.dens, report="density")
Density estimates from distance sampling
Stratification : geographical
Variance : P2, n/L
Multipliers : creation
Sample fraction : 0.1166667
Summary statistics:
.Label Area CoveredArea Effort n k ER se.ER cv.ER
Total 40.37 2596.317 31483179 10284 21 0 0 0.268
Density estimates:
.Label Estimate se cv LCI UCI df
Total 17.2357 4.805 0.279 9.791 30.3411 23.322
Component percentages of variance:
.Label Detection ER Multipliers
Total 2.17 92.6 5.23
To produce a more reliable estimate of the precision of the point estimate, produce bootstrap estimates using `bootdht`. The user needs to create a summary function to extract information from each replicate, and a named list describing how temporal availability is derived, to facilitate use of the bootstrap.

As constructed, `mysummary` will keep the density estimate produced by each bootstrap replicate and the stratum (if any) to which the estimate pertains.
mysummary <- function(ests, fit){
  return(data.frame(Label = ests$individuals$D$Label,
                    Dhat = ests$individuals$D$Estimate))
}
This rather complex list makes use of `make_activity_fn`, a function in the `Distance` package used to call the `fitact` function from the `activity` package. The user's responsibility is to provide three arguments to this function:

- the radian times of the camera triggering events,
- the sampling method `fitact` uses to generate uncertainty estimates (here `sample="data"`), and
- the number of hours per day the cameras were in operation (`detector_daily_duration`).
mult <- list(availability= make_activity_fn(trigger.events$rtime, sample="data",
                                            detector_daily_duration=camera.operation.per.day))
If you've tried running this example code in `R`, you might have noticed that detection function fitting is quite slow. In general, camera traps produce a large amount of distance sampling data, and in addition these data tend to be "overdispersed", meaning (in this case) that there are many observations with the same distances. Together, this can cause analyses to run slowly, and this is especially true for bootstrap analyses for variance estimation.

One way to speed up the bootstrap is to run multiple analyses in parallel, using multiple cores of your computer. You can achieve this using the `cores` argument to `bootdht`; for fastest results set this to the number of cores on your machine minus 1 (best to leave 1 free for other things). You can find the number of cores by calling `parallel::detectCores()`, and we do this in the code below.
A second way, which is only available to Microsoft Windows users, is to use the alternative optimization engine `MCDS.exe` to fit detection functions. To find out more about this, see our online example on the alternative optimization engine. If you are using a Windows machine, you can download `MCDS.exe` into the required directory by executing the following lines (note, you only need to do this once, although if you update the `Distance` package in future you'll need to re-download it):
download.file("http://distancesampling.org/R/MCDS.exe",
              paste0(system.file(package="mrds"),"/MCDS.exe"), mode = "wb")
# Detach and reload the Distance package to make use of it
detach("package:Distance", unload = TRUE)
library(Distance)
We make use of this engine here by re-fitting the selected detection function model, `uni3`, setting the `ds` argument `optimizer="MCDS"`. If you are not using a Windows machine or have not downloaded the `MCDS.exe` engine, you should not execute this line; your bootstrap will simply take longer to execute.
uni3 <- ds(DuikerCameraTraps, transect = "point", key="unif", adjustment = "cos",
           nadj=3, convert_units = conversion,
           cutpoints = mybreaks, truncation = trunc.list,
           optimizer = "MCDS")
Just as with `dht2`, the `bootdht` function has arguments for the `model`, `flatfile`, `sample_fraction`, `convert_units` and `multipliers` (although for `bootdht`, `multipliers` uses a function rather than a single value). The only novel arguments to `bootdht` are `resample_transects`, indicating camera stations are to be resampled with replacement, and `nboot`, the number of bootstrap replicates.
n.cores <- parallel::detectCores()
daytime.boot.uni <- bootdht(model=uni3, flatfile=DuikerCameraTraps,
                            resample_transects = TRUE, nboot=500,
                            cores = n.cores - 1,
                            summary_fun=mysummary, sample_fraction = samfrac,
                            convert_units = conversion, multipliers=mult)
Bootstrap results
Bootstraps : 500
Successes : 500
Failures : 0
median mean se lcl ucl cv
Dhat 18.16 19.16 7.12 8.43 34.07 0.39
hist(daytime.boot.uni$Dhat, breaks = 20,
     xlab="Estimated density", main="D-hat estimates bootstraps")
abline(v=quantile(daytime.boot.uni$Dhat, probs = c(0.025,0.975), na.rm=TRUE), lwd=2, lty=3)
The confidence interval derived from the bootstrap is considerably wider than the confidence interval derived from analytical methods.
Feel free to skip this unless you’re a fairly advanced user!
In some cases, it may be necessary to set starting values for the detection function optimization, to help it converge. This can be achieved using the `initial_values` argument of the `ds` function. As an example, say we want to use the fitted values from the uniform + 2 cosine function `uni2` as starting values for the first two parameters of the uniform + 3 cosine function fitting (and 0 for the third parameter). The following code does this:
uni3.with.startvals <- ds(DuikerCameraTraps, transect = "point", key="unif", adjustment = "cos",
                          nadj=3,
                          cutpoints = mybreaks, truncation = trunc.list,
                          initial_values = list(adjustment = c(as.numeric(uni2$ddf$par), 0)))
What about when it comes to bootstrapping for variance estimation? You can pass this model into `bootdht` with no problems, so long as you don't set `cores` to more than 1. If you do set `cores` to more than 1 it won't work, returning 0 successful bootstraps. Why? Because `uni2$ddf$par` is not passed along to all those parallel cores. To fix this you have to hard-code the starting values. So, in this example, we see that the values are
print(uni2$ddf$par)
[1] 0.97178178 0.03541633
and so we use
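# hard-code the starting values printed above (third parameter starts at 0);
# this chunk is a reconstruction from those printed values
uni3.with.startvals <- ds(DuikerCameraTraps, transect = "point", key="unif", adjustment = "cos",
                          nadj=3,
                          cutpoints = mybreaks, truncation = trunc.list,
                          initial_values = list(adjustment = c(0.97178178, 0.03541633, 0)))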
and this will work fine in `bootdht`.
A final tip is that setting starting values can sometimes speed up the bootstrap (optimization is faster if it starts from a good initial spot), so you might want to pass the starting values from `uni3` to your bootstrap routine, something like the following, using the MCDS optimizer. (Note: this code is set not to run in this example file; it is here just to show what you might use.)
print(uni3$ddf$par)
uni3.with.startvals <- ds(DuikerCameraTraps, transect = "point", key="unif", adjustment = "cos",
                          nadj=3,
                          cutpoints = mybreaks, truncation = trunc.list,
                          optimizer = "MCDS",
                          initial_values = list(adjustment = c(0.93518220, -0.05345965, -0.08073799)))
daytime.boot.uni <- bootdht(model=uni3.with.startvals, flatfile=DuikerCameraTraps,
                            resample_transects = TRUE, nboot=500,
                            cores = n.cores - 1,
                            summary_fun=mysummary, sample_fraction = samfrac,
                            convert_units = conversion, multipliers=mult)