Distance sampling simulation included in Buckland et al. (2015)

Description and R code to produce simulation described in Sections 3.5.2 and 11.1.1 in Buckland et al. (2015); demonstrating the use of the package dsims to evaluate subjective vs random placement of transects. Simulation designed and implemented by L. Marshall.

Laura Marshall http://distancesampling.org (CREEM, Univ of St Andrews)https://creem.st-andrews.ac.uk

This case study shows you how to use the R package dsims (Marshall, 2022a) to compare the performance of different survey designs. We will replicate the analyses of Sections 3.5.2 and 11.1.1 of (Buckland, Rexstad, Marques, & Oedekoven, 2015).

Getting started

Ensure you have administrator privileges on your computer and install the necessary R packages.

needed.packages <- c("dsims")
myrepo <- "http://cran.rstudio.com"
install.packages(needed.packages, repos = myrepo)

Directory structure for files in this project

In addition to the R packages, there are additional files required by this analysis. All necessary material are included in a Zip archive file entitled dsims_study.zip. When that archive is uncompressed, the directory structure will be created as described.

Examine the other files and folders in the “dsims_study” folder. There are three files starting with the name “Region” and ending with .dbf, .shp and .shx, these files make up the shapefile for the survey region. The density.robj file is the density surface for the survey region. The Survey Transects folder contains files wth the name “subj_des” describing the shapefile to be used for the subjective design. The Results folder contains the results from 999 replications as this can take a good few hours to run. Select the directory that contains these files.

Creating a new simulation

Creating a region object

First we reate the region object using the region shapefile. As there are no strata in this example, you just need to provide a name for your survey region and specify the units (metres, m) as there is no projection information supplied with this shapefile.

region <- make.region(region.name = "Survey Region", units = "m", shape = "Region.shp")

View the resulting object:


Creating a density object

Now create a density object within this region. For this study, a density surface has already been created, but you can experiment with the options in the next code chunk to define one yourself.

You can create other density surfaces by creating a density object based on a uniform density grid over the area, and adding some hot spots (or low spots).

density <- make.density(region = region, x.space = 1000, y.space = 1000,
    constant = 4e-08)
density <- add.hotspot(density, centre = c(-2500, 2224000), sigma = 10000,
    amplitude = 1e-08)
density <- add.hotspot(density, centre = c(0, 2184000), sigma = 18000,
    amplitude = -5e-09)
# Plot this example density surface
plot(density, region)

Load the predefined density object and view the data that comprise the surface by plotting it. This density surface was created using a combination of adding low and high spots and also calculating density as a function of distance to path (the paths will be shown later in the subjective design). The values obtained from this process were then smoothed by fitting a generalised additive model and predicting to the density grid points.

plot(density, region)

Population size

Fix the population size at 1500 individuals. Using make.population.description set the first argument to the desired population abundance and set the second argument such that exactly this number of individuals is generated in each realisation (fixed.N = TRUE).

pop.description <- make.population.description(region = region, density = density,
    N = 1500, fixed.N = TRUE)

True detection function

We select a half-normal detection function with a \(\sigma\) (scale.param) of 500m and a truncation distance of 1000m.

detect <- make.detectability(key.function = "hn", scale.param = 500, truncation = 1000)

Creating the survey design object

We first consider the subjective design (Section 11.1.1 of (Buckland et al., 2015)). The subjective design uses existing paths together with additional transects chosen to achieve a more even coverage of the survey region.

There is no specific definition in dsims (in fact the design package dssd (Marshall, 2022b)) for subjective designs but we can instead define it as a random line transect design. When we run the survey or simulation we will then point to the shapefile containing the transects that we wish to use.

subjective.design <- make.design(transect.type = "line", design = c("random"),
    region = region, edge.protocol = "minus", truncation = 1000)

Creating the analyses object

Describe the analyses to carry out on the simulated data. Here we propose both half-normal and hazard-rate models (neither with covariates) and choose between them based on the AIC values.

ds.analyses <- make.ds.analysis(dfmodel = list(~1, ~1), key = c("hn", "hr"),
    truncation = 1000, criteria = "AIC")

Creating the simulation object

We can finally put it all together and have a look at some example populations, transects and survey data. Set the number of repetitions (reps) to be fairly low initially to avoid long run times.

simulation.subjective <- make.simulation(reps = 10, design = subjective.design,
    population.description = pop.description, detectability = detect, ds.analysis = ds.analyses)

Before running the simulation, you can check to see that it is doing what you want by using the run.survey function to create an instance of a single simulated survey based on the simulation setup. At this point given that we want to use a set of pre-created transects we will need to specify the shapefile to use.

# Get the
transect.filename <- paste(getwd(), "/Survey_Transects/Subj_Des.shp", sep = "")
# Simulate a single survey
survey <- run.survey(simulation.subjective, filename = transect.filename)

plot(survey, region)

If the previous plots lead you to believe you have properly parameterised your simulation, it is time to run it. Be patient, as it will take a few minutes to complete, even though there are only 10 replicates. Again we need to supply the path to our subjective design transects for the simulation (otherwise it will generate randomly placed lines).

simulation.subjective.run <- run.simulation(simulation.subjective, transect.path = transect.filename)

Information about effort, estimated abundance, estimated density and detection probability \((P_a)\) is available in the resulting object. Results from each replicate for abundance estimation can be examined, such as

kable(t(simulation.subjective.run@results$individuals$N[, , 5]), digits = 3,
    caption = "Estimate of abundance (with measures of precision) for the fifth replicate of the simulation performed above.")
Table 1: Estimate of abundance (with measures of precision) for the fifth replicate of the simulation performed above.
Estimate se cv lcl ucl df
923.943 145.122 0.157 666.305 1281.2 18.867

or average abundance estimates across replicates

kable(t(simulation.subjective.run@results$individuals$N[, , 11]), digits = 3,
    caption = "Average estimate of abundance (with measures of precision) across all replicates of the simulation performed above.")
Table 2: Average estimate of abundance (with measures of precision) across all replicates of the simulation performed above.
Estimate se cv lcl ucl df
1123.666 232.125 0.204 731.387 1739.25 19.059

There is also a summary() function for simulation results objects that provide more exhaustive results.

On to randomised designs

You will need to create 2 new simulations each with a new design object, one for the parallel design and one for the zigzag design. The other objects (region, density, population description etc.) should not be changed. Here is how to do it for the parallel design. As these randomised designs can be generated automatically in R using the dssd package (via dsims), we no longer need to supply a the shapefile(s) when we run the survey or simulation.

parallel.design <- make.design(transect.type = "line", design = "systematic",
    region = region, design.angle = 45, spacing = 12000, edge.protocol = "minus",
    truncation = 1000)

simulation.parallel <- make.simulation(reps = 999, design = parallel.design,
    population.description = pop.description, detectability = detect, ds.analysis = ds.analyses)

The code does not complete the process of analysing and summarising the simulation of systematic parallel line transects. The user can complete this investigation. The parameterisation of the zigzag design should be set up as shown below.

zigzag.design <- make.design(transect.type = "line", design = "eszigzag",
    region = region, design.angle = 135, spacing = 8250, bounding.shape = "convex.hull",
    edge.protocol = "minus", truncation = 1000)

simulation.zigzag <- make.simulation(reps = 999, design = zigzag.design,
    population.description = pop.description, detectability = detect, ds.analysis = ds.analyses)

Contrast bias and precision from subjective and randomised surveys

More than 10 replicates are needed to adequately address the question of bias. Results of a more extensive simulation study are stored as workspace objects. The following code loads these results.


We can extract components from these R objects to approximate the findings presented in Table 2.1, page 29 of (Buckland et al., 2015).

extract.metrics <- function(simobj) {
    nreps <- simobj@reps
    data <- simobj@results$individuals$summary
    effort.avg <- data[, , nreps + 1][3]/1000
    detects.avg <- data[, , nreps + 1][4]
    abund <- simobj@results$individuals$N
    abund.avg <- abund[, , nreps + 1][1]
    true.N <- simobj@population.description@N
    est.bias <- (abund.avg - true.N)/true.N
    est.bias.pct <- est.bias * 100
    mean.stderr <- abund[, , nreps + 1][2]
    stddev <- abund[, , nreps + 2][1]
    result.vector <- unname(c(effort.avg, detects.avg, abund.avg, est.bias.pct,
        mean.stderr, stddev))
subjective <- extract.metrics(simulation.subjective)
parallel <- extract.metrics(simulation.parallel)
zigzag <- extract.metrics(simulation.zigzag)

result.table <- data.frame(subjective, parallel, zigzag, row.names = c("Mean effort(km)",
    "Mean sample size", "Mean abundance estimate", "Estimated percent bias",
    "Mean SE of abundance estimate", "SD of abundance estimates"))
kable(result.table, digits = 1, caption = "Simulation results summary for each design.  Estimates based upon 999 simulations with true abundance of 1500 objects.")
Table 3: Simulation results summary for each design. Estimates based upon 999 simulations with true abundance of 1500 objects.
subjective parallel zigzag
Mean effort(km) 339.8 473.2 694.5
Mean sample size 87.2 145.9 215.1
Mean abundance estimate 1225.1 1477.5 1483.8
Estimated percent bias -18.3 -1.5 -1.1
Mean SE of abundance estimate 270.7 206.3 172.7
SD of abundance estimates 208.9 198.5 155.4
Buckland, S. T., Rexstad, E. A., Marques, T. A., & Oedekoven, C. S. (2015). Distance sampling: Methods and applications. Retrieved from https://www.springer.com/gp/book/9783319192185
Marshall, L. (2022a). Dsims: Distance sampling simulations. Retrieved from https://CRAN.R-project.org/package=dsims
Marshall, L. (2022b). Dssd: Distance sampling survey design. Retrieved from https://CRAN.R-project.org/package=dssd