Distance sampling simulation from Buckland et al. (2015)

Description and R code to produce the simulation described in Sections 3.5.2 and 11.1.1 of Buckland et al. (2015), demonstrating the use of the package DSsim to evaluate subjective versus random placement of transects.

Laura Marshall (CREEM, University of St Andrews) http://distancesampling.org https://creem.st-andrews.ac.uk
2020-04-10

This case study shows you how to use the R package DSsim (Marshall, 2019) to compare the performance of different survey designs. We will replicate the analyses of Sections 3.5.2 and 11.1.1 of Buckland, Rexstad, Marques, and Oedekoven (2015).

Getting started

Ensure you have administrator privileges on your computer and install the necessary R packages.


needed.packages <- c("DSsim", "mrds", "shapefiles", "splancs")
myrepo <- "http://cran.rstudio.com"
install.packages(needed.packages, repos = myrepo)
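Once installed, attach the packages before working through the code below. Note that kable(), used later to display result tables, comes from the knitr package (commonly available alongside an R Markdown installation); if it is not present, install it in the same way as the packages above.


library(DSsim)
library(mrds)
library(shapefiles)
library(splancs)
library(knitr)  # provides kable() for displaying result tables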

Directory structure for files in this project

In addition to the R packages, this analysis requires some additional files. All necessary materials are included in a Zip archive named DSsim_study.zip. When that archive is uncompressed, the directory structure described below is created.

Examine the other files and folders in the “DSsim_study” folder. Three files start with the name “Region” and end in .dbf, .shp and .shx; together they make up the shapefile for the survey region. The density.surface.robj file is the density surface for the survey region. The Survey Transects folder contains a folder for each of the designs you are asked to look at; these hold the transect shapefiles. The Results folder contains the results from 999 replications, as running that many can take several hours. Make sure your R working directory is set to the folder that contains “DSsim_study”, as the code below uses paths relative to it.
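If you want to confirm the layout from within R, a quick check such as the one below can help. This sketch assumes the unzipped DSsim_study folder sits inside your current working directory.


# list the contents of the unzipped archive to confirm the expected layout
list.files("DSsim_study", recursive = TRUE)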

Alternative analysis using Distance for Windows

Some parts of this exercise can also be performed using Distance for Windows (Thomas et al., 2010). That software cannot recreate the subjective survey design (see below), but can assess the performance of other survey designs described here. The Zip archive DSsimExercise.zip can be downloaded to your computer and opened with Distance for Windows (version 7.3 and above).

Creating a new simulation

Creating a region object

Read the Region shapefile into R.


library(shapefiles)
region.shapefile <- read.shapefile("DSsim_study/Region")

Next create the region object using this shapefile. As there are no strata in this example, you just need to provide a name for your survey region and specify the units (metres, m).


region <- make.region(region.name = "Survey Region", units = "m", shapefile = region.shapefile)

View the resulting object:


plot(region, plot.units = "km")

Creating a density object

Now create a density object within this region. For this study, a density surface has already been created, but you can experiment with the options in the next code chunk to define one yourself.

You can create other density surfaces by creating a density object based on a uniform density grid over the area, and adding some hot spots (or low spots).


density <- make.density(region = region, x.space = 1000, y.space = 1000, 
    constant = 4e-08)
density <- add.hotspot(density, centre = c(-2500, 2224000), sigma = 10000, 
    amplitude = 1e-08)
density <- add.hotspot(density, centre = c(0, 2184000), sigma = 18000, 
    amplitude = -5e-09)

Load the density surface and view the data that comprise the surface.


load("DSsim_study/density.surface.robj")
kable(density.surface[[1]][1:5, ])
        x         y   density
 17071.09   2152699     1e-07
 18071.09   2152699     1e-07
 15071.09   2153699     1e-07
 16071.09   2153699     1e-07
 17071.09   2153699     1e-07

The density surface is a data set of x and y locations and corresponding densities at each point.

To create the density object, you need to provide the density surface, the region object for which it was created, and the grid spacing that was used. I used a grid spacing of 1000m in both the x and y directions to create this density surface.


pop.density <- make.density(region = region, density.surface = density.surface, 
    x.space = 1000, y.space = 1000)
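As an optional check, the grid spacing of the loaded surface can be confirmed directly from its coordinates; the gaps between distinct x (and y) values should be the 1000 m spacing (or multiples of it where the irregular region leaves gaps).


# optional check: gaps between distinct x and y coordinates of the surface
unique(diff(sort(unique(density.surface[[1]]$x))))
unique(diff(sort(unique(density.surface[[1]]$y))))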

View the density surface.


plot(pop.density, plot.units = "km")
plot(region, add = TRUE)

Population size

Fix the population size at 1500 individuals. Using make.population.description set the first argument to the desired population abundance and set the second argument such that exactly this number of individuals is generated in each realisation (fixed.N = TRUE).


pop.description <- make.population.description(region.obj = region, density.obj = pop.density, 
    N = 1500, fixed.N = TRUE)

True detection function

We select a half-normal detection function with a \(\sigma\) (scale.param) of 500m and a truncation distance of 1000m.


detect <- make.detectability(key.function = "hn", scale.param = 500, truncation = 1000)
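To get a feel for these values: under the half-normal form \(g(x) = \exp(-x^2 / (2\sigma^2))\), detection probability falls to roughly 0.14 by the truncation distance.


# half-normal detection function g(x) = exp(-x^2 / (2 * sigma^2))
# detection probability at the truncation distance of 1000 m with sigma = 500 m
exp(-1000^2/(2 * 500^2))  # approximately 0.135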

Creating the survey design object

We first consider the subjective design (Section 11.1.1 of Buckland et al., 2015). The subjective design uses existing paths together with additional transects chosen to achieve a more even coverage of the survey region.

NOTE: you will need to edit the path argument to point at the files on your machine. Look back to the path you identified when setting up and append the Subjective Design folder inside Survey Transects (written as "Survey_Transects/Subjective_Design" in the code chunk below).


new.directory <- paste(getwd(), "DSsim_study/Survey_Transects/Subjective_Design", 
    sep = "/")
subjective.design <- make.design(transect.type = "Line", design.details = c("user specified"), 
    region.obj = region, plus.sampling = FALSE, path = new.directory)
subjective.design@filenames

[1] "New_Des1"

The previous code constructs the subjective design and records the names of the transect shapefiles that will be used in the simulation. Inspecting subjective.design@filenames is a check that the correct folder has been identified: it lists the shapefiles the simulation will use.

Creating the analyses object

Describe the analyses to carry out on the simulated data. Here we propose both half-normal and hazard-rate models (neither with covariates) and choose between them based on the AIC values.


ddf.analyses <- make.ddf.analysis.list(dsmodel = list(~cds(key = "hn", 
    formula = ~1), ~cds(key = "hr", formula = ~1)), method = "ds", criteria = "AIC")

Creating the simulation object

We can finally put it all together and have a look at some example populations, transects and survey data. Set the number of repetitions (reps) to be fairly low initially to avoid long run times. For the subjective design, specify that the same set of transects is to be used in each replicate, single.transect.set = TRUE.


my.simulation.subjective <- make.simulation(reps = 10, single.transect.set = TRUE, 
    region.obj = region, design.obj = subjective.design, population.description.obj = pop.description, 
    detectability.obj = detect, ddf.analyses.list = ddf.analyses)

Before running the simulation, you can check to see that it is doing what you want. These commands will allow you to investigate the simulation properties.


# set the display window up for 4 plots
par(mfrow = c(2, 2))
# generate and plot an example population
pop <- generate.population(my.simulation.subjective)
plot(region)
par(cex = 0.3)
plot(pop)
# generate (or rather load from file) the transects
transects <- generate.transects(my.simulation.subjective)
plot(region)
plot(transects, col = 4, lwd = 2)
# simulate the survey process of detection
eg.survey <- create.survey.results(my.simulation.subjective)
plot(eg.survey)
par(cex = 1)
# have a look at the distance data from the simulated survey
dist.data <- get.distance.data(eg.survey)
hist(dist.data$distance, xlab = "Distance (m)", main = "Distance Data")


par(mfrow = c(1, 1))

If the previous plots lead you to believe you have properly parameterised your simulation, it is time to run it. Be patient, as it will take a few minutes to complete, even though there are only 10 replicates.


my.simulation.subjective.run <- run(my.simulation.subjective)

Information about effort, estimated abundance, estimated density and detection probability \((P_a)\) is available in the resulting object. Abundance estimates from each replicate can be examined individually, for example:


kable(t(my.simulation.subjective.run@results$individuals$N[, , 5]), digits = 3, 
    caption = "Estimate of abundance (with measures of precision) for the fifth replicate of the simulation performed above.")
Table 1: Estimate of abundance (with measures of precision) for the fifth replicate of the simulation performed above.

 Estimate        se      cv       lcl       ucl       df
 1855.342   586.617   0.316   957.703   3594.32   14.174

or average abundance estimates across replicates


kable(t(my.simulation.subjective.run@results$individuals$N[, , 11]), digits = 3, 
    caption = "Average estimate of abundance (with measures of precision) across all replicates of the simulation performed above.")
Table 2: Average estimate of abundance (with measures of precision) across all replicates of the simulation performed above.

 Estimate        se     cv       lcl        ucl       df
 1456.045   631.259   0.49   629.711   3641.815   14.177

There is also a summary() function for simulation results objects that provides more exhaustive results.
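For example, the full summary of the run above can be printed with:


summary(my.simulation.subjective.run)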

On to randomised designs

You will need to create two new simulations, each with a new design object: one for the parallel design and one for the zigzag design. The other objects (region, density, population description, etc.) should not be changed. Here is how to do it for the parallel design, with the path pointing to the folder containing the shapefiles for a systematic parallel transect design.

Note: We now wish different transects to be used on each repetition (single.transect.set = FALSE).


parallel.directory <- paste(getwd(), "DSsim_study/Survey_Transects/Parallel_Design", 
    sep = "/")
parallel.design <- make.design(transect.type = "Line", design.details = c("Parallel", 
    "Systematic"), region.obj = region, design.axis = 45, spacing = 12000, 
    plus.sampling = FALSE, path = parallel.directory)
my.simulation.parallel <- make.simulation(reps = 10, single.transect.set = FALSE, 
    region.obj = region, design.obj = parallel.design, population.description.obj = pop.description, 
    detectability.obj = detect, ddf.analyses.list = ddf.analyses)

The code above sets up the simulation of systematic parallel line transects but does not run, analyse or summarise it; the user can complete this investigation.
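A minimal sketch of the remaining steps, using only functions already shown above, is given below; the zigzag simulation is constructed in the same way, with the path pointing to its own transect folder and the appropriate design details.


# run the parallel-design simulation and summarise the results
my.simulation.parallel.run <- run(my.simulation.parallel)
summary(my.simulation.parallel.run)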

Contrast bias and precision from subjective and randomised surveys

More than 10 replicates are needed to adequately address the question of bias. Results of a more extensive simulation study are stored as workspace objects. The following code loads these results.


load("DSsim_study/Results/simulation.subjective.robj")
load("DSsim_study/Results/simulation.parallel.robj")
load("DSsim_study/Results/simulation.zigzag.robj")

We can extract components from these R objects to approximate the findings presented in Table 2.1, page 29 of Buckland et al. (2015).


# Extract summary metrics from a simulation results object: mean effort (km),
# mean sample size, mean abundance estimate, estimated percent bias,
# mean SE of the abundance estimates and SD of the abundance estimates.
extract.metrics <- function(simobj) {
    nreps <- simobj@reps
    data <- simobj@results$individuals$summary
    # position nreps + 1 in the third dimension holds the mean over replicates
    effort.avg <- data[, , nreps + 1][3]/1000  # convert effort from m to km
    detects.avg <- data[, , nreps + 1][4]
    abund <- simobj@results$individuals$N
    abund.avg <- abund[, , nreps + 1][1]
    true.N <- simobj@population.description@N
    est.bias <- (abund.avg - true.N)/true.N
    est.bias.pct <- est.bias * 100
    mean.stderr <- abund[, , nreps + 1][2]
    # position nreps + 2 holds the standard deviation over replicates
    stddev <- abund[, , nreps + 2][1]
    result.vector <- unname(c(effort.avg, detects.avg, abund.avg, est.bias.pct, 
        mean.stderr, stddev))
    return(result.vector)
}
subjective <- extract.metrics(simulation.subjective)
parallel <- extract.metrics(simulation.parallel)
zigzag <- extract.metrics(simulation.zigzag)

result.table <- data.frame(subjective, parallel, zigzag, row.names = c("Mean effort(km)", 
    "Mean sample size", "Mean abundance estimate", "Estimated percent bias", 
    "Mean SE of abundance estimate", "SD of abundance estimates"))
kable(result.table, digits = 1, caption = "Simulation results summary for each design.  Estimates based upon 999 simulations with true abundance of 1500 objects.")
Table 3: Simulation results summary for each design. Estimates based upon 999 simulations with true abundance of 1500 objects.

                                 subjective   parallel   zigzag
Mean effort(km)                       336.8      473.5    694.6
Mean sample size                       85.0      146.5    215.2
Mean abundance estimate              1214.5     1484.5   1480.2
Estimated percent bias                -19.0       -1.0     -1.3
Mean SE of abundance estimate         283.4      216.2    173.6
SD of abundance estimates             225.0      193.9    154.9

Buckland, S. T., Rexstad, E. A., Marques, T. A., & Oedekoven, C. S. (2015). Distance sampling: Methods and applications. Retrieved from https://www.springer.com/gp/book/9783319192185

Marshall, L. (2019). DSsim: Distance sampling simulations. Retrieved from https://CRAN.R-project.org/package=DSsim

Thomas, L., Buckland, S. T., Rexstad, E. A., Laake, J. L., Strindberg, S., Hedley, S. L., … Burnham, K. P. (2010). Distance software: Design and analysis of distance sampling surveys for estimating population size. Journal of Applied Ecology, 47(1), 5–14. Retrieved from https://doi.org/10.1111/j.1365-2664.2009.01737.x