Variance estimation

Variance estimation using bootstrap resampling.

Eric Rexstad, http://distancesampling.org (CREEM, Univ of St Andrews, https://creem.st-andrews.ac.uk)
2020-01-31


Continuing with the Montrave winter wren line transect data from the line transect vignette, we focus upon producing robust estimates of precision in our point estimates of abundance and density. The analysis in R (R Core Team, 2019) makes use of the Distance package (Miller, Rexstad, Thomas, Marshall, & Laake, 2019).

Objectives

Survey data

The R workspace wren_lt contains detections of winter wrens from the line transect surveys of Buckland (2006).


library(Distance)
data(wren_lt)

The function names() allows you to see the names of the columns of the data frame wren_lt. Definitions of those fields were provided in the line transect vignette.

The effort, or transect length, has been adjusted to recognise that each transect is walked twice.


conversion.factor <- convert_units("meter", "kilometer", "hectare")

Fitting a suitable detection function

Rather than refitting models used in the line transect vignette, we move directly to the model selected by Buckland (2006).


wren.unif.cos <- ds(wren_lt, key="unif", adjustment="cos",
                  convert.units=conversion.factor)

Based upon experience in the field, the uniform cosine model was used for inference.

Estimation of precision

We begin by examining the density estimate from the uniform cosine model:


print(wren.unif.cos$dht$individuals$D)

  Label Estimate        se        cv       lcl      ucl       df
1 Total 1.133287 0.1709893 0.1508791 0.8423904 1.524636 139.2075

The coefficient of variation (CV) is 0.151, and the confidence interval bounds are 0.84 to 1.52 birds per hectare. The coefficient of variation is based upon a delta-method approximation that combines uncertainty in the parameters of the detection function with variability in encounter rates between transects.

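\[[CV(\hat{D})]^2 = [CV(\frac{n}{L})]^2 + [CV(P_a)]^2\] where \(CV(\frac{n}{L})\) is the coefficient of variation of the encounter rate (detections per unit of transect length) and \(CV(P_a)\) is the coefficient of variation of the estimated detection probability, reflecting uncertainty in the detection function parameters.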

These confidence interval bounds assume the sampling distribution of \(\hat{D}\) is log-normal (Buckland, Rexstad, Marques, & Oedekoven, 2015, Section 6.2.1).
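The arithmetic behind these quantities can be sketched in base R. The helper `combine_cv` and its two inputs are hypothetical and serve only to illustrate the formula above; they are not the components of the wren analysis. The interval calculation assumes the usual log-normal form, with a t quantile on the reported degrees of freedom, and uses only the values printed in the table.

```r
# Combine two independent CV components, as in the delta-method formula.
# The function name and the inputs below are hypothetical illustrations.
combine_cv <- function(cv_encounter, cv_detect) {
  sqrt(cv_encounter^2 + cv_detect^2)
}
cv_example <- combine_cv(0.3, 0.4)

# Values reported in the density table above.
D_hat <- 1.133287
se_D  <- 0.1709893
df    <- 139.2075

# The CV is the standard error divided by the estimate.
cv_D <- se_D / D_hat

# Log-normal interval (assumed form): divide and multiply the estimate by C.
C <- exp(qt(0.975, df) * sqrt(log(1 + cv_D^2)))
ci_lognormal <- c(lcl = D_hat / C, ucl = D_hat * C)

print(round(cv_D, 4))
print(round(ci_lognormal, 3))
```

Because the interval is built by dividing and multiplying by the same factor, it is asymmetric around the point estimate on the original scale, and the lower bound cannot fall below zero.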

Bootstrap estimates of precision

Rather than relying upon the delta-method approximation that assumes independence between uncertainty in the detection function and variability in encounter rate, a bootstrap procedure can be employed. Resampling with replacement of the transects produces replicate samples with which a sampling distribution of \(\hat{D}\) is approximated. From that sampling distribution, the percentile method is used to produce confidence interval bounds respecting the shape of the sampling distribution (Buckland et al., 2015, Section 6.3.1.2).
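To make the resampling idea concrete, here is a minimal base R sketch of a transect-level bootstrap with a percentile interval. It is not the procedure implemented in the Distance package: the data are simulated, the transect count, lengths and counts are hypothetical, and only the encounter rate is recomputed for each replicate, whereas a full analysis would also refit the detection function to every replicate.

```r
set.seed(2020)

# Simulated survey (hypothetical values): transect lengths in km and counts.
n_transects <- 19
transects <- data.frame(
  id     = seq_len(n_transects),
  length = runif(n_transects, min = 0.2, max = 1.0),
  count  = rpois(n_transects, lambda = 8)
)

# Statistic of interest: detections per km of transect.
encounter_rate <- function(d) sum(d$count) / sum(d$length)

# Resample whole transects with replacement and recompute the statistic.
nboot <- 999
boot_er <- replicate(nboot, {
  rows <- sample(n_transects, size = n_transects, replace = TRUE)
  encounter_rate(transects[rows, ])
})

# Percentile method: take quantiles of the bootstrap distribution.
alpha <- 0.05
ci <- quantile(boot_er, probs = c(alpha / 2, 1 - alpha / 2))
print(encounter_rate(transects))
print(ci)
```

Resampling whole transects, rather than individual detections, keeps detections from the same transect together, so the between-transect variability is carried into the bootstrap distribution.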

The function bootdht_Nhat_summarize is included in the Distance package. It is used to extract information from the object created by bootdht.


bootdht_Nhat_summarize <- function(ests, fit) {
  return(data.frame(N=ests$individuals$N$Estimate))
}

After the summary function is defined, the bootstrap procedure can be performed. Arguments here are the name of the fitted object, the object containing the data, conversion factor and number of bootstrap replicates.


est.boot <- bootdht(model=wren.unif.cos, flatfile=wren_lt,
                    summary_fun=bootdht_Nhat_summarize,
                    convert.units=conversion.factor, nboot=99)

The object est.boot contains a data frame with two columns consisting of \(\hat{N}\) as specified in bootdht_Nhat_summarize. This data frame can be processed to produce a histogram representing the sampling distribution of the estimated parameters as well as the percentile confidence interval bounds.


alpha <- 0.05
(bootci <- quantile(est.boot$N, probs = c(alpha/2, 1-alpha/2)))

    2.5%    97.5% 
29.18786 44.99379 

Incorporating model uncertainty in precision estimates

The argument model in bootdht can be a single model, as shown above, or a list of models. In the latter instance, all models in the list are fitted to each bootstrap replicate and model selection based on AIC is performed for each replicate. The consequence is that model uncertainty is incorporated into the resulting estimate of precision.


wren.hn <- ds(wren_lt, key="hn", adjustment=NULL,
                  convert.units=conversion.factor)
wren.hr.poly <- ds(wren_lt, key="hr", adjustment="poly",
                  convert.units=conversion.factor)
est.boot.uncert <- bootdht(model=list(wren.hn, wren.hr.poly, wren.unif.cos), 
                           flatfile=wren_lt,
                           summary_fun=bootdht_Nhat_summarize,
                           convert.units=conversion.factor, nboot=99)

(modselci <- quantile(est.boot.uncert$N, probs = c(alpha/2, 1-alpha/2)))

    2.5%    97.5% 
28.44801 42.68570 
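The logic of reselecting a model within every replicate can be sketched in base R without the Distance package. Everything in this sketch is an assumption made for illustration: the distances are simulated, all transects are taken to have equal length, and the two candidate densities (an untruncated half-normal and a negative exponential, each with one parameter) stand in for the models fitted above. The stored statistic, the number of detections times the fitted density at zero distance, is proportional to estimated density when total effort is constant.

```r
set.seed(2020)

# Simulated perpendicular distances (m), tagged with a hypothetical transect id.
n_transects <- 19
sim <- do.call(rbind, lapply(seq_len(n_transects), function(i) {
  n_i <- rpois(1, lambda = 8) + 1  # at least one detection per transect
  data.frame(transect = i, distance = abs(rnorm(n_i, sd = 40)))
}))

# Candidate densities for distances x >= 0, fitted by closed-form MLE.
fit_halfnormal <- function(x) {
  sigma2 <- mean(x^2)
  loglik <- sum(0.5 * log(2 / (pi * sigma2)) - x^2 / (2 * sigma2))
  list(name = "half-normal", f0 = sqrt(2 / (pi * sigma2)), aic = 2 - 2 * loglik)
}
fit_exponential <- function(x) {
  rate <- 1 / mean(x)
  loglik <- sum(log(rate) - rate * x)
  list(name = "exponential", f0 = rate, aic = 2 - 2 * loglik)
}

# Fit every candidate and keep the one with the smallest AIC.
select_by_aic <- function(x) {
  fits <- list(fit_halfnormal(x), fit_exponential(x))
  fits[[which.min(vapply(fits, function(f) f$aic, numeric(1)))]]
}

# Bootstrap: resample transects, reselect the model, store n * f(0).
nboot <- 199
ids <- unique(sim$transect)
boot <- do.call(rbind, lapply(seq_len(nboot), function(b) {
  picked <- sample(ids, size = length(ids), replace = TRUE)
  x <- unlist(lapply(picked, function(i) sim$distance[sim$transect == i]))
  best <- select_by_aic(x)
  data.frame(model = best$name, stat = length(x) * best$f0)
}))

print(table(boot$model))
print(quantile(boot$stat, probs = c(0.025, 0.975)))
```

The table of selected models shows how often each candidate wins across replicates. When one model is chosen almost every time, the interval will differ little from the single-model bootstrap.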

Comments

Producing bootstrap estimates of precision is computer-intensive. In this example we created only 99 bootstrap replicates in the interest of computation time. For inference you intend to report, you will likely want to increase the number of bootstrap replicates to 999.

For this data set, the bootstrap estimate of precision is greater than the delta-method approximation (judged by confidence interval width). In addition, incorporating model uncertainty into the estimate of precision for abundance changes the precision estimate very little. The confidence interval width without incorporating model uncertainty is 15.806, while the width including model uncertainty is 14.238. This represents a change of -10% due to uncertainty regarding the best model for these data.
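The widths and the percentage change quoted above follow directly from the two percentile intervals. The sketch below retypes the printed bounds rather than reusing the objects bootci and modselci, so it runs on its own; the variable names are illustrative.

```r
# Percentile bounds copied from the output shown earlier.
single_model <- c(lower = 29.18786, upper = 44.99379)
model_select <- c(lower = 28.44801, upper = 42.68570)

# Interval widths.
width_single <- unname(diff(single_model))
width_select <- unname(diff(model_select))

# Relative change in width when model uncertainty is included.
pct_change <- 100 * (width_select - width_single) / width_single

print(round(c(width_single = width_single,
              width_select = width_select,
              pct_change   = pct_change), 3))
```

With only 99 replicates, the extreme percentiles are themselves estimated imprecisely, so a difference of this size may partly reflect Monte Carlo variation between the two bootstrap runs.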

Buckland, S., Rexstad, E., Marques, T., & Oedekoven, C. (2015). Distance sampling: Methods and applications. Springer.

Buckland, S. T. (2006). Point transect surveys for songbirds: Robust methodologies. The Auk, 123(2), 345–357. https://doi.org/10.1642/0004-8038(2006)123[345:psfsrm]2.0.co;2

Miller, D. L., Rexstad, E., Thomas, L., Marshall, L., & Laake, J. L. (2019). Distance sampling in R. Journal of Statistical Software, 89(1), 1–28. https://doi.org/10.18637/jss.v089.i01

R Core Team. (2019). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing.