Point transects for hares analysed with model-based inference to the study area.
In France, the National Agency for Hunting and Wildlife recently created the Hare network (Mauvy, Souchay, Arnauduc, & Guitton, 2017), a network of volunteer sites where hares and foxes are sampled every year in February using point transect sampling (Peroux, Mauvy, Lartiges, Bray, & Marboutin, 1997).
As an illustration, we use the sampling data for hares only from Rouillacais, in the western part of France. At this site, ~50 points were sampled during 3 successive nights in February of 2015 and 2016, and distances from observers to hares and foxes were recorded. The presence of other species was also noted. Fieldwork was undertaken by skilled observers only. Data for the two years are pooled to produce a single estimate representing the average density over the two years.
The material in this analysis is provided for illustration purposes only. For other uses or questions regarding the Hare network, feel free to contact us at reseau.lievre@oncfs.gouv.fr
Place all of the following files in the same directory as the .Rmd file.
First, we load the point transect survey data into R (R Core Team, 2019).
# load raw data
data <- read.table("Hare_data.csv", header = TRUE, sep = ";", stringsAsFactors=FALSE)
str(data)
'data.frame': 575 obs. of 6 variables:
$ region : chr "Rouillacais_2015" "Rouillacais_2015" "Rouillacais_2015" "Rouillacais_2015" ...
$ point_ID: chr "Rouillacais_2015_1" "Rouillacais_2015_1" "Rouillacais_2015_1" "Rouillacais_2015_1" ...
$ Xcoord : num 419608 419608 419608 419608 419608 ...
$ Ycoord : num 2094820 2094820 2094820 2094820 2094820 ...
$ distance: int NA 258 307 307 236 NA NA NA 131 158 ...
$ effort : int 3 3 3 3 3 3 3 3 3 3 ...
Then, we build our dataset for the detection function.
# build the flat data for the detection function; Sample.Label strips the
# "Rouillacais_2016" prefix from the 2016 point labels (2015 labels are unchanged)
DSdata <- data.frame(Sample.Label = sub("Rouillacais_2016", "", data$point_ID),
                     Point = data$point_ID,
                     Xcoord = as.integer(data$Xcoord),
                     Ycoord = as.integer(data$Ycoord),
                     Area = 1)
Distances are given in metres, so density would be estimated as the number of individuals per square metre. To avoid this, we convert metres to kilometres.
DSdata$distance <- as.numeric(data$distance)/1000
Note that the Effort column contains 3, as each point was visited 3 times. It is worth noting that in our study we are interested in density only, rather than abundance.
DSdata$Effort <- 3
We now load and format the spatial data from shapefiles. Note there is a shapefile for the study area (Contour_Rouillacais) and one for the point station locations (Rouillacais_points). These shapefiles reside in the same directory as the point survey data and the RMarkdown file.
library("rgdal")
library("maptools")
library("ggplot2")
library("plyr")
# provide the correct projection for the data
newproj <- "+proj=lcc +nadgrids=ntf_r93.gsb,null +a=6378249.2000 +rf=293.4660210000000 +pm=2.337229167 +lat_0=46.800000000 +lon_0=0.000000000 +k_0=0.99987742 +lat_1=46.800000000 +x_0=600000.000 +y_0=200000.000 +units=m +no_defs"
# import shapefile for the survey area
shape <- readShapeSpatial("Contour_Rouillacais.shp", proj4string = CRS(newproj),
repair=TRUE, force_ring=T, verbose=TRUE)
Shapefile type: Polygon, (5), # of Shapes: 1
# import shapefile for the points
EPP <- readShapeSpatial("Rouillacais_points.shp", proj4string = CRS(newproj),
repair=TRUE, force_ring=T, verbose=TRUE)
Shapefile type: Point, (1), # of Shapes: 50
# make the object simpler
survey.area <- data.frame(shape@polygons[[1]]@Polygons[[1]]@coords)
names(survey.area) <- c("x","y")
We can compute the size of the study area (in km\(^2\)):
shape@data$AREA/(1000^2)
[1] 147.3401
We can then produce the map of the area with the sampled points (Figure 1).
# produce a map of the survey area with all the point sampled
p <- qplot(data=survey.area, x=x, y=y, geom="polygon",
           fill=I("lightblue"), ylab="y", xlab="x", alpha=I(0.7)) +
  geom_point(aes(x=Xcoord, y=Ycoord, group="Point"),
             data=DSdata, colour="darkblue") +
  coord_equal() +
  theme_minimal()
print(p)
Setting up the segment data, which in our case are the points that were visited…
# construct segment (point) data (Sample.Label, Effort, Segment.Label, X, Y)
segdata <- DSdata[, c("Sample.Label", "Effort", "Point", "Xcoord", "Ycoord")]
segdata <- segdata[!duplicated(segdata), ]
colnames(segdata) <- c("Sample.Label", "Effort", "Segment.Label", "X", "Y")
Setting up the observation data, which links the observations with the segments (points):
obsdata <- DSdata
obsdata$size <- 1
obsdata$object <- 1:nrow(obsdata)
str(obsdata)
'data.frame': 575 obs. of 9 variables:
$ Sample.Label: chr "Rouillacais_2015_1" "Rouillacais_2015_1" "Rouillacais_2015_1" "Rouillacais_2015_1" ...
$ Point : chr "Rouillacais_2015_1" "Rouillacais_2015_1" "Rouillacais_2015_1" "Rouillacais_2015_1" ...
$ Xcoord : int 419607 419607 419607 419607 419607 419341 419341 419341 417947 417947 ...
$ Ycoord : int 2094819 2094819 2094819 2094819 2094819 2096566 2096566 2096566 2095566 2095566 ...
$ Area : num 1 1 1 1 1 1 1 1 1 1 ...
$ distance : num NA 0.258 0.307 0.307 0.236 NA NA NA 0.131 0.158 ...
$ Effort : num 3 3 3 3 3 3 3 3 3 3 ...
$ size : num 1 1 1 1 1 1 1 1 1 1 ...
$ object : int 1 2 3 4 5 6 7 8 9 10 ...
We then create the prediction grid (Figure 2).
# create a prediction grid
# method from http://rfunctions.blogspot.co.uk/2014/12/how-to-create-grid-and-intersect-it.html
library("raster")
library("rgeos")
library("dismo")
# Create an empty raster
grid <- raster(extent(shape))
# Choose its resolution: 500 m in both X and Y
res(grid) <- 500
# Make the grid have the same coordinate reference system (CRS) as the shapefile.
proj4string(grid) <- proj4string(shape)
# Transform this raster into a polygon and you will have a grid
gridpolygon <- rasterToPolygons(grid)
# Intersect our grid with shape
pred.grid <- intersect(shape, gridpolygon)
# Plot the intersected shape to check if everything is fine.
plot(pred.grid)
# create the data.frame for prediction
preddata <- as.data.frame(matrix(NA, ncol=3, nrow=dim(pred.grid@data)[1]))
colnames(preddata) <- c("X", "Y", "area")
for (i in 1:dim(pred.grid@data)[1]){
  preddata[i, c("X", "Y")] <- pred.grid@polygons[[i]]@labpt
  preddata[i, c("area")] <- pred.grid@polygons[[i]]@area/(1000^2)
}
The size of each cell in the prediction grid is 0.25 km\(^2\) (cells were created in metres and their areas converted to square kilometres).
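As a quick sanity check, we can summarise the per-cell areas; interior cells should be exactly 0.5 km × 0.5 km = 0.25 km\(^2\), while cells clipped by the study area boundary are smaller.
# quick check (not in the original analysis): interior cells should be
# 0.25 km^2; cells clipped by the boundary will be smaller
summary(preddata$area)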
Detection functions can be fitted using the Distance R package:
library("Distance")
# define distance bins
cut <- c(0, 0.180, 0.220, 0.280, 0.300)
df_ht <- ds(DSdata, truncation=0.3, transect="point",
formula=~1, key="hn", adjustment=NULL, cutpoints=cut)
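A half-normal key function was used here; as a sketch of model selection (not part of the original analysis), an alternative hazard-rate key could be fitted and the two detection functions compared by AIC.
# optional comparison: fit a hazard-rate detection function and compare by AIC
df_hr <- ds(DSdata, truncation=0.3, transect="point",
            formula=~1, key="hr", adjustment=NULL, cutpoints=cut)
AIC(df_ht, df_hr)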
We can look at a plot of the detection function and results from goodness of fit testing (Figure 3).
plot(df_ht, pdf=TRUE)
gof_ds(df_ht)
Goodness of fit results for ddf object
Chi-square tests
[0,0.18] (0.18,0.22] (0.22,0.28] (0.28,0.3]
Observed 2.660000e+02 64.000000000 60.00000000 14.0000000
Expected 2.657738e+02 63.236416379 62.18979621 12.8000265
Chisquare 1.925852e-04 0.009220319 0.07710602 0.1124948
Total
Observed 404.0000000
Expected 404.0000000
Chisquare 0.1990137
P = 0.90528 with 2 degrees of freedom
We can see from the above that the model fit seems adequate.
We can now fit some DSMs (Miller, Burt, Rexstad, & Thomas, 2013):
library("dsm")
Now fitting the DSM, we model the count as a function of space, using a Tweedie (tw()) response distribution:
mod_tw <- dsm(count~s(X, Y), ddf.obj=df_ht, segment.data=segdata,
observation.data=obsdata, family=tw(), transect="point")
Note that we need to specify transect="point" to ensure that the effort is calculated properly.
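The Tweedie is not the only possible response distribution; as a sketch (not part of the original analysis), a negative binomial response could be fitted and the two models compared informally by AIC.
# optional alternative: the same spatial model with a negative binomial response
mod_nb <- dsm(count~s(X, Y), ddf.obj=df_ht, segment.data=segdata,
              observation.data=obsdata, family=nb(), transect="point")
AIC(mod_tw, mod_nb)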
We can then look at the summary and model checking output (Figure 4).
summary(mod_tw)
Family: Tweedie(p=1.222)
Link function: log
Formula:
count ~ s(X, Y) + offset(off.set)
Parametric coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.35638 0.09534 24.71 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Approximate significance of smooth terms:
edf Ref.df F p-value
s(X,Y) 20.61 25.28 2.996 6.12e-05 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
R-sq.(adj) = 0.464 Deviance explained = 49.4%
-REML = 254.02 Scale est. = 2.0851 n = 100
gam.check(mod_tw)
Method: REML Optimizer: outer newton
full convergence after 6 iterations.
Gradient range [-3.695307e-08,7.842544e-08]
(score 254.0232 & scale 2.085148).
Hessian positive definite, eigenvalue range [3.066099,109.0183].
Model rank = 30 / 30
Basis dimension (k) checking results. Low p-value (k-index<1) may
indicate that k is too low, especially if edf is close to k'.
k' edf k-index p-value
s(X,Y) 29.0 20.6 1.13 0.99
This model is fairly rudimentary, so the plot of response versus fitted values doesn't look great (note the difference in axis scales), but for illustration purposes we accept the model as it stands.
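One further informal check (an illustrative sketch, not part of the original output) is to plot the observed counts against the fitted values directly:
# observed counts against fitted values; the dashed line is the 1:1 line
plot(fitted(mod_tw), mod_tw$y, xlab="Fitted values", ylab="Observed counts")
abline(0, 1, lty=2)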
We can now predict over the prediction grid.
mod_tw_pred <- predict(mod_tw, preddata, preddata$area)
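Summing the per-cell predictions gives the estimated total abundance over the study area (this should match the point estimate reported in the variance summary below):
# total estimated abundance over the prediction grid
sum(mod_tw_pred)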
Here we define a convenience function to generate an appropriate data structure for ggplot2 to plot: given the argument fill (the covariate vector to use as the fill) and a name, it returns a geom_polygon object (fill must be in the same order as the polygon data).
grid_plot_obj <- function(shape, fill, name){
  # strip names so the fill data can be bound to the polygons
  names(fill) <- NULL
  row.names(fill) <- NULL
  data <- data.frame(fill)
  names(data) <- name
  # attach the fill data to the polygons
  spdf <- SpatialPolygonsDataFrame(shape, data)
  spdf@data$id <- rownames(spdf@data)
  spdf.points <- fortify(spdf, region="id")
  spdf.df <- join(spdf.points, spdf@data, by="id")
  # store the x/y even when projected and labelled as "long" and "lat"
  spdf.df$x <- spdf.df$long
  spdf.df$y <- spdf.df$lat
  geom_polygon(aes_string(x="x", y="y", fill=name, group="group"), data=spdf.df)
}
# make the plot
pcount_tw <- ggplot() +
  grid_plot_obj(pred.grid, mod_tw_pred, "Density") +
  scale_fill_gradient(low="white", high="chocolate4") +
  coord_equal() + theme_minimal() +
  geom_path(aes(x=x, y=y), data=survey.area) +
  geom_point(aes(x=Xcoord, y=Ycoord, group="Point"),
             data=DSdata, colour="black") +
  labs(fill="Density")
We can also estimate uncertainty in our abundance map, in the form of a map of coefficients of variation. Note that since there are no covariates in the detection function, we use the dsm.var.gam function to estimate the variance (if there were covariates varying at the segment level, such as weather conditions or observer, we could use dsm.var.prop) (Figure 5).
# data setup for plotting
preddata.var <- split(preddata, 1:nrow(preddata))
# estimate variance
mod_tw_var <- dsm.var.gam(mod_tw, pred.data=preddata.var, off.set=preddata$area)
summary(mod_tw_var)
Summary of uncertainty in a density surface model calculated
analytically for GAM, with delta method
Approximate asymptotic confidence interval:
2.5% Mean 97.5%
1588.720 2010.275 2543.686
(Using log-Normal approximation)
Point estimate : 2010.275
CV of detection function : 0.06566247
CV from GAM : 0.101
Total standard error : 242.2565
Total coefficient of variation : 0.1205
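Since density rather than abundance is of interest here, the abundance point estimate can be converted to an average density (hares per km\(^2\)) by dividing by the total area of the prediction grid (a simple additional calculation, not part of the original output):
# average density over the study area, in hares per square kilometre
sum(mod_tw_pred)/sum(preddata$area)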
# plot
pcount_cv_tw <- ggplot() +
  grid_plot_obj(pred.grid, sqrt(mod_tw_var$pred.var)/unlist(mod_tw_var$pred), "CV") +
  scale_fill_gradient(low="white", high="chocolate4") +
  coord_equal() + theme_minimal() +
  geom_path(aes(x=x, y=y), data=survey.area) +
  geom_point(aes(x=Xcoord, y=Ycoord, group="Point"), data=DSdata, colour="black")
library("gridExtra")
grid.arrange(pcount_tw, pcount_cv_tw, ncol=2)
This is a small example of what can be done using DSMs with the simplest covariates available: geographical coordinates.
Mauvy, B., Souchay, G., Arnauduc, J.-P., & Guitton, J.-S. (2017). The French hare network: A tool to study European hare population dynamics on a large scale. 33rd IUGB Congress & 14th Perdix Symposium. Retrieved from http://iugb2017.com/wp-content/uploads/2017/08/abstract-book-FINAL-VERSION.pdf
Miller, D. L., Burt, M. L., Rexstad, E. A., & Thomas, L. (2013). Spatial models for distance sampling data: Recent developments and future directions. Methods in Ecology and Evolution, 4, 1001–1010. https://doi.org/10.1111/2041-210X.12105
Peroux, R., Mauvy, B., Lartiges, A., Bray, Y., & Marboutin, E. (1997). Point transect sampling: A new approach to estimate densities or abundances of European hare (Lepus europaeus) from spotlight counts. Game and Wildlife, 14, 525–529.
R Core Team. (2019). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing.