```{r echo=FALSE}
library(knitr)
opts_chunk$set(message=FALSE, echo=TRUE, error=FALSE, warning=FALSE)
```

In France, the French National Agency for Hunting and Wildlife has recently created the Hare network [@Mauvy2017], a network of voluntary sites where hares and foxes are sampled every year in February using point transect sampling [@Peroux1997].

As an illustration, we used the sampling data from Rouillacais for hare only, in western part of France. At this site, ~ 50 points were sampled during 3 successive nights in February of 2015 and 2016, distance from observers to hares and foxes were recorded. Presence of other species was also noted. Fieldwork was undertaken by skilled observers only. Data for the two years are pooled to produce a single estimate that represents the average density for the two years.

*The material of this analysis is provided for illustration purpose only. For other use or questions regarding the Hare network, feel free to contact us at reseau.lievre@oncfs.gouv.fr*

*Place all following files in same directory as .Rmd*

- [Point transect survey csv](Hare_data.csv)
- [Study area shapefile shp](Contour_Rouillacais.shp)
- [Study area shapefile shx](Contour_Rouillacais.shx)
- [Study area shapefile dbf](Contour_Rouillacais.dbf)
- [Point station shapefile shp](Rouillacais_points.shp)
- [Point station shapefile shx](Rouillacais_points.shx)
- [Point station shapefile dbf](Rouillacais_points.dbf)


## Observation data and covariates

First, we load point transect survey data into `R` [@r_core_team_r_2019].

```{r data}
# load raw data
data <- read.table("Hare_data.csv", header = TRUE, sep = ";", stringsAsFactors=FALSE)
str(data)
```

Then, we build our dataset for the detection function.

```{r Dataset}
DSdata <- data.frame(Sample.Label = sub("Rouillacais_2016", "", data$point_ID),
                     Point = data$point_ID,
                     Xcoord = as.integer(data$Xcoord),
                     Ycoord = as.integer(data$Ycoord),
                     Area = 1)
```


Distance are given in metres, so density would be estimated as the number of individuals per square metre. To avoid this, we convert metres into kilometres.

```{r tokm}
DSdata$distance <- as.numeric(data$distance)/1000
```

Note that the `Effort` column contains `3`, as each point was visited 3 times.  It is worth noting that in our study, we are not interested in abundance but in density only.

```{r effort}
DSdata$Effort <- 3
```

### GIS data

We now load and format the spatial data from shapefiles. Note there is a shape file for the study area (Contour_Rouillacais) and the point station locations (Rouillacais_points).  These shape files reside in the same directory as the point survey data and Rmarkdown file.

```{r map_data}
library("rgdal")
library("maptools")
library("ggplot2")
library("plyr")

# provide the correct projection for the data
newproj <- "+proj=lcc +nadgrids=ntf_r93.gsb,null +a=6378249.2000 +rf=293.4660210000000  +pm=2.337229167 +lat_0=46.800000000 +lon_0=0.000000000 +k_0=0.99987742 +lat_1=46.800000000 +x_0=600000.000 +y_0=200000.000 +units=m +no_defs"
# import shapefile for the survey area
shape <- readShapeSpatial("Contour_Rouillacais.shp", proj4string = CRS(newproj),
                          repair=TRUE, force_ring=T, verbose=TRUE)
# import shapefile for the points
EPP <- readShapeSpatial("Rouillacais_points.shp", proj4string = CRS(newproj),
                        repair=TRUE, force_ring=T, verbose=TRUE)
# make the object simpler
survey.area <- data.frame(shape@polygons[[1]]@Polygons[[1]]@coords)
names(survey.area) <- c("x","y")
```

We can find the area of the study area (in km$^2$):

```{r area}
shape@data$AREA/(1000^2)
```

We can then produce the map of the area with the sampled points (Figure \@ref(fig:map1)).

```{r map1, fig.cap="Map with sampled points"}
# produce a map of the survey area with all the point sampled
p <- qplot(data=survey.area, x=x, y=y, geom="polygon",
           fill=I("lightblue"), ylab="y", xlab="x", alpha=I(0.7)) +
  geom_point(aes(x=Xcoord, y=Ycoord, group="Point"),
             data=DSdata, colour="darkblue") +
  coord_equal() +
  theme_minimal()
print(p)

```

Setting up the segment data, which in our case are the points that were visited...

```{r segdata}
# construct segment (point) data (x, y, Effort, Sample.Label)
segdata <- as.data.frame(matrix(NA, ncol = 5, nrow=100))
segdata <- DSdata[, c("Sample.Label", "Effort", "Point", "Xcoord", "Ycoord")]
segdata <- segdata[!duplicated(segdata), ]
colnames(segdata) <- c("Sample.Label", "Effort", "Segment.Label", "X", "Y")
```

Setting up the observation data, which links the observations with the segments (points):

```{r obsdata}
obsdata <- DSdata
obsdata$size <- 1
obsdata$object <- 1:nrow(obsdata)
str(obsdata)
```


We then create the prediction grid (Figure \@ref(fig:projgrid)).

```{r projgrid, fig.cap="Prediction grid"}
# create a prediction grid
# method from http://rfunctions.blogspot.co.uk/2014/12/how-to-create-grid-and-intersect-it.html
library("raster")
library("rgeos")
library("dismo")

# Create an empty raster
grid <- raster(extent(shape))
# Choose its resolution. 500 m in both X and Y (truncation distance)
res(grid) <- 500
# Make the grid have the same coordinate reference system (CRS) as the shapefile.
proj4string(grid) <- proj4string(shape)
# Transform this raster into a polygon and you will have a grid
gridpolygon <- rasterToPolygons(grid)
# Intersect our grid with shape
pred.grid <- intersect(shape, gridpolygon)
# Plot the intersected shape to check if everything is fine.
plot(pred.grid)

# create the data.frame for prediction
preddata <- as.data.frame(matrix(NA, ncol=3, nrow=dim(pred.grid@data)[1]))
colnames(preddata) <- c("X", "Y", "area")
for (i in 1:dim(pred.grid@data)[1]){
  preddata[i, c("X", "Y")] <- pred.grid@polygons[[i]]@labpt
  preddata[i, c("area")] <- pred.grid@polygons[[i]]@area/(1000^2)
}
```

The size of each cell in the prediction grid is 0.25 km$^2$ (created in metres and converted to kilometres).

## Detection function

Detection functions can be fitted using the `Distance` R package:

```{r det_function}
library("Distance")
# define distance bins
cut <- c(0, 0.180, 0.220, 0.280, 0.300)

df_ht <- ds(DSdata, truncation=0.3, transect="point",
            formula=~1, key="hn", adjustment=NULL, cutpoints=cut)
```

We can look at a plot of the detection function and results from goodness of fit testing (Figure \@ref(fig:gof)).

```{r gof, fig.cap="Detection function:  half-normal."}
plot(df_ht, pdf=TRUE)
gof_ds(df_ht)
```

We can see from the above, the model fit seems adequate.

We can now fit some DSMs [@Miller2013]:


## Density surface modelling

```{r dsmload}
library("dsm")
```

Now fitting the DSMs, we model the count as a function of space, using a Tweedie (`tw()`) response distribution:

```{r dsm-fitting}
mod_tw <- dsm(count~s(X, Y), ddf.obj=df_ht, segment.data=segdata, 
              observation.data=obsdata, family=tw(), transect="point")
```

Note that we need to specify `transect="point"` to ensure that the effort is calculated properly.

We can then look at the `summary` and model checking output (Figure \@ref(fig:gamcheck)).

```{r gamcheck, fig.cap="Results from `gam.check`", layout="l-body-outset"}
summary(mod_tw)
gam.check(mod_tw)
```

This model is fairly rudimentary, so the plot of response vs fitted values doesn't seem that great (note the difference in axis scales) but for illustration purposes here we accept what we have.

## Making predictions

We can now predict over the prediction grid.

```{r makepred}
mod_tw_pred <- predict(mod_tw, preddata, preddata$area)
```

Here we define a convenience function to generate an appropriate data structure for `ggplot2` to plot: given the argument `fill` (the covariate vector to use as the fill) and a name, return a `geom_polygon` object  (fill must be in the same order as the polygon data).


```{r plotpred}
grid_plot_obj <- function(shape,fill, name){
  
  # what data were supplied?
  names(fill) <- NULL
  row.names(fill) <- NULL
  data <- data.frame(fill)
  names(data) <- name
  
  # ! need to give the right name of the shapefile
  sp <- shape
  spdf <- SpatialPolygonsDataFrame(sp, data)
  spdf@data$id <- rownames(spdf@data)
  spdf.points <- fortify(spdf, region="id")
  spdf.df <- join(spdf.points, spdf@data, by="id")
  
  # store the x/y even when projected and labelled as "long" and "lat"
  spdf.df$x <- spdf.df$long
  spdf.df$y <- spdf.df$lat
  
  geom_polygon(aes_string(x="x",y="y",fill=name, group="group"), data=spdf.df)
}

# make the plot
pcount_tw <- ggplot() +
  grid_plot_obj(pred.grid, mod_tw_pred, "Density") + 
  scale_fill_gradient(low="white", high="chocolate4")  +
  coord_equal() + theme_minimal() +
  geom_path(aes(x=x, y=y), data=survey.area) +
  geom_point(aes(x = Xcoord, y = Ycoord, group="Point"), data = DSdata, colour = "black") +
  labs(fill="Density")
```

We can also estimate uncertainty in our abundance map in the form of a map of coefficients of variation. Note that since there are no covariates in the detection function, we use the `dsm.var.gam` function to estimate the variance (if there were covariates varying at the segment level, such as sea state or observer, we could use `dsm.var.prop`)  (Figure \@ref(fig:gridarr)).

```{r cvpred}
# data setup for plotting
preddata.var <- split(preddata, 1:nrow(preddata))

# estimate variance
mod_tw_var <- dsm.var.gam(mod_tw, pred.data=preddata.var, off.set=preddata$area)
summary(mod_tw_var)

# plot
pcount_cv_tw <- ggplot() + 
  grid_plot_obj(pred.grid, sqrt(mod_tw_var$pred.var)/unlist(mod_tw_var$pred), "CV") + 
  scale_fill_gradient(low = "white", high = "chocolate4") +
  coord_equal() + theme_minimal() +
  geom_path(aes(x=x, y=y),data=survey.area) +
  geom_point(aes(x=Xcoord, y=Ycoord, group="Point"), data=DSdata, colour="black")
```

```{r, gridarr, fig.cap="Map of hare density (ind/km²) and of associated coefficient of variation in Rouillacais site from the DSM model with coordinates as covariates using Tweedie response. Black dots represent the sampled point.", layout="l-page"}
library("gridExtra")
grid.arrange(pcount_tw, pcount_cv_tw, ncol=2)
```

This is a small example of what can be done using DSM and the simplest covariates available: geographical coordinates.