---
title: "Fitting models with the bycatch package"
author: "Eric J. Ward, Jason E. Jannot"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Fitting models with the bycatch package}
  %\VignetteEngine{knitr::rmarkdown}
  \usepackage[utf8]{inputenc}
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

```{r, results="hide", message=FALSE, warning=FALSE}
library(bycatch)
set.seed(123)
```

### Load data
```{r data}
# replace this with your own data frame
d = data.frame("Year"= 2002:2014, 
  "Takes" = c(0, 0, 0, 0, 0, 0, 0, 0, 1, 3, 0, 0, 0),
  "expansionRate" = c(24, 22, 14, 32, 28, 25, 30,  7, 26, 21, 22, 23, 27),
  "Sets" = c(391, 340, 330, 660, 470, 500, 330, 287, 756, 673, 532, 351, 486))
```

## Simple model with constant bycatch, no covariates

We'll start by fitting a model with constant bycatch rate,  
```{r, results="hide", message=FALSE, warning=FALSE}
fit = fit_bycatch(Takes ~ 1, data=d, time="Year", effort="Sets", family="poisson",
  time_varying = FALSE)
```

If divergent transition warnings or other issues indicating lack of convergence are a problem, we can try changing some of the control arguments, e.g.

```{r eval=FALSE}
fit = fit_bycatch(Takes ~ 1, data=d, time="Year", effort="Sets", family="poisson",
  time_varying = FALSE, control=list(adapt_delta=0.99,max_treedepth=20))
```

We can also increase the iterations and number of chains from the defaults (1000 and 3),

```{r eval=FALSE}
fit = fit_bycatch(Takes ~ 1, data=d, time="Year", effort="Sets", family="poisson",
  time_varying = FALSE, iter=3000, chains=4)
```

### Make plots

```{r, fig.pos="placeHere", fig.cap = "Estimated bycatch for observed vessels (not expanded by observer coverage), but including observed takes and effort."}
plot_fitted(fit, xlab="Year", ylab = "Bycatch (observed vessels)")
```

We can include points also, 
```{r, fig.pos="placeHere", fig.cap = "Observed bycatch (not expanded by observer coverage), incorporating data on observed takes and effort. Dots represent observed bycatch events."}
plot_fitted(fit, xlab="Year", ylab = "Estimated bycatch for observed vessels", include_points = TRUE)
```
  
### Extracting model selection information (LOOIC)  
  
The `loo` package in R provides a nice interface for extracting leave one out information criterion (LOOIC) from `stanfit` objects. Like AIC, lower is better. Values of LOOIC can be used to compare models with the same response but different structure (covariates or not, time-varying bycatch or not, etc). Additional information on LOOIC can be found at [mc-stan.org](http://mc-stan.org/rstanarm/reference/loo.stanreg.html), [Vehtari et al. 2017](https://link.springer.com/article/10.1007/s11222-016-9696-4), or the vignette for the [loo package](https://cran.r-project.org/web/packages/loo/vignettes/loo2-example.html).

```{r}
loo::loo(fit$fitted_model)$estimates
```

\break 

## Negative binomial example

Using our example dataset above, we can also switch from the Poisson model to Negative Binomial. 

```{r, results="hide", message=FALSE, warning=FALSE}
fit = fit_bycatch(Takes ~ 1, data=d, time="Year", effort="Sets", family="nbinom2",
  time_varying = FALSE)
```

The degree of overdispersion here is stored in the variable `nb2_phi`, which we can get with

```{r}
phi = rstan::extract(fit$fitted_model)$nb2_phi
```

## Zero - inflated example

To accomodate excess zeros, we've included hurdle models that separately model the probability of zeros, and use a zero-truncated model for the positive component. We currently have 2 versions of the zero-inflated model included -- a Poisson, and Negative Binomial. These can be fit using

```{r, results="hide", message=FALSE, warning=FALSE, eval = FALSE}
fit = fit_bycatch(Takes ~ 1, data=d, time="Year", effort="Sets", family="poisson-hurdle",
  time_varying = FALSE)
```

and 

```{r, results="hide", message=FALSE, warning=FALSE, eval = FALSE}
fit = fit_bycatch(Takes ~ 1, data=d, time="Year", effort="Sets", family="nbinom2-hurdle",time_varying = FALSE)
```

## Continuous models

As above, we can construct the models to be zero-inflated or not. Three different families are currently included ("normal", "gamma", "lognormal"). These can be fit as follows, 

```{r, results="hide", message=FALSE, warning=FALSE, eval = FALSE}
fit_normal = fit_bycatch(Takes ~ 1,
    data=d, 
    time="Year", 
    effort="Sets",
    family="normal",
    time_varying = FALSE)

fit_lognormal = fit_bycatch(Takes ~ 1, 
    data=d, 
    time="Year", 
    effort="Sets", 
    family="lognormal",
    time_varying = FALSE)

fit_gamma = fit_bycatch(Takes ~ 1, 
    data=d,
    time="Year",
    effort="Sets",
    family="gamma",
    time_varying = FALSE)
```

And similarly to the discrete models, we can fit zero-inflated models by changing the models to the hurdle equivalents

```{r, results="hide", message=FALSE, warning=FALSE, eval = FALSE}
fit_normal = fit_bycatch(Takes ~ 1,
      data=d,
      time="Year",
      effort="Sets",
      family="normal-hurdle",
      time_varying = FALSE)

fit_lognormal = fit_bycatch(Takes ~ 1,
      data=d,
      time="Year",
      effort="Sets",
      family="lognormal-hurdle",
      time_varying = FALSE)

fit_gamma = fit_bycatch(Takes ~ 1,
      data=d,
      time="Year",
      effort="Sets",
      family="gamma-hurdle",
      time_varying = FALSE)
```

## Example with covariates

Following [Martin et al. 2015](http://onlinelibrary.wiley.com/doi/10.1890/14-0059.1/abstract) we can include fixed or continuous covariates. 

For example, we could include a julian day, and a break point (representing a regulatory change for example) in the data. We could model the first variable as a continuous predictor and the second as a factor.

```{r}
d$Day = sample(seq(220,280),size=nrow(d),replace=T)
d$Reg = ifelse(d$Year < 2008, 0, 1)
```

Using the formula interface makes it easy to include covariates (in general it's a good idea to standardize these)

```{r message=FALSE, warning=FALSE,results="hide"}
d$Day = (d$Day - mean(d$Day)) / sd(d$Day)

fit = fit_bycatch(Takes ~ Day + Reg, data=d, time="Year", effort="Sets", family="poisson",
  time_varying = FALSE)
```

We can get the 3 covariate effects out with the following call:
```{r message=FALSE, warning=FALSE,results="hide"}
betas = rstan::extract(fit$fitted_model)$beta
```

Note that 'betas' has 3 columns. These correspond to (1) the intercept,
(2) continuous predictor, and (3) factor variable above. If we didn't 
include the covariates, we'd still estimate beta[1] as the intercept.

\break 

## Fit model with time-varying effects

To incorporate potential autocorrelation, we can fit a model with time-varying random effects. This is equivalent to a dynamic linear model with time varying intercept in a Poisson GLM. 

```{r, results="hide", message=FALSE, warning=FALSE}
fit = fit_bycatch(Takes ~ 1, data=d, time="Year", 
    effort="Sets", family="poisson",
    time_varying = TRUE)
```

```{r, fig.pos="placeHere", fig.cap = "Estimated bycatch from the model with time-varying effects, incorporating data on takes, effort, and observer coverage. Dots represent observed bycatch events."}
plot_fitted(fitted_model=fit, xlab="Year", 
            ylab = "Fleet-level bycatch", include_points = TRUE)
```

\break 

## Fit model with no bycatch events

```{r data2}
# replace this with your own data frame
d = data.frame("Year"= 2002:2014, 
  "Takes" = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
  "expansionRate" = c(24, 22, 14, 32, 28, 25, 30,  7, 26, 21, 22, 23, 27),
  "Sets" = c(391, 340, 330, 660, 470, 500, 330, 287, 756, 673, 532, 351, 486))
```

```{r warning=FALSE,message=FALSE,results='hide'}
fit = fit_bycatch(Takes ~ 1, data=d, time="Year", 
                  effort="Sets", family="poisson",
  time_varying = FALSE)
```

```{r fig.pos="placeHere", fig.cap = "Estimated bycatch from the dataset with no events, incorporating effort, and observer coverage. Dots represent observed bycatch events."}
plot_fitted(fitted_model=fit, xlab="Year", 
            ylab = "Fleet-level bycatch", 
            include_points = TRUE)
```