---
title: "Scallop Conditional Logit Model Example"
author: "Bryce McManus"
date: "`r Sys.Date()`"
output:
  rmarkdown::html_vignette:
    fig_width: 6
    fig_height: 4
vignette: >
  %\VignetteIndexEntry{Scallop Conditional Logit Model Example}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
editor_options:
  markdown:
    wrap: 72
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

## Introduction

This is an example of a conditional logit model using the `scallop` data
from the `FishSET` package.

### Packages

```{r}
library(FishSET)
```

```{r eval=TRUE, echo=FALSE}
# this chunk is for vignette version only
folderpath <- tempdir()
```

### Load Data

This analysis uses three of `FishSET`'s example datasets: `scallop`, a
data frame of anonymized scallop data from the Northeast;
`scallop_ports`, a table of ports and their locations; and `tenMNSQR`, a
spatial data frame of Northeastern ten-minute squares.

The `scallop` data frame contains a random sample of 10,000 trips from
vessels in the Limited Access Days-at-Sea fleet when they are declared
into either an Access Area or Open Area Days-at-Sea fishing trip. Noise
was added to fishing locations, landing quantities, and the value of
catch. Permit, operator, and trip identifiers were also anonymized.

```{r}
data("scallop")
data("scallop_ports")
data("tenMNSQR")

# Rescale landed weight to thousands of meat pounds and
# dollar value to thousands of real dollars
scallop$LANDED_thousands <- scallop$LANDED_OBSCURED / 1000
scallop$DOLLAR_2020_thousands <- scallop$DOLLAR_2020_OBSCURED / 1000

# Keep only the variables used in this example
vars_to_keep <- c('TRIPID', 'PERMIT.y', 'DATE_TRIP', 'DDLON', 'DDLAT', 'ZoneID',
                  'LANDED_thousands', 'DOLLAR_2020_thousands',
                  'port_lon', 'port_lat',
                  'previous_port_lon', 'previous_port_lat')

scallop <- scallop[vars_to_keep]
```

Load each dataset into FishSET. Landed weight (thousands of meat pounds)
and dollar value (thousands of real dollars) were already rescaled in
the chunk above.

Note: when running this chunk, a pop-up may ask you to choose the
location of a new or existing FishSET folder.

```{r}
load_maindata(scallop, project = "scallopMod", over_write = TRUE)

load_port(dat = scallop_ports, port_name = "port_name", project = "scallopMod")

load_spatial(spat = tenMNSQR, name = "TenMnSqr", project = "scallopMod")

scallopModTenMnSqrSpatTable <- table_view('scallopModTenMnSqrSpatTable', "scallopMod")
```

### QAQC

`summary_stats()` is a useful way to find `NA`s in the data.

```{r}
summary_stats(scallopModMainDataTable, project = "scallopMod")
```

Remove `NA`s by calling `na_filter()`. In this case, three variables
which are important for modeling are being filtered: `ZoneID`,
`previous_port_lon`, and `previous_port_lat`.

```{r}
scallopModMainDataTable <-
  na_filter(scallopModMainDataTable,
            project = "scallopMod",
            x = c("ZoneID", "previous_port_lon", "previous_port_lat"),
            remove = TRUE)
```
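
For readers who want to see the idea behind this step outside of
FishSET, the following is a minimal base R sketch of counting and
dropping missing values in selected columns. The `toy` data frame and
its values are invented for illustration, and the sketch does not claim
to reproduce what `na_filter()` does internally.

```r
# Toy data frame standing in for the main data table (values are invented)
toy <- data.frame(
  TRIPID            = 1:5,
  ZoneID            = c(416965, NA, 416966, 416965, 416967),
  previous_port_lon = c(-70.9, -70.9, NA, -74.9, -70.9),
  previous_port_lat = c(41.6, 41.6, 41.6, NA, 41.6),
  LANDED_thousands  = c(12.1, 8.4, NA, 15.0, 9.7)
)

# Count missing values in every column
na_counts <- colSums(is.na(toy))

# Keep only rows that are complete in the columns the model depends on
key_vars  <- c("ZoneID", "previous_port_lon", "previous_port_lat")
toy_clean <- toy[complete.cases(toy[, key_vars]), ]

na_counts
toy_clean
```

Rows with a missing `LANDED_thousands` are dropped here only when they
are also missing one of the key variables, since the filter is
restricted to `key_vars`.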

Plot the number of observations by zone.

```{r}
zone_summary(dat = scallopModMainDataTable,
             spat = scallopModTenMnSqrSpatTable,
             project = "scallopMod",
             zone.dat = "ZoneID",
             zone.spat = "TEN_ID",
             output = "tab_plot")
```

Check for sparsity.

```{r}
sparsetable(scallopModMainDataTable, 'scallopMod',
            timevar = 'DATE_TRIP',
            zonevar = 'ZoneID',
            var = 'LANDED_thousands')

sparsplot('scallopMod')
```
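
As a rough illustration of what sparsity means here, the base R sketch
below counts how many zone-by-month cells contain at least one trip. The
`trips` data and the monthly grouping are assumptions made for the
example; the FishSET functions above may summarize sparsity differently.

```r
# Invented trip records: one row per trip, with its zone and month
trips <- data.frame(
  zone  = c("Z1", "Z1", "Z2", "Z3", "Z1", "Z2"),
  month = c("2020-01", "2020-02", "2020-01", "2020-03", "2020-03", "2020-01")
)

# Cross-tabulate trips by zone and month
cell_counts <- table(trips$zone, trips$month)

# Share of zone-by-month cells that have at least one observation
share_filled <- mean(cell_counts > 0)

cell_counts
share_filled
```

A low share of filled cells suggests that many zone and period
combinations have no data to inform expected catch.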

### Create Centroids

A centroid table is needed to create the distance matrix. It can be used
as the choice occasion or the alternative choice. The simplest way to
create a zonal centroid table is to pass the spatial table to
`create_centroid()`, which saves the centroids to the FishSET database.

```{r}
create_centroid(spat = scallopModTenMnSqrSpatTable,
                project = "scallopMod",
                spatID = "TEN_ID",
                type = "zonal centroid",
                output = "centroid table")
```
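
To make the role of the centroids concrete, here is a minimal base R
sketch of a distance matrix between starting locations and zone
centroids. It uses the haversine great-circle formula, which is an
assumption for illustration and not necessarily the distance method
FishSET applies. The function `haversine_km()` and all coordinates are
hypothetical.

```r
# Great-circle distance in kilometres between two lon/lat points (haversine)
haversine_km <- function(lon1, lat1, lon2, lat2, radius_km = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius_km * asin(pmin(1, sqrt(a)))
}

# Invented starting points (one per trip) and zone centroids
ports <- data.frame(lon = c(-70.92, -74.93), lat = c(41.63, 38.93))
centroids <- data.frame(
  ZoneID = c("A", "B", "C"),
  lon    = c(-69.5, -73.0, -74.0),
  lat    = c(41.0, 39.5, 38.5)
)

# Rows are trips (choice occasions), columns are zones (alternatives)
dist_mat <- outer(
  seq_len(nrow(ports)), seq_len(nrow(centroids)),
  function(i, j) {
    haversine_km(ports$lon[i], ports$lat[i], centroids$lon[j], centroids$lat[j])
  }
)
dimnames(dist_mat) <- list(paste0("trip", seq_len(nrow(ports))), centroids$ZoneID)

round(dist_mat, 1)
```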

### Alternative Choice

For this example, the alternative choice list will use the longitude and
latitude of the disembarking port (`previous_port_lon` and
`previous_port_lat`) as the choice occasion and the zonal centroid of
the fishing areas as the alternative. The minimum haul requirement is
set to 90.

```{r}
create_alternative_choice(dat = scallopModMainDataTable,
                          project = "scallopMod",
                          occasion = "lon-lat",
                          occasion_var = c("previous_port_lon", "previous_port_lat"),
                          alt_var = "zonal centroid",
                          zoneID = "ZoneID",
                          zone.cent.name = "scallopModZoneCentroid",
                          min.haul = 90
                          )
```
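
The effect of a minimum haul requirement can be sketched in base R as
follows. The zone labels and counts are invented, and the sketch assumes
the threshold means "at least this many observations", which may not
match the exact rule used by `create_alternative_choice()`.

```r
# Invented zone assignments for 258 observations
zone_obs <- c(rep("Z1", 120), rep("Z2", 45), rep("Z3", 90), rep("Z4", 3))

# Number of observations in each zone
zone_counts <- table(zone_obs)

# Zones that meet the (assumed inclusive) minimum haul threshold
min_haul   <- 90
kept_zones <- names(zone_counts)[zone_counts >= min_haul]

# Flag the observations that remain in the choice set
in_choice_set <- zone_obs %in% kept_zones

kept_zones
sum(in_choice_set)
```

Zones below the threshold drop out of the choice set, along with the
observations that fall inside them.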

The plot below visualizes zone frequency after accounting for the
minimum haul requirement from the alternative choice list.

```{r}
z_ind <- which(alt_choice_list('scallopMod')$dataZoneTrue == 1)

zOut <- zone_summary(scallopModMainDataTable[z_ind, ],
                     spat = scallopModTenMnSqrSpatTable,
                     project = "scallopMod",
                     zone.dat = "ZoneID",
                     zone.spat = "TEN_ID",
                     output = "tab_plot")

pretty_tab(zOut$tab)
zOut$plot
```

### Expected Catch

This code chunk creates two expected catch matrices: one with a
seven-day window and a lag of one, the other with a 14-day window and a
lag of two. They are named `user1` and `user2`, respectively.

```{r}
# user1 expected catch matrix
create_expectations(dat = scallopModMainDataTable,
                    project = "scallopMod",
                    catch = "LANDED_thousands",
                    temp.var = "DATE_TRIP",
                    temp.window = 7,
                    temp.lag = 1,
                    year.lag = 0,
                    temporal = 'daily',
                    empty.catch = NA,
                    empty.expectation = 1e-04,
                    default.exp = FALSE,
                    replace.output = TRUE)

# user2 expected catch matrix
create_expectations(dat = scallopModMainDataTable,
                    project = "scallopMod",
                    catch = "LANDED_thousands",
                    temp.var = "DATE_TRIP",
                    temporal = "daily",
                    temp.window = 14,
                    temp.lag = 2,
                    empty.catch = NA,
                    empty.expectation = 1e-04,
                    default.exp = FALSE,
                    replace.output = FALSE)
```
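
The window and lag settings can be illustrated with a small base R
sketch for a single zone. It assumes one plausible reading: the
expectation for a given day is the mean catch over the `window` days
that end `lag` days earlier, with a small fallback value when the window
is empty. The function `expected_catch()` and the data are hypothetical,
and `create_expectations()` may compute its values differently.

```r
# Mean catch over a lagged moving window, evaluated at each date
expected_catch <- function(dates, catch, window, lag, empty_value = 1e-04) {
  vapply(seq_along(dates), function(i) {
    end   <- dates[i] - lag        # last day included in the window
    start <- end - window + 1      # first day included in the window
    in_window <- dates >= start & dates <= end
    if (any(in_window)) mean(catch[in_window]) else empty_value
  }, numeric(1))
}

# Invented daily catch for one zone
dates <- as.Date("2020-01-01") + 0:9
catch <- c(10, 12, 8, 11, 9, 13, 10, 12, 11, 9)

exp_7_1  <- expected_catch(dates, catch, window = 7,  lag = 1)
exp_14_2 <- expected_catch(dates, catch, window = 14, lag = 2)

data.frame(dates, catch, exp_7_1, exp_14_2)
```

The first day has no earlier observations, so it receives the fallback
value, mirroring the role of `empty.expectation` in the calls above.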

The data must be checked for common data quality issues before it can be
used in the modeling functions (i.e. `make_model_design()` and
`discretefish_subroutine()`). `check_model_data()` saves a new version
of the primary data with the suffix `_final` added to indicate that the
table is in its "final" state and ready to be used for modeling.

```{r}
check_model_data(scallopModMainDataTable,
                 project = "scallopMod",
                 uniqueID = "TRIPID",
                 latlon = c("DDLON","DDLAT"))
```

### Model Design

The model design file below will run two conditional logit models, each
using one of the expected catch matrices created earlier (this is
specified by using `'individual'` in the `expectcatchmodels` argument).

```{r}
make_model_design(project = "scallopMod",
                  catchID = "LANDED_thousands",
                  likelihood = "logit_c",
                  initparams = c(0, 0),
                  vars1 = NULL,
                  vars2 = NULL,
                  mod.name = 'lz',
                  expectcatchmodels = list('individual')
                  )
```

### Run Models

Use `discretefish_subroutine()` to run all models in the model design
file.

```{r}
discretefish_subroutine(project = "scallopMod", explorestarts = FALSE)
```

Use `model_params()` to see the model output. `user1` and `user2` are
the expected catch parameters and `V1` is the travel distance parameter.
A reasonably specified model should find positive coefficients for
`user1` and `user2` and negative coefficients for `V1`.

```{r}
model_params("scallopMod", output = 'print')
```
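
To show why those signs are expected, the base R sketch below computes
conditional logit choice probabilities from an expected catch term and a
distance term. The function `clogit_probs()`, the coefficient values,
and the matrices are all invented for illustration; this is the textbook
form of the model and not FishSET's internal code.

```r
# Conditional logit probabilities: rows are trips, columns are zones
clogit_probs <- function(exp_catch, distance, beta_catch, beta_dist) {
  utility <- beta_catch * exp_catch + beta_dist * distance
  # Subtract each row's maximum before exponentiating for numerical stability
  utility <- utility - apply(utility, 1, max)
  exp_u <- exp(utility)
  exp_u / rowSums(exp_u)
}

# Invented inputs for two trips and three zones
exp_catch <- matrix(c(10, 12,  8,
                       9, 11, 14), nrow = 2, byrow = TRUE)
distance  <- matrix(c( 50, 120, 200,
                      180,  60,  90), nrow = 2, byrow = TRUE)

# A positive catch coefficient and a negative distance coefficient
probs <- clogit_probs(exp_catch, distance, beta_catch = 0.5, beta_dist = -0.02)

# Log-likelihood given the zone each trip actually chose
chosen  <- c(1, 3)
log_lik <- sum(log(probs[cbind(seq_along(chosen), chosen)]))

round(probs, 3)
log_lik
```

With these signs, a zone becomes more likely to be chosen as its
expected catch rises and less likely as it gets farther away.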

Compare model fit.

```{r}
model_fit_summary("scallopMod")
```

```{r eval=TRUE, echo=FALSE}
# for vignette version only
unlink(folderpath)
```