--- title: "Environmental linkages and priors: a formula-driven API" output: rmarkdown::html_vignette vignette: > %\VignetteEngine{knitr::rmarkdown} %\VignetteIndexEntry{Environmental linkages and priors: a formula-driven API} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", eval = FALSE ) ``` ## Overview Environmental drivers (temperature, prey density, climate indices) enter stock assessments through recruitment, natural mortality, growth, catchability, and selectivity. Rceattle exposes a **long-format linkage table**: one row per estimated coefficient tying an environmental term (or any user-specified design column) to a process parameter. Users describe linkages with `linkage_spec()` -- a formula-driven helper -- and pass the result to the relevant `build_*()` function. `fit_mod()` pools every spec into a single `data.frame` paired with a shared design matrix; the TMB template iterates the table once and adds per-process offsets to the underlying linear predictors. The linkage path is wired for **growth** (von Bertalanffy and Richards), **natural mortality** (`M1`), and **recruitment** (`R0`, `alpha`, `beta`). ## A first example: temperature on `K` Suppose we want sex- and species-specific von Bertalanffy growth, with the von Bertalanffy `K` parameter responding to a temperature index by species: ```{r} library(Rceattle) data(whamGrowthData) plot_data(whamGrowthData) # Use logistic selectivity for all fleets in this example. whamGrowthData$fleet_control$Selectivity <- "Logistic" whamGrowthData$fleet_control$Selectivity_dimension <- "Length" # 1. Attach an env_data table to the data list. Must include `Year` # plus whatever covariates show up in your formulas. Here `temp` is # a placeholder; in real use it would be a standardised SST series, # PDO index, etc. yrs <- whamGrowthData$styr:whamGrowthData$endyr whamGrowthData$env_data <- data.frame( Year = yrs, temp = rnorm(length(yrs), 0, 0.1) ) # 2. Build a growth specification with one linkage per parameter that # you want to vary. linkage_spec() captures (formula, by, link, # init, bounds, priors) for one process parameter. growth_spec <- build_growth( fun = "vonBertalanffy", linkages = list( K = linkage_spec( formula = ~ temp, # 2 columns: (Intercept) and `temp` by = ~ species + sex, # one beta per (species, sex, column) priors = list(temp = normal(0, 0.1)) # Tight prior ) ) ) # 3. Fit as usual. fit <- fit_mod( data_list = whamGrowthData, growthFun = growth_spec, estimateMode = 0, msmMode = 0, fit_control = fit_control(verbose = 0) ) summary(fit) ``` ## The `linkage_spec()` API ```r linkage_spec( formula, param = NULL, by = ~ species, species = NULL, sex = NULL, link = "log", # or "identity" init = NULL, bounds = NULL, priors = NULL, re_group = NA_character_, est_phase = 1L ) ``` Key arguments (see `?linkage_spec` for the full list): * **`formula`** -- one-sided R formula describing the linear predictor (anything `model.matrix()` understands: `~ 1`, `~ temp`, `~ temp + PDO`, `~ poly(temp, 2)`, factor predictors, etc.). * **`param`** -- target parameter name (e.g. `"K"`). Filled in automatically when the spec is registered under a `linkages = list(K = ...)` list key. * **`by`** -- stratifying factors (default `~species`); use `~species + sex` for per-(species, sex) coefficients, or `NULL` to share a single coefficient set. * **`species`** / **`sex`** -- restrict a spec to specific ids. Use with the multi-spec form (*Per-species* / *Per-sex formulas* below) to give different species or sexes different formulas. `sex` accepts integers (`1L`, `2L`) or strings (`"Females"`/`"Males"`, case-insensitive). * **`link`** -- relates the linear predictor to the natural-scale target. `"log"` (default) gives a multiplicative effect on natural `param`; `"identity"` gives an additive effect on natural `param`. * **`init`** / **`bounds`** -- named lists keyed by design-matrix column (`init = list(temp = 0.05)`, `bounds = list(temp = c(-2, 2))`). * **`priors`** -- named list keyed by design-matrix column. Each entry is either an `Rceattle_prior` or a list keyed by species id; see *Priors* below. * **`re_group`** / **`est_phase`** -- random-effects grouping (placeholder) and estimation phase ordinal (`0` fixes, `1+` estimates). ## Priors Four families are available, plus the implicit "none": ```r prior_normal(mean, sd) prior_lognormal(meanlog, sdlog) prior_gamma(shape, rate) prior_beta(shape1, shape2) ``` Inside `linkage_spec(priors = ...)` the unprefixed shorthand `normal()`, `lognormal()`, `gamma()`, `beta()` resolves to the corresponding `prior_*` constructors via a private data mask -- this is scoped to the `priors` argument and does not mask `base::gamma()` / `base::beta()` elsewhere: ```{r} linkage_spec( formula = ~ temp + PDO, by = ~ species, priors = list( temp = normal(0, 1), # informative, ~95% in (-2, 2) PDO = normal(0, 5) # weak ) ) ``` **Prior semantics.** For `(Intercept)` rows, Priors are evaluated on the *natural-scale* (e.g. `normal(0.3, 0.1)` on `K` means `dnorm(K_natural, 0.3, 0.1)`). Use `lognormal(meanlog, sdlog)` to put a normal directly on the log scale. For covariate rows, the prior applies to the coefficient as-is regardless of link function. For programmatic prior assembly (e.g. species-specific priors built in a loop) call the `prior_*` constructors directly: ```{r} priors <- lapply(species_specific_sds, function(s) prior_normal(0, s)) names(priors) <- c("temp", "PDO", "salinity") linkage_spec(formula = ~ temp + PDO + salinity, priors = priors) ``` Each prior's negative log-density contributes to slot 20 of the joint NLL (`fit$quantities$jnll_comp[20, ]` in R): ```{r} fit$quantities$jnll_comp["Linkage-table priors", ] ``` ## Putting a prior on the parameter itself Bayesian-style informative priors can be applied directly to the base growth parameters (`K`, `L1`, `Linf`, `m`), natural mortality (`M1`), or recruitment parameters (`R0`, `alpha`, `beta`). A prior attached to an `(Intercept)` row is **re-targeted to the base parameter**. The same syntax is used for any other column. ```{r} # Example: prior on K for von Bertalanffy growth growth_spec <- build_growth( fun = "vonBertalanffy", linkages = list( K = linkage_spec( formula = ~ 1, # intercept only -- no env effects by = ~ species + sex, init = list(`(Intercept)` = 0.3), # start at K = 0.3 # Use `lognormal(log(0.3), 0.1)` to get a normal prior on log(K). priors = list(`(Intercept)` = lognormal(log(0.3), 0.1)) ) ) ) ``` The same pattern works for any linkage target -- `Linf`, `L1`, `m` (Richards), `M1`, `R0`, `alpha`, `beta`. ### Growth SD endpoints The standard deviations of length-at-age at the `L1` and `Linf` anchors (`sd_L1`, `sd_Linf`) are exposed as linkage targets too, so the same `init` / `bounds` / `priors` contract applies. They route onto the `growth_log_sd` parameter rather than `log_growth_pars`, and -- because `growth_log_sd` has no year dimension in `growth.hpp` -- only intercept-bearing formulas are honored. Slope rows on these targets raise a warning and have no effect; slope-only formulas (`~ 0 + temp`) error. ```{r} # Prior on the SD at L1 anchored at exp(log(5)) = 5 length units, # with a tight normal prior on the log scale. growth_spec_sd <- build_growth( fun = "vonBertalanffy", linkages = list( sd_L1 = linkage_spec( formula = ~ 1, by = ~ species + sex, init = list(`(Intercept)` = 5), # Prior on log-scale SD; switch to `normal(5, 0.3)` # for a Normal on sd_L1. priors = list(`(Intercept)` = lognormal(log(5), 0.3)) ), sd_Linf = linkage_spec( formula = ~ 1, by = ~ species + sex, init = list(`(Intercept)` = 15) ) ) ) # After fit_mod(), the init values land on growth_log_sd[, , 1:2] # (SD at L1 and SD at Linf, log-scale), and the priors evaluate # against those base-parameter cells via the intercept re-targeting # path described above. ``` You can also key the prior by species id for species- specific priors: ```{r} rec_spec <- build_srr( srr_fun = 0, linkages = list( R0 = linkage_spec( formula = ~ 1, by = ~ species, init = list(`(Intercept)` = 1000), # Species-keyed priors on log-scale R0. priors = list(`(Intercept)` = list( `1` = lognormal(log(1000), 0.2), `2` = lognormal(log(1500), 0.3), `3` = lognormal(log(800), 0.25) )) ) ) ) ``` The prior contribution appears in slot 20 of the joint NLL: ```{r} fit$quantities$jnll_comp["Linkage-table priors", ] ``` You can inspect the linked parameter values via the linkage table, the estimated slope coefficients, and the base-parameter array: ```{r} fit$data_list$linkage_table[fit$data_list$linkage_table$param == "K", ] fit$estimated_params$beta_linkage # (Intercept) rows are 0; slopes are estimated fit$estimated_params$log_growth_pars # base K / L1 / Linf / m (on log scale internally) ``` The same approach works for **`M1`** when you want a prior on natural mortality itself: ```{r} data("GOApollock") # Using GOA pollock as an example # One-species model, lognormal(log(0.34), 0.01) prior on M1 # applied via the linkage intercept (equivalent to a Normal on log_M1). # The structural M1_model is kept at "sex_age_invariant" (= 1). M1_spec <- build_M1( M1_model = "sex_age_invariant", linkages = list( M1 = linkage_spec( formula = ~ 1, by = ~ species, init = list(`(Intercept)` = 0.34), # Prior is on the natural-scale for M1. # `lognormal(log(0.34), 0.01)` evaluates Normal on log(M1). # Use `normal(0.34, 0.01)` for a Normal directly on natural M1. priors = list(`(Intercept)` = lognormal(log(0.34), 0.01)) ) ) ) # Fit the model - no manual mapping needed fit <- fit_mod( data_list = GOApollock, M1Fun = M1_spec, estimateMode = 0, fit_control = fit_control(phase = TRUE) ) ``` The linkage prior is applied verbatim, with no lognormal bias-correction term. ## Species- and (species, sex)-specific priors Each entry under `priors = list(...)` may be: * a single prior -- applies to every species/sex row that uses that design column, * a **named list of priors keyed by species id** -- one prior per specified species, or * a **named list of priors keyed by species id and sex** -- one prior per specified species and sex, or * a **two-level nested list keyed first by specified species, then by specified sex id**. ```{r} # Species-only keying linkage_spec( formula = ~ temp, by = ~ species, priors = list( temp = list( `1` = normal(0, 0.1), # tight prior for species 1 #`2` = normal(0, 0.5), # No prior species 2 `3` = normal(0, 1.0) # weakest for species 3 # any species not listed gets no prior ) ) ) ``` For sexually dimorphic stocks, the same `temp` column can take different priors per (species, sex). At each level the value can be either an `Rceattle_prior` (applies to all sub-strata) or a list that drills down further: ```{r} linkage_spec( formula = ~ temp, by = ~ species + sex, priors = list( temp = list( # Species 1: different tightness per sex. `1` = list(`1` = normal(0, 0.1), # sp 1, sex 1 (e.g. females) `2` = normal(0, 0.2)), # sp 1, sex 2 (e.g. males) # Species 2: scalar applies to every sex. `2` = normal(0, 0.5) # Any (species, sex) combination not listed gets no prior. ) ) ) ``` The same syntax works for any prior family. This is the recommended pattern when a covariate has different prior strength across stocks (e.g. when one species has prior literature on its temp sensitivity but another doesn't), or when within a species the sexes are expected to respond differently. Built programmatically, it's just: ```{r} sd_by_species <- c(`1` = 0.1, `2` = 0.5, `3` = 1.0) priors_per_sp <- lapply(sd_by_species, function(s) prior_normal(0, s)) linkage_spec( formula = ~ temp, by = ~ species, priors = list(temp = priors_per_sp) ) ``` ## Per-species formulas If species have qualitatively different env relationships -- e.g. species 1 responds only to temperature, species 2 also to PDO -- register *multiple specs against the same parameter* and use the `species` argument on each to scope it to a subset of stocks: ```{r} build_growth( fun = "vonBertalanffy", linkages = list( K = list( linkage_spec( formula = ~ temp, by = ~ species, species = 1L, # only sp 1 priors = list(temp = normal(0, 0.5)) ), linkage_spec( formula = ~ temp + PDO, by = ~ species, species = 2L, # only sp 2 priors = list(temp = normal(0, 1), PDO = normal(0, 1)) ) ) ) ) ``` Linkages can accepts both forms under each parameter key: ```r linkages = list(K = single_spec) # one formula for all sp linkages = list(K = list(spec_a, spec_b)) # different formulas per sp ``` Inspecting the result, `fit$linkages$K` is either an `Rceattle_linkage_spec` or a `list` of them, and `fit$data_list$linkage_table` shows the materialized rows with `species` and `X_col`. This pattern composes with species-specific *priors* -- each spec carries its own priors, and within a spec the priors can themselves be species-keyed if you want even finer control: ```{r} linkage_spec( formula = ~ temp, by = ~ species + sex, species = c(1L, 2L), priors = list( temp = list( `1` = normal(0, 0.3), `2` = normal(0, 0.7) ) ) ) ``` ## Per-sex formulas The `sex` argument on `linkage_spec()` mirrors `species`: it scopes a spec to a subset of sex ids. Combined with the multi-spec list form, this lets you register *separate specs per sex* against the same parameter -- typical use cases are sex-dimorphic M priors or distinct env responses for females vs. males: ```{r} # Female and male M1 each get their own intercept-only prior. M1_spec <- build_M1( M1_model = "sex_specific", linkages = list( M1 = list( linkage_spec( formula = ~ 1, by = ~ species + sex, sex = "Females", # or sex = 1L init = list(`(Intercept)` = 0.12), priors = list(`(Intercept)` = lognormal(log(0.12), 0.0001)) ), linkage_spec( formula = ~ 1, by = ~ species + sex, sex = "Males", # or sex = 2L init = list(`(Intercept)` = 0.20), priors = list(`(Intercept)` = lognormal(log(0.20), 0.0001)) ) ) ) ) ``` The `sex` filter is a no-op when `by` does not include `sex`. If you want different *formulas* per sex (rather than just different priors or inits), use the multi-spec form -- one spec with `sex = 1L` and its own RHS, another with `sex = 2L` and a different RHS. If you only need different priors per (species, sex) for a single shared formula, the nested-priors form (see *Species- and (species, sex)-specific priors* above) is simpler -- one spec, one set of design columns, prior values keyed by `species` then `sex`. Reach for the per-sex multi-spec form when you need different RHS terms, different `init`/`bounds`, or different `est_phase` between sexes. ## Polynomial and basis-expansion formulas The formula passed to `linkage_spec()` is fed straight to [`stats::model.matrix()`](https://stat.ethz.ch/R-manual/R-devel/library/stats/html/model.matrix.html), so any expression `model.matrix()` understands works -- polynomial bases, splines, interactions, and arithmetic transformations. **Polynomials.** Use `poly()` for orthogonal polynomial bases (the default; numerically well-conditioned), or `poly(..., raw = TRUE)` for raw monomials. Each basis order shows up as a separate design column with its own coefficient: ```{r} # 4th-order orthogonal polynomial in temp, by species. linkage_spec( formula = ~ poly(temp, 4), by = ~ species ) # design columns produced: # (Intercept), poly(temp, 4)1, poly(temp, 4)2, # poly(temp, 4)3, poly(temp, 4)4 ``` `poly()` orthogonalises the basis against the rows of `env_data`, so the columns are zero-mean over the model years and the linear slope, quadratic curvature, etc. are estimated independently -- generally the right choice if you don't have a specific scientific reason to prefer raw monomials. **Arbitrary arithmetic.** Use `I()` to insert a literal expression (otherwise `temp^2` is interpreted as a formula crossing operator and reduces to `temp`): ```{r} # Linear + raw quadratic in temp. linkage_spec( formula = ~ temp + I(temp^2), by = ~ species, priors = list(`I(temp^2)` = normal(0, 0.1)) # tight prior on curvature ) ``` The design column name is the deparsed call (`I(temp^2)`), so when you key `init`, `bounds`, or `priors` against it you have to use the same string -- backticks let you write it as a name. **Splines.** Natural cubic splines via `splines::ns()` work out-of-the-box and are often a better choice than high-order polynomials when you want flexibility without strong tail behaviour: ```{r} linkage_spec( formula = ~ splines::ns(temp, df = 4), by = ~ species ) ``` **Interactions.** `temp:PDO` adds a single product column; `temp * PDO` is shorthand for `temp + PDO + temp:PDO`: ```{r} linkage_spec( formula = ~ temp * PDO, by = ~ species, priors = list(`temp:PDO` = normal(0, 0.5)) ) ``` **Mixing forms across species.** Combined with the per-species formula machinery, you can give one species a polynomial response and another a linear one without duplicating columns: ```{r} build_growth( fun = "vonBertalanffy", linkages = list( K = list( linkage_spec(formula = ~ poly(temp, 3), by = ~ species, species = 1L), linkage_spec(formula = ~ temp, by = ~ species, species = 2L) ) ) ) ``` A practical caution: same-named columns must be numerically identical across specs that use them. With deterministic formulas evaluated against the same `env_data` this is automatic; it can only fail if you've built specs against different `env_data` frames. ## What happens internally `fit_mod()` calls `pool_linkages()` to materialise every spec against `data_list$env_data` and the species/sex/age strata, then unions the per-spec design columns into a single shared design matrix `X`. The encoded table (process / param / species / sex / age_bin / X_col / link codes, plus prior columns) and the shared `X` are stored on `data_list`. The TMB template iterates the table once per `MakeADFun` call and builds per-process offset tensors -- `growth_linkage_offset`, `M_linkage_offset`, `recruitment_linkage_offset` -- that are then combined with the base parameters at the consumer sites in `ceattle_v01_11.cpp`. With no linkages, the offsets stay at zero so existing fits are identical-by-construction. Priors and bounds flow through the same table: `build_bounds()` reads each row's `lower`/`upper` into `bounds$lower$beta_linkage` / `bounds$upper$beta_linkage`, and the cpp adds `-d(b_nat, p1, p2, log = TRUE)` to slot 19 of the joint NLL for every row whose `prior_family` is not `"none"`. ## Natural mortality `build_M1()` supports the same `linkages` argument. The valid parameter key is `M1`; the offset enters additively (on the log scale) inside the `M1_at_age` compute, so a coefficient of `0.3` on a normalised temperature index multiplies `M1` by `exp(0.3 * temp[yr])` each year. **Why `M1` and not `M`?** CEATTLE decomposes total natural mortality as `M = M1 + M2`. Only `M1` (residual / non-predation) is a parameter; `M2` (predation) is derived from predator abundance, suitability, and ration each iteration. Environmental effects on predation therefore propagate via upstream parameters (e.g. a temp effect on predator `K`, on recruitment, or on bioenergetics ration inputs). ```{r} yrs <- GOApollock$styr:GOApollock$endyr GOApollock$env_data <- data.frame( Year = yrs, temp = rnorm(length(yrs), 0, 0.1) ) M1_spec <- build_M1( M1_model = "sex_age_invariant", # or M1_model = 1 (back-compat) linkages = list( M1 = linkage_spec( formula = ~ temp, by = ~ species, priors = list(temp = normal(0, 0.1)) ) ) ) fit <- fit_mod( data_list = GOApollock, M1Fun = M1_spec, estimateMode = 0, msmMode = 0 ) ``` The structural M choices (`M1_model`) and the random-effects choices (`M1_re`) accept either integer codes or readable strings interchangeably: ```{r} build_M1(M1_model = "sex_age_specific", M1_re = "ar1_age") # = 3, 4 build_M1(M1_model = 3, M1_re = 4) # equivalent ``` `linkages` on M is age-aware: by default a row's `age_bin` is `NA` and the offset broadcasts across every age. To pin an env effect to a specific age slice, set `by = ~ species + age_bin` and supply `strata$age_bin` when calling materialize directly (`fit_mod()` provides this automatically). For most operational uses the default broadcast is what you want. The new linkage path coexists with the legacy `M1_indices = c(...)` column-index argument for `M1_model = 4` / `5`: both add to `ln_M1` on the log scale and either may be used. New code should prefer `linkages` -- it composes naturally with priors, bounds, and per-species formulas. Inspecting the result: ```{r} fit$data_list$linkage_table # rows have process == "M" fit$quantities$M_linkage_offset # [nspp, max_sex, max_age, nyrs] fit$quantities$jnll_comp["Linkage-table priors", ] ``` Growth and M linkages **share the same table** -- one fit can carry linkage rows for both processes, and the underlying coefficient vector `beta_linkage` indexes both. ## Recruitment `build_srr()` accepts the same `linkages` argument with three allowed parameter keys: * **`R0`** -- meaningful for *any* `srr_fun`; the right target if recruitment should respond to env without fitting an SRR. * **`alpha`** (productivity) and **`beta`** (density-dependence) -- only do work when `srr_fun %in% c(2, 3, 4, 5)` (Beverton-Holt or Ricker). ```{r} # Mean-recruitment with a temperature linkage on R0. rec_spec <- build_srr( srr_fun = "mean", # = 0 linkages = list( R0 = linkage_spec( formula = ~ temp, priors = list(temp = normal(0, 0.5)) ) ) ) # Beverton-Holt with env effects on alpha (and species-keyed # priors so each stock has its own sensitivity). rec_spec_bh <- build_srr( srr_fun = "BevertonHolt", # = 2 linkages = list( alpha = linkage_spec( formula = ~ temp, priors = list( temp = list(`1` = normal(0, 0.3), `2` = normal(0, 0.7)) ) ) ) ) fit <- fit_mod( data_list = data_list, recFun = rec_spec, estimateMode = 0, msmMode = 0 ) ``` The structural choice (`srr_fun`) accepts either integer codes or readable strings interchangeably: ```{r} build_srr(srr_fun = "mean") # = 0 build_srr(srr_fun = "BevertonHolt") # = 2 build_srr(srr_fun = "Ricker") # = 4 ``` Inspecting the result: ```{r} fit$data_list$linkage_table # rows have process == "recruitment" fit$quantities$recruitment_linkage_offset # [nspp, n_rec_params, nyrs] # n_rec_params = 3 (R0, alpha, beta) ``` Practical note: a linkage row's `age_bin` is ignored for recruitment (recruitment is at age 0 by definition). The `sex_ratio` data input partitions recruits across sexes downstream. ## A multi-process example The same pattern composes when more than one parameter is linked: ```{r} growth_spec <- build_growth( fun = "Richards", linkages = list( K = linkage_spec( formula = ~ temp, by = ~ species + sex, priors = list(temp = normal(0, 1)) ), Linf = linkage_spec( formula = ~ temp + PDO, by = ~ species, priors = list(temp = normal(0, 0.5), PDO = normal(0, 0.5)) ), m = linkage_spec( formula = ~ 1, # intercept only -- no env effect by = ~ species ) ) ) ``` The pooler will: * compute one shared `X = model.matrix(~ temp + PDO, env_data)` of three columns, * emit 4 (sp × sex) + 3 (Linf cols) × 3 (sp) + 3 (m intercepts) = 16 rows in the linkage table, * match same-named columns (`(Intercept)`, `temp`) across specs so there's no duplication in `X`. Per-process logic (vB has no `m`; only Richards does) is enforced at build time: ```{r} build_growth( fun = "vonBertalanffy", linkages = list(m = linkage_spec(~ temp)) ) #> Error: linkages$m is only valid with fun = 'Richards'; ... ``` Growth, M, and recruitment can all be linked in the same fit -- each `build_*()` output carries its own linkages list and `fit_mod()` pools them against a single shared design matrix: ```{r} fit <- fit_mod( data_list = data_list, growthFun = build_growth( fun = "vonBertalanffy", linkages = list( K = linkage_spec(formula = ~ temp) ) ), M1Fun = build_M1( M1_model = "sex_age_invariant", linkages = list( M1 = linkage_spec(formula = ~ temp) ) ), recFun = build_srr( srr_fun = "mean", linkages = list( R0 = linkage_spec(formula = ~ temp) ) ), estimateMode = 0, msmMode = 0 ) # Linkage table contains rows for all three processes: table(fit$data_list$linkage_table$process) #> growth M recruitment #> 4 4 4 ``` ## Inspecting and debugging A few diagnostics worth knowing: ```{r} # The materialised table -- one row per estimated coefficient. fit$data_list$linkage_table # The shared design matrix -- one column per unique RHS term. fit$data_list$linkage_X # Per-(species, sex, growth_param, year) offsets actually applied to # growth_parameters this iteration. Useful for verifying the linkage # fired as you intended. str(fit$quantities$growth_linkage_offset) str(fit$quantities$M_linkage_offset) str(fit$quantities$recruitment_linkage_offset) # Per-row coefficient values in the same order as `linkage_table`. fit$estimated_params$beta_linkage # The prior NLL contribution from slot 19 of jnll_comp. fit$quantities$jnll_comp["Linkage-table priors", ] ``` ## Behind the scenes: automatic base-parameter handling For three species × two sexes × `~ temp` (intercept + slope) the pooled table has 12 rows -- an `(Intercept)` and a `temp` slope per (species, sex). Priors and inits supplied for `(Intercept)` flow to the *base parameter* (`K`, `M1`, `R0`, ...); the slope rows hold the year-by-year offset that combines with the base. The exact behavior depends on whether the formula carries an intercept: | Formula | Base parameter | Slope rows | |------------------------------|-------------------------------------------|------------| | `~ 1` (intercept only) | estimable; init from spec or default |n/a | | `~ temp` (intercept + slope) | estimable; init from spec or default | estimated | | `~ 0 + temp` (slope only) | mapped NA, value from spec or default | estimated | **Intercept-bearing formulas (`~ 1`, `~ temp`)** -- the base parameter remains estimable; for example `R0` for species `s` lives on `rec_pars[s, 1]` exactly as it does without any linkages. The `(Intercept)` row of the table exists for bookkeeping and as a hook for `init`/`prior` (see below) but its `beta_linkage` slot is held at `0` and not estimated. The recruitment / mortality / growth modules read `base + offset(yr)` each year -- with the intercept fixed at zero, the offset is exactly the slope contribution. **Slope-only formulas (`~ 0 + temp`)** -- there is no `(Intercept)` row, so the base parameter is mapped out (NA) and stays at the default initial value, or the value supplied by the `inits` argument in `fit_mod`, or the value supplied by `init = list("(Intercept)"= ...)` on the spec. ### `init` and the base parameter `init = list(`(Intercept)` = X)` on the linkage spec **flows to the base parameter** rather than to the linkage row. The linkage row's `beta_linkage` slot stays at `0`; the base parameter starts at `X`. This is the recommended way to set a sensible starting value for intercept-bearing linkages -- it composes naturally with phasing (the base parameter goes through the existing per-process phase logic) and keeps the optimiser grounded. ```{r} # Start M1 at 0.06 without estimating an extra intercept beta. build_M1( M1_model = "sex_age_invariant", linkages = list( M1 = linkage_spec( formula = ~ 1, init = list(`(Intercept)` = 0.06) ) ) ) ``` ### Stratification Stratification (`by`) determines *which* base-parameter slots are affected: `by = ~ species` produces one `(Intercept)` row per species; `by = NULL` (a fully shared linkage) produces a single shared row; and `by = ~ species + sex` produces one per (species, sex). Each `(Intercept)` row that the user supplies an `init` for pushes that init into the matching base-parameter slot. This rule applies uniformly to every linkage target (`K`, `L1`, `Linf`, `m`, `M1`, `R0`, `alpha`, `beta`). ### Single-sex models and `by = ~ ... + sex` If your model is single-sex (`nsex = 1` for every species in scope), asking for `by = ~ species + sex` produces a warning: there is only one sex level available, so the sex stratum collapses to a single shared level. The result is equivalent to `by = ~ species`. Models with mixed `nsex` (some single-sex, some two-sex species) emit one sex level per species automatically -- single-sex species get one row per parameter, two-sex species get two. ### Pooling `pool_linkages()` materialises each spec independently and concatenates the rows into a single table. The shared design matrix unions all RHS terms across specs by name -- there's no duplication when two specs use the same predictor, and per-species formulas can pull in extra columns (e.g. species 1 uses `poly(temp, 3)*` while species 2 uses only linear `temp`). ## On the roadmap * **Random-effects grouping** -- the `re_group` column on the linkage table is stored end-to-end; the cpp will grow a hierarchical penalty so coefficients sharing a `re_group` value are partially pooled. * **Age-dependent linkages on processes other than M** -- the `age_bin` stratum is consumed by the M accumulator today (a row's `age_bin == NA` broadcasts across ages; specific values pin the offset to that age slice). The schema is reserved for other processes that may eventually vary by age. * **Selectivity / catchability linkages** -- the encoder already reserves `process = "sel"` and `process = "q"` codes, but no cpp accumulator consumes them yet. * **Legacy retirement (v4.4.0)** -- `srr_indices`, `M1_indices`, `srr_fun in c(1, 3, 5)`, and `M1_model in c(4, 5)` retire in the next minor release; see *Scheduled removal (v4.4.0)* in `NEWS.md` for the migration table. ## See also * `?linkage_spec`, `?build_growth`, `?build_M1`, `?build_srr`, `?prior_normal` * `vignette("model-parameterizations", package = "Rceattle")` for the full set of process options.