Rceattle provides several layers
of diagnostics: convergence diagnostics attached
automatically to every fit, S3 methods
(residual, logLik, etc), fit
plots to visually inspect how well the model reproduces
observed data, retrospective analysis (peels) to detect
patterns of systematic over- or under-estimation, jitter
testing to check that the optimiser has found a global minimum,
self-testing (simulation–estimation) to check that the
model can recover its own estimates from simulated data, and
likelihood profiling to check how informative the data
are about a particular parameter.
library(Rceattle)
# Fit a single-species model for the 2022 Northern Rockfish assessment.
data("NorthernRockfish2022")
nrdata <- NorthernRockfish2022
nrdata$fleet_control$proj_F_prop <- 1
plot_data(nrdata)Fitting the model
model_1 <- Rceattle::fit_mod(
data_list = nrdata,
estimateMode = 0, # Estimate
initMode = 1, # Assume unfished equilibrium
M1Fun = build_M1(
updateM1 = TRUE,
M1_model = 1,
M1_use_prior = TRUE,
M_prior = 0.06,
M_prior_sd = 0.05),
fit_control = fit_control(phase = TRUE, verbose = 0)
)
summary(model_1)fit_mod() runs a battery of convergence checks after
optimization and attaches the result as model$convergence.
Each check is one record with a severity
("OK", "NOTE", "WARN", or
"FAIL"); the object’s status is the worst
severity present. Non-OK checks are surfaced via message()
during the fit (a non-converged model is never turned into an error —
the fit is always returned with its diagnostics attached), and
print(model) shows the overall status.
model_1$convergence # overall status + any non-OK checks
print(model_1$convergence, all = TRUE) # show every check, including OK onesRe-run the battery on any fit with
convergence_diagnostics():
The checks cover:
max_gradient — maximum absolute
marginal gradient and the parameter carrying it (WARN >
1e-3, FAIL > 1).pdHess — whether the Hessian is
positive definite (FAIL if not).sdreport_failed — FAIL
when an sdreport was requested but the Hessian could not be
inverted.hessian_conditioning — the Hessian
condition number and, when poorly conditioned, the parameters loading on
the least-identified direction (the unidentified linear combination).
Complements TMBhelper::check_estimability (a per-parameter
verdict) with a continuous severity and direction.parameters_on_bounds — parameters that
hit a configured build_bounds() limit (often unidentified
or mis-scaled).phasing — phases that ended with a
high gradient, localizing which parameter block is hard to fit.estimability — surfaces
TMBhelper::check_estimability when it ran.Each record carries a data element with the underlying
numbers (e.g. the condition number and loadings for
hessian_conditioning).
plot_comp() overlays observed (bars) and predicted
(line) age or length compositions for every fleet.
plot_index() and plot_logindex() show
observed vs. predicted survey biomass on the natural and log scales,
respectively. plot_indexresidual() plots the log-scale
residuals — a useful first check for time trends or
heteroscedasticity.
A retrospective analysis systematically removes the most recent years of data one peel at a time and re-fits the model. Bias in the final estimates relative to earlier peels is summarised by Mohn’s rho: values outside ±0.2 (for SSB) generally indicate a problem worth investigating.
model_1_retro <- retrospective(Rceattle = model_1, peels = 5)
# Mohn's rho for each quantity
model_1_retro$mohns
# Plot historical trajectories across peels
plot_biomass(model_1_retro$Rceattle_list)
# Include the projection period to see how the forecast changes
plot_biomass(model_1_retro$Rceattle_list, incl_proj = TRUE)The retrospective() function returns a list with two
elements:
| Element | Description |
|---|---|
Rceattle_list |
List of fitted models, ordered from full run to most-peeled |
mohns |
Data frame with columns Object (quantity,
e.g. "Biomass", "SSB"),
Forecast year (0 = terminal year bias; 1+ = forecast
skill), N (number of peels), and one column per species
with mean relative error (Mohn’s rho) |
The nyrs_forecast argument (default 3) additionally
evaluates rho for years projected beyond the terminal peel, making it
possible to quantify forecast skill alongside retrospective bias in a
single call.
Jitter testing re-fits the model from many randomly perturbed starting values to check whether the optimizer consistently returns the same minimum negative log-likelihood (NLL). A spread of NLL values across jitters suggests the likelihood surface has multiple local minima and results should be interpreted cautiously.
jitters <- jitter(Rceattle = model_1, njitter = 10, phase = TRUE) # default njitter = 50
# Histogram of NLL differences relative to the best run
hist(log(jitters$nll - min(jitters$nll)),
main = "Jitter NLL spread (log scale)",
xlab = "log(NLL - min NLL)")
# Overlay biomass trajectories — tight overlap indicates a stable optimum
plot_biomass(jitters$Rceattle_list)
# Number of runs that converged
length(jitters$Rceattle_list)Non-converging runs are silently dropped from
Rceattle_list, so if
length(jitters$Rceattle_list) is much less than
njitter, convergence may be fragile.
retrospective() and jitter() both fit peels
/ starts in parallel by default. Pass cores = 1 to force
sequential execution; otherwise the call uses a PSOCK cluster sized at
parallel::detectCores() - 6 (capped at 2 under
R CMD check).
self_test() simulates nsim datasets from a
fitted model — keeping its estimated parameters fixed — and re-fits the
model to each simulated dataset. If estimates from the refits cluster
around the parameter values of the fit they were simulated from, the
model is at least self-consistent (it can recover its own estimates from
data it generated). Persistent bias or wide spread in a quantity is a
sign that the data are not informative about it.
sims <- self_test(model_1, nsim = 10) # default nsim = 50
# Number of simulations that converged (non-converged runs are dropped)
length(sims)
# Overlay biomass / SSB trajectories across simulations — the original fit's
# trajectory should sit inside the spread of the refits.
plot_biomass(c(list(model_1), sims), model_names = c("fit", names(sims)))
plot_ssb(c(list(model_1), sims), model_names = c("fit", names(sims)))Each simulation uses seed + i as its RNG seed, so
results are reproducible whether or not cores > 1. Set
simulate = FALSE to refit against the model’s
expected values (no observation error) rather than draws from
the observation likelihood — useful for confirming that the estimator
returns the generating parameters in the noise-free limit.
cores behaves the same way as in
retrospective() and jitter().
profile() re-fits the model across a grid of fixed
values for one or more cells of a chosen parameter and returns the
resulting NLL surface. A flat profile means the data carry little
information about the parameter; a sharp minimum away from the MLE means
the fit has not actually settled there. The function generalises the
legacy profile_rsigma() helper from
examples/Profiling_sigmaR.R and works for any parameter
slot in Rceattle$estimated_params; it is tested for
R_log_sd (recruitment sigma), rec_pars
(stock-recruit parameters), and log_M1 (residual natural
mortality).
Two ways to specify param:
transform = "log"
so you pass natural-scale values like sigmaR or M directly. For the
rec_pars aliases ("R0", "alpha",
"beta") the column is filled in from the alias name, so
slots only needs the species index.
"sigmaR" / "R_sd" →
R_log_sd"M1" → log_M1"R0" → rec_pars[, 1]"alpha" → rec_pars[, 2]"beta" → rec_pars[, 3]"R_log_sd", "rec_pars",
"log_M1", etc.); slots must index the full
array and transform controls the scale.slots is a list of integer index vectors — one entry per
cell to hold fixed — and (under raw names) its entries must match the
dimensionality of the parameter array (length 1 for vectors, length 2
for matrices, length 3 for 3-D arrays). values is a list of
value grids parallel to slots; the full grid of fits is
expand.grid(values), so multiple slots produce an N-D
cross-profile.
# 1-D profile: sigmaR for species 1 (natural-scale alias)
prof_sigmaR <- profile(
fitted = model_1,
param = "sigmaR",
slots = list(1),
values = list(seq(0.1, 1.5, by = 0.1))
)
plot(prof_sigmaR$grid$slot_1,
prof_sigmaR$nll - min(prof_sigmaR$nll, na.rm = TRUE),
type = "l", xlab = "sigmaR", ylab = "dNLL")
# 1-D profile: SRR alpha for species 1
# (alias fills in the rec_pars column; slot is just the species index)
prof_alpha <- profile(
fitted = model_1,
param = "alpha",
slots = list(1),
values = list(seq(2, 80, length.out = 20))
)
# 2-D cross-profile: M1 across sex for species 1, age 1
prof_M_sex <- profile(
fitted = model_1,
param = "M1",
slots = list(c(1, 1, 1), c(1, 2, 1)), # males and females
values = list(seq(0.10, 0.40, by = 0.05),
seq(0.10, 0.40, by = 0.05))
)profile() returns Rceattle_list (one fit
per grid row, NULL where the fit failed), grid
(the user-scale value grid), nll (joint NLL aligned with
grid, NA for non-converged fits), and echoes
of param and slots. To cross-profile across
multiple species, supply one slot per species
(e.g. slots = list(1, 2, 3) with
param = "sigmaR"); to cross-profile M1 across sex, supply
one slot per sex as in the third example above. cores
behaves the same way as in the other diagnostic functions.
Plotting sensitivity runs together is itself a useful diagnostic — large divergences in biomass or mortality deserve scrutiny.