3. Model diagnostics

Rceattle provides several layers of diagnostics: convergence diagnostics attached automatically to every fit; S3 methods (residuals(), logLik(), etc.); fit plots to visually inspect how well the model reproduces the observed data; retrospective analysis (peels) to detect systematic over- or under-estimation; jitter testing to check that the optimizer returns the same minimum from different starting values; self-testing (simulation–estimation) to check that the model can recover its own estimates from simulated data; and likelihood profiling to check how informative the data are about a particular parameter.

Setup and plotting data

library(Rceattle)

# Load the data for the 2022 Northern Rockfish single-species assessment.
data("NorthernRockfish2022")
nrdata <- NorthernRockfish2022
nrdata$fleet_control$proj_F_prop <- 1
plot_data(nrdata)

Fitting the model

model_1 <- Rceattle::fit_mod(
  data_list = nrdata,
  estimateMode = 0,  # Estimate
  initMode = 1,      # Assume unfished equilibrium
  M1Fun = build_M1(
    updateM1 = TRUE,
    M1_model = 1,
    M1_use_prior = TRUE,
    M_prior = 0.06,
    M_prior_sd = 0.05),
  fit_control = fit_control(phase = TRUE, verbose = 0)
)
summary(model_1)

S3 Methods

summary(model_1)              # same compact summary
coef(model_1)                 # estimated fixed-effect parameters
logLik(model_1)               # logLik with df attribute (AIC works)
AIC(model_1)
vcov(model_1)                 # fixed-effect covariance from sdreport
residuals(model_1)                    # response residuals, all data sources
residuals(model_1, type = "pearson")  # Pearson residuals (glm-style `type`)
as.data.frame(model_1)

Convergence diagnostics

fit_mod() runs a battery of convergence checks after optimization and attaches the result as model$convergence. Each check is one record with a severity ("OK", "NOTE", "WARN", or "FAIL"); the object’s status is the worst severity present. Non-OK checks are surfaced via message() during the fit (a non-converged model is never turned into an error — the fit is always returned with its diagnostics attached), and print(model) shows the overall status.

model_1$convergence            # overall status + any non-OK checks
print(model_1$convergence, all = TRUE)   # show every check, including OK ones

Re-run the battery on any fit with convergence_diagnostics():

convergence_diagnostics(model_1)

The checks cover:

max_gradient — maximum absolute marginal gradient and the parameter carrying it (WARN > 1e-3, FAIL > 1).
pdHess — whether the Hessian is positive definite (FAIL if not).
sdreport_failed — FAIL when an sdreport was requested but the Hessian could not be inverted.
hessian_conditioning — the Hessian condition number and, when poorly conditioned, the parameters loading on the least-identified direction (the unidentified linear combination). Complements TMBhelper::check_estimability (a per-parameter verdict) with a continuous severity and direction.
parameters_on_bounds — parameters that hit a configured build_bounds() limit (often unidentified or mis-scaled).
phasing — phases that ended with a high gradient, localizing which parameter block is hard to fit.
estimability — surfaces TMBhelper::check_estimability when it ran.
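
As a rough illustration of what the gradient and Hessian checks measure, the sketch below applies the same ideas to a made-up gradient vector and Hessian in base R. The numbers, the object names, and the way the severities are combined are assumptions for illustration; only the max_gradient thresholds come from the list above, and this is not Rceattle's internal code.

```r
# Toy stand-ins for the marginal gradient and Hessian at the optimum
# (hypothetical values, not taken from a real fit).
par_names <- c("sigmaR", "M1", "R0")
grad <- c(sigmaR = 2e-5, M1 = -4e-3, R0 = 1e-6)
hess <- matrix(c(4.0, 1.9, 0.0,
                 1.9, 1.0, 0.0,
                 0.0, 0.0, 2.0),
               nrow = 3, byrow = TRUE,
               dimnames = list(par_names, par_names))

# max_gradient: largest absolute gradient and the parameter carrying it
max_grad   <- max(abs(grad))
worst_par  <- names(grad)[which.max(abs(grad))]
grad_check <- if (max_grad > 1) "FAIL" else if (max_grad > 1e-3) "WARN" else "OK"

# pdHess: all eigenvalues of the symmetric Hessian must be positive
eig      <- eigen(hess, symmetric = TRUE)
pd_hess  <- all(eig$values > 0)
pd_check <- if (pd_hess) "OK" else "FAIL"

# hessian_conditioning: ratio of largest to smallest eigenvalue, plus the
# loadings of the least-identified direction (smallest eigenvalue)
cond_number <- max(eig$values) / min(eig$values)
weak_dir    <- setNames(eig$vectors[, which.min(eig$values)], par_names)

# Overall status is the worst severity present
levels_sev <- c("OK", "NOTE", "WARN", "FAIL")
checks <- c(max_gradient = grad_check, pdHess = pd_check)
status <- levels_sev[max(match(checks, levels_sev))]
```

In this toy case the gradient check returns "WARN" because of M1, the Hessian is positive definite, and the weakest direction loads on sigmaR and M1 but not on R0, which is the kind of pattern that points to two parameters trading off against each other.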

Fit plots

Composition data

plot_comp() draws composition Pearson-residual bubbles and overlays the observed (shaded area) and predicted (line) age or length compositions for every fleet, both annually and aggregated across years. Joint-sex data are drawn on a single bin axis with females above and males below zero.

plot_comp(model_1)

Survey indices

plot_index() and plot_logindex() show observed vs. predicted survey biomass on the natural and log scales, respectively. plot_indexresidual() plots the log-scale residuals — a useful first check for time trends or heteroscedasticity.

plot_index(model_1)
plot_logindex(model_1)
plot_indexresidual(model_1)
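
To make the "time trend" check concrete, here is a minimal base-R sketch on made-up numbers. The observed and predicted values are hypothetical, and the sign convention (log observed minus log predicted) is an assumption; the source does not state which convention plot_indexresidual() uses.

```r
# Hypothetical observed and predicted survey biomass (not from a real fit)
years <- 2010:2019
pred  <- rep(1000, length(years))
obs   <- pred * exp(seq(-0.3, 0.3, length.out = length(years)))  # drifts upward

# Log-scale residual, assumed here to be log(observed) - log(predicted)
log_resid <- log(obs) - log(pred)

# A slope clearly different from zero signals a time trend in the residuals
trend <- lm(log_resid ~ years)
slope <- unname(coef(trend)["years"])

# Long runs of same-signed residuals are another quick indicator
sign_runs <- rle(sign(log_resid))
```

Here the residuals fall into just two runs (five negative, then five positive), which is the pattern a well-fitting index should not show.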

Catch

plot_catch(model_1)

One-step-ahead (OSA) residuals

Pearson residuals on composition data are not standard normal even when the model is correct, because the age/length bins are correlated (more fish in one bin means fewer in another) and because random effects induce correlation across years. One-step-ahead (OSA) residuals fix this: each observation is residualized conditional on the previously-added observations, integrating out the random effects, so the residuals are iid standard normal under a correctly specified model (Thygesen et al. 2017; Trijoulet et al. 2023). They therefore support objective goodness-of-fit testing.
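
The bin correlation described above can be seen in a small simulation. This is a sketch in base R with made-up proportions and sample sizes; the Pearson standardization shown is one common choice and is not necessarily the one plot_comp() uses.

```r
set.seed(42)

# Hypothetical true age composition and sample size per year
p_true <- c(age1 = 0.5, age2 = 0.3, age3 = 0.2)
n_fish <- 100
n_year <- 2000

# Simulate compositions from the "correct" model: one row per year
counts <- t(rmultinom(n_year, size = n_fish, prob = p_true))
p_obs  <- counts / n_fish

# Pearson residuals: (observed - expected) / binomial standard deviation
p_exp   <- matrix(p_true, nrow = n_year, ncol = length(p_true), byrow = TRUE)
pearson <- (p_obs - p_exp) / sqrt(p_exp * (1 - p_exp) / n_fish)

# Raw residuals sum to zero within each year, so the bins cannot be independent
max_row_sum <- max(abs(rowSums(p_obs - p_exp)))

# Off-diagonal correlations are negative even though the model is correct
resid_cor <- cor(pearson)
```

Even with data generated from the true proportions, every pair of bins is negatively correlated, so these residuals cannot be treated as independent standard-normal draws.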

osa_residuals() computes them post hoc for any combination of the fitted data types – survey indices, fishery catch, age/length composition ("comp"), conditional age-at-length ("caal"), and predator diet ("diet"). Composition data are residualized with the conditional binomial / beta-binomial decomposition. OSA re-optimizes the random effects per observation, so it is expensive; run it on a converged fit (estimateMode < 3).

All OSA sources – aggregate index/catch and composition ("comp"/"caal"/"diet") – are available from any fit; osa_residuals() builds the required observation data on demand.

# Aggregate (catch + index) residuals
osa <- osa_residuals(model_1, source = c("index", "catch"))
head(osa)

# Statistical diagnostics (Stewart & Monnahan 2025): SDNR and lower/upper tail
# statistics, each with the interval expected under the standard-normal null.
osa_diagnostics(osa)

# Q-Q plot (with SDNR / tail annotation) + residual-by-year
plot(osa)

# residuals() also exposes them in the common residual schema (`source` selects
# the data, like `type` in stats::residuals.glm()):
residuals(model_1, type = "osa", source = c("index", "catch"))

# Composition OSA residuals -- no special fit required
plot_comp(model_1, residual_type = "osa")              # Q-Q + signed bubbles
residuals(model_1, type = "osa", source = "comp")

Following Stewart and Monnahan (2025), a practical workflow is:

Inspect the aggregate fit across years (plot_comp(model_1)): systematic lack of fit here points to structural misspecification (e.g. the wrong selectivity shape).
Compare the SDNR to its null interval (from osa_diagnostics()): a value outside the interval flags over- or under-dispersion, often a data-weighting or effective-sample-size issue.
Compare the lower/upper tail statistics to their null intervals: these localize departures to the tails of the residual distribution.
Inspect Pearson bubble plots (plot_comp(model_1), the default): patterns in the sign of residuals across age/year can identify where lack of fit occurs – Stewart and Monnahan recommend keeping these alongside OSA residuals.
Revise the model or data weighting where misspecification is found, using this together with the other diagnostics rather than as an accept/reject test.
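
The SDNR comparison in the workflow above can be sketched in base R as follows. The residual vector is a deterministic stand-in, and the chi-square interval is an assumption about how a null interval can be built for iid standard-normal residuals; the source does not say how osa_diagnostics() derives its interval.

```r
# Stand-in for well-behaved residuals: evenly spaced standard-normal quantiles
# (hypothetical; in practice these would come from an OSA calculation)
z <- qnorm(ppoints(60))
n <- length(z)

# SDNR: standard deviation of the normalized residuals
sdnr <- sd(z)

# For n iid N(0, 1) draws, (n - 1) * sd^2 follows a chi-square distribution
# with n - 1 degrees of freedom, which gives a 95% null interval for the SDNR
sdnr_null <- sqrt(qchisq(c(0.025, 0.975), df = n - 1) / (n - 1))

# Residuals with too much spread (e.g. over-weighted data) fall above the interval
sdnr_over <- sd(1.5 * z)
flag_over <- sdnr_over > sdnr_null[2]
```

The interval narrows as the number of residuals grows, so the same SDNR can be unremarkable for a short index series and a clear flag for a large composition data set.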

Because OSA residuals for discrete composition data are randomized quantile residuals, they are stochastic; osa_residuals() takes a seed for reproducibility.
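
To show where the randomness comes from, the sketch below computes randomized quantile residuals for a single composition using a conditional binomial decomposition. The function rq_resid() and the input values are hypothetical, random effects are ignored, and the package's exact decomposition and bin ordering are not described in this document, so treat this as an illustration of the idea only.

```r
# Observed counts for one composition and the model's expected proportions
# (hypothetical values)
obs_counts <- c(12, 30, 38, 20)
p_exp      <- c(0.10, 0.30, 0.40, 0.20)

# Hypothetical helper; note that it resets the global RNG seed
rq_resid <- function(obs_counts, p_exp, seed) {
  set.seed(seed)
  k <- length(obs_counts)
  z <- numeric(k - 1)            # the last bin is fixed by the total
  n_left <- sum(obs_counts)      # fish not yet assigned to a bin
  p_left <- 1                    # probability mass not yet used
  for (i in seq_len(k - 1)) {
    # Conditional on earlier bins, bin i is binomial(n_left, p_i / p_left)
    p_cond <- p_exp[i] / p_left
    lower  <- pbinom(obs_counts[i] - 1, n_left, p_cond)
    upper  <- pbinom(obs_counts[i],     n_left, p_cond)
    # Discrete data: draw uniformly between the two CDF values, then map
    # the draw to the normal scale
    z[i] <- qnorm(runif(1, lower, upper))
    n_left <- n_left - obs_counts[i]
    p_left <- p_left - p_exp[i]
  }
  z
}

r1 <- rq_resid(obs_counts, p_exp, seed = 1)
r2 <- rq_resid(obs_counts, p_exp, seed = 1)  # same seed, same residuals
r3 <- rq_resid(obs_counts, p_exp, seed = 2)  # different seed, different draw
```

The uniform draw is what makes the residuals differ between runs, which is why fixing the seed matters when results need to be reproduced.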

Process residuals

process_residuals() is the complementary check on the process model: it standardizes the model’s random-effect deviations (recruitment, initial abundance, catchability) against their process prior, drawing once from the joint posterior (SAM’s procres; Nielsen and Berg 2014). Under a correct process model these are also approximately iid standard normal.

pr <- process_residuals(model_1, process = "recruitment")
osa_diagnostics(pr)
plot(pr)

# residuals() exposes them too:
residuals(model_1, type = "process")

Retrospective analysis

A retrospective analysis systematically removes the most recent years of data one peel at a time and re-fits the model. Bias in each peel's terminal-year estimate relative to the full-data estimate for the same year is summarised by Mohn’s rho: as a rule of thumb, values outside ±0.2 (for SSB) generally indicate a problem worth investigating.

model_1_retro <- retrospective(Rceattle = model_1, peels = 5)

# Mohn's rho for each quantity
model_1_retro$mohns

# Plot historical trajectories across peels
plot_biomass(model_1_retro$Rceattle_list)

# Include the projection period to see how the forecast changes
plot_biomass(model_1_retro$Rceattle_list, incl_proj = TRUE)

The retrospective() function returns a list with two elements:

Element	Description
`Rceattle_list`	List of fitted models, ordered from full run to most-peeled
`mohns`	Data frame with columns `Object` (quantity, e.g. `"Biomass"`, `"SSB"`), `Forecast year` (0 = terminal year bias; 1+ = forecast skill), `N` (number of peels), and one column per species with mean relative error (Mohn’s rho)

The nyrs_forecast argument (default 3) additionally evaluates rho for years projected beyond the terminal peel, making it possible to quantify forecast skill alongside retrospective bias in a single call.
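
For readers who want to see the terminal-year calculation by hand, here is a minimal base-R sketch of Mohn's rho as a mean relative error. The SSB values and object names are made up, and the layout of the matrix is an assumption for illustration; it does not reproduce the structure of the object returned by retrospective().

```r
# Hypothetical SSB estimates: rows are years, columns are model runs.
# "full" uses all data; "peel_p" drops the last p years (NA after its terminal year).
years <- 2018:2022
ssb <- cbind(
  full   = c(100, 105, 110, 108, 112),
  peel_1 = c(101, 107, 113, 115,  NA),
  peel_2 = c(102, 108, 118,  NA,  NA),
  peel_3 = c(103, 112,  NA,  NA,  NA)
)
rownames(ssb) <- years

n_peels <- ncol(ssb) - 1
rel_err <- numeric(n_peels)
for (p in seq_len(n_peels)) {
  term_row <- nrow(ssb) - p           # terminal year of peel p
  peel_est <- ssb[term_row, p + 1]    # peel's estimate of its own terminal year
  full_est <- ssb[term_row, "full"]   # full-data estimate of the same year
  rel_err[p] <- (peel_est - full_est) / full_est
}

# Mohn's rho: mean relative error across peels
mohns_rho <- mean(rel_err)
```

With these numbers every peel overestimates its terminal year, giving a rho of about 0.068: a consistent positive pattern, but within the ±0.2 range mentioned above.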

Jitter testing

Jitter testing re-fits the model from many randomly perturbed starting values to check whether the optimizer consistently returns the same minimum negative log-likelihood (NLL). A spread of NLL values across jitters suggests the likelihood surface has multiple local minima and results should be interpreted cautiously.

jitters <- jitter(Rceattle = model_1, njitter = 10, phase = TRUE)  # default njitter = 50

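# Histogram of NLL differences relative to the best run.
# The best run has a difference of 0, so log() would return -Inf there;
# those runs are dropped explicitly before plotting.
dnll <- jitters$nll - min(jitters$nll, na.rm = TRUE)
hist(log(dnll[dnll > 0]),
     main = "Jitter NLL spread (log scale)",
     xlab = "log(NLL - min NLL)")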

# Overlay biomass trajectories — tight overlap indicates a stable optimum
plot_biomass(jitters$Rceattle_list)

# Number of runs that converged
length(jitters$Rceattle_list)

Non-converging runs are silently dropped from Rceattle_list, so if length(jitters$Rceattle_list) is much less than njitter, convergence may be fragile.
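
The idea behind jitter testing can be reproduced on a toy problem with base R's optim(). The objective function below is a made-up stand-in for a negative log-likelihood with two local minima; nothing here uses Rceattle, and the tolerance of 1e-4 is an arbitrary choice.

```r
set.seed(123)

# Toy negative log-likelihood with two local minima (near x = -1 and x = 1);
# the left one is the global minimum
toy_nll <- function(x) (x^2 - 1)^2 + 0.3 * x

# Re-fit from randomly perturbed starting values
n_jitter <- 20
starts <- runif(n_jitter, min = -2, max = 2)
fits <- lapply(starts, function(s) optim(par = s, fn = toy_nll, method = "BFGS"))

nll <- vapply(fits, function(f) f$value, numeric(1))
est <- vapply(fits, function(f) f$par,   numeric(1))

# Share of runs that reach the best NLL found (within a small tolerance)
d_nll     <- nll - min(nll)
prop_best <- mean(d_nll < 1e-4)

# Distinct optima reached across the jittered starts
table(round(est, 2))
```

A prop_best well below 1 is the toy analogue of a wide NLL spread across jitters: the answer depends on where the optimizer started.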

retrospective() and jitter() both fit peels / starts in parallel by default. Pass cores = 1 to force sequential execution; otherwise the call uses a PSOCK cluster sized at parallel::detectCores() - 6 (capped at 2 under R CMD check).

Self-test (simulation–estimation)

self_test() simulates nsim datasets from a fitted model — keeping its estimated parameters fixed — and re-fits the model to each simulated dataset. If estimates from the refits cluster around the parameter values of the fit they were simulated from, the model is at least self-consistent (it can recover its own estimates from data it generated). Persistent bias or wide spread in a quantity is a sign that the data are not informative about it.

sims <- self_test(model_1, nsim = 10)            # default nsim = 50

# Number of simulations that converged (non-converged runs are dropped)
length(sims)

# Overlay biomass / SSB trajectories across simulations — the original fit's
# trajectory should sit inside the spread of the refits.
plot_biomass(c(list(model_1), sims), model_names = c("fit", names(sims)))
plot_ssb(c(list(model_1), sims),     model_names = c("fit", names(sims)))

Each simulation uses seed + i as its RNG seed, so results are reproducible whether or not cores > 1. Set simulate = FALSE to refit against the model’s expected values (no observation error) rather than draws from the observation likelihood — useful for confirming that the estimator returns the generating parameters in the noise-free limit. cores behaves the same way as in retrospective() and jitter().
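
The simulate-and-refit logic can be illustrated with a deliberately simple stand-in model. The sketch below uses a linear model and stats::simulate() in base R; the data, the number of simulations, and the relative-bias summary are assumptions for illustration and do not reflect how self_test() is implemented.

```r
set.seed(2024)

# "Original" fit: a simple linear model standing in for the assessment model
x <- 1:30
y <- 2 + 0.5 * x + rnorm(length(x), sd = 1)
fit <- lm(y ~ x)

# Simulate datasets from the fitted model, holding its estimates fixed
n_sim <- 200
sim_y <- simulate(fit, nsim = n_sim, seed = 1)   # one column per dataset

# Re-fit the same model to each simulated dataset
refit_slopes <- vapply(sim_y,
                       function(ys) unname(coef(lm(ys ~ x))["x"]),
                       numeric(1))

# Relative bias of the refits around the estimate they were simulated from
fit_slope <- unname(coef(fit)["x"])
rel_bias  <- (mean(refit_slopes) - fit_slope) / fit_slope
```

A relative bias near zero with a modest spread in refit_slopes is what self-consistency looks like; the comparison is always against the fitted value, not against an unknown truth.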

Likelihood profile

profile() re-fits the model across a grid of fixed values for a chosen parameter and returns the resulting NLL surface. A flat profile means the data carry little information about the parameter; a profile minimum that lies away from the original estimate means the original fit had not actually reached the minimum. It supports the recruitment standard deviation, the stock–recruit parameters, and natural mortality.

Specify the parameter with a natural-scale alias: "sigmaR" (recruitment standard deviation), "M1" (natural mortality), or the stock–recruit parameters "R0", "alpha", and "beta". Aliases take natural-scale values directly. slots gives the cell(s) to profile — usually just the species index — and values gives the grid of values for each. Supplying more than one slot profiles over the full grid of combinations.

# 1-D profile: sigmaR for species 1 (natural-scale alias)
prof_sigmaR <- profile(
  fitted   = model_1,
  param    = "sigmaR",
  slots    = list(1),
  values   = list(seq(0.1, 1.5, by = 0.1))
)

plot(prof_sigmaR$grid$slot_1,
     prof_sigmaR$nll - min(prof_sigmaR$nll, na.rm = TRUE),
     type = "l", xlab = "sigmaR", ylab = "dNLL")

# 1-D profile: SRR alpha for species 1
# (alias fills in the rec_pars column; slot is just the species index)
prof_alpha <- profile(
  fitted   = model_1,
  param    = "alpha",
  slots    = list(1),
  values   = list(seq(2, 80, length.out = 20))
)

# 2-D cross-profile: M1 across sex for species 1, age 1
prof_M_sex <- profile(
  fitted   = model_1,
  param    = "M1",
  slots    = list(c(1, 1, 1), c(1, 2, 1)),  # sex 1 and sex 2 (slot = species, sex, age)
  values   = list(seq(0.10, 0.40, by = 0.05),
                  seq(0.10, 0.40, by = 0.05))
)

profile() returns Rceattle_list (one fit per grid row, NULL where the fit failed), grid (the user-scale value grid), nll (joint NLL aligned with grid, NA for non-converged fits), and echoes of param and slots. To cross-profile across multiple species, supply one slot per species (e.g. slots = list(1, 2, 3) with param = "sigmaR"); to cross-profile M1 across sex, supply one slot per sex as in the third example above. cores behaves the same way as in the other diagnostic functions.
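
The mechanics of a one-dimensional profile can be sketched without Rceattle. In the base-R example below, a normal sample stands in for the data, sigma plays the role of the profiled parameter, and mu is the nuisance parameter that is re-estimated at each grid value. The 1.92-unit cutoff is the usual chi-square approximation for a 95% interval and is an addition here, not something the source states about profile().

```r
set.seed(7)

# Toy data: deviations with an unknown mean and standard deviation
y <- rnorm(40, mean = 0, sd = 0.6)

# Joint negative log-likelihood in (mu, sigma)
nll_fun <- function(mu, sigma) -sum(dnorm(y, mean = mu, sd = sigma, log = TRUE))

# Profile: fix sigma at each grid value and re-estimate mu
sigma_grid <- seq(0.3, 1.2, by = 0.05)
prof_nll <- vapply(sigma_grid, function(s) {
  optimize(function(mu) nll_fun(mu, s), interval = range(y))$objective
}, numeric(1))

# Change in NLL relative to the best grid point
d_nll      <- prof_nll - min(prof_nll)
sigma_best <- sigma_grid[which.min(prof_nll)]

# Approximate 95% interval: grid values within 1.92 NLL units of the minimum
sigma_ci <- range(sigma_grid[d_nll < qchisq(0.95, df = 1) / 2])
```

Plotting d_nll against sigma_grid gives the same kind of curve as the sigmaR example above: a narrow bowl when the data are informative and a shallow one when they are not.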

Comparing single- and multi-species trajectories

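Plotting alternative runs together is itself a useful diagnostic: large divergences in biomass or mortality deserve scrutiny. The example here overlays the full fit and one retrospective peel; a list that pairs a single-species fit with a multi-species fit can be passed in the same way.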

mod_list  <- list(model_1, model_1_retro$Rceattle_list$Year_2019)
mod_names <- c("Full fit", "Retro peel (Year_2019)")

plot_biomass(Rceattle = mod_list, model_names = mod_names)
plot_recruitment(Rceattle = mod_list, model_names = mod_names, add_ci = TRUE)
plot_depletionSSB(Rceattle = mod_list, model_names = mod_names)

Model average

For model averaging across model variants, see ?model_average.

# Placeholder: model_1 is averaged with itself here; in practice supply distinct model variants.
mod_avg <- model_average(Rceattle = list(model_1, model_1), weights = c(1, 1))
plot_biomass(mod_avg)