Criminal Sample Predictors via LASSO

Identifying key drivers of violent intent in the jihadi detainee data

Why a LASSO analysis?

The original random forest analysis supplied broad signals about violent intent across the wider sample. However, the criminal extremist subset is small (≈ 80 cases) and potentially idiosyncratic. LASSO regression is well-suited here because it performs feature selection and shrinkage simultaneously, guarding against overfitting in high-dimensional, low-(N) settings. The goal is to surface a compact, interpretable set of predictors that remain active when the model is tuned with cross-validation.

How does LASSO regression work?

LASSO regression (Least Absolute Shrinkage and Selection Operator) is a variant of linear regression that combines prediction with variable selection. Instead of only fitting the best line to the data, it adds a penalty that discourages including too many predictors. This penalty is controlled by a parameter, usually called λ (lambda).

Ordinary regression: estimates coefficients by minimizing the difference between predicted and observed values.
LASSO regression: does the same, but adds a penalty proportional to the absolute size of the coefficients. The optimization problem looks like this: \[\text{Loss} = \text{Error} + \lambda \sum |\beta_j|\] where \(\beta_j\) are the regression coefficients.

The effect of this penalty is that:

For small λ, the penalty is weak: most predictors are kept, similar to standard regression.
For large λ, the penalty is strong: many coefficients shrink exactly to zero, effectively removing those predictors from the model.

This means that lasso not only estimates relationships, but also performs automatic variable selection. Predictors with no meaningful contribution are excluded, leaving a simpler and more interpretable model.

How λ is chosen

Because the number of variables in the final model depends on λ, we cannot set it arbitrarily. Instead, we use cross-validation:

Split the dataset into folds.
Fit the model across a grid of possible λ values.
Evaluate prediction accuracy (e.g., using root mean square error).
Select the λ that gives the best performance on unseen folds.

The final model is then re-fit using this optimal λ. At that penalty level, only the predictors with non-zero coefficients remain active.

Why this matters for our analysis

Our dataset is small but has many potential predictors. Using all of them risks overfitting.
Lasso automatically identifies which variables contribute most reliably to predicting violent intent.
The result is a compact set of predictors that balances parsimony (few variables) with accuracy (good predictions).

Data preparation

Code

load("../data/wrangled_data/dt_ana_base_jih.RData")

criminal_raw <- dt_ana_base_jih %>%
  clean_names() %>%
  mutate(
    past_activism = ifelse(is.nan(past_activism), NA_real_, past_activism),
    across(where(is.factor), as.character)
  )

criminal_df <- criminal_raw %>%
  select(
    -response_id,
    -gender, # all men
    -education,
    -religion, # all muslim
    -country,
    -peaceful_intent, # alternative DV
    -radical_attitudes, # too close to main DV
    -activist_intent,  # fully missing in this sample
    -intuition,  # fully missing in this sample
    -logic  # fully missing in this sample
  ) %>%
  mutate(across(everything(), as.numeric)) %>%
  select(where(~ !all(is.na(.)))) %>%
  drop_na(violent_intent)

n_cases <- nrow(criminal_df)
n_predictors <- ncol(criminal_df) - 1
list(
  observations = n_cases,
  predictors = n_predictors
)

$observations
[1] 80

$predictors
[1] 14

The cleaned data contain 80 observations with 14 candidate predictors for violent intent. Despite the small sample, there is still a rich set of attitudinal and emotional measures to explore.

Missingness overview

Code

missing_summary <- criminal_df %>%
  summarise(across(everything(), ~ sum(is.na(.)))) %>%
  pivot_longer(cols = everything(), names_to = "variable", values_to = "missing_n") %>%
  mutate(
    missing_pct = missing_n / n_cases
  ) %>%
  arrange(desc(missing_pct))

missing_summary %>%
  mutate(missing_pct = percent(missing_pct, accuracy = 0.1)) %>%
  gt() %>%
  fmt_number(columns = missing_n, decimals = 0) %>%
  tab_header(title = "Missing data pattern in the criminal sample")

Missing data pattern in the criminal sample
variable	missing_n	missing_pct
past_activism	11	13.8%
collective_relative_deprivation	10	12.5%
education_num	6	7.5%
identity_fusion	5	6.2%
age	2	2.5%
moral_neutralization	0	0.0%
anger	0	0.0%
positive_affect	0	0.0%
negative_affect	0	0.0%
perceived_discrimination	0	0.0%
ingroup_superiority	0	0.0%
harmonious_passion	0	0.0%
obsessive_passion	0	0.0%
commitment_passion	0	0.0%
violent_intent	0	0.0%

Variables such as collective relative deprivation and identity fusion still exhibit gaps, but median imputation (embedded in the modeling recipe below) keeps them in play without discarding cases.

Distributions and descriptive statistics

Code

stat_df <- criminal_df %>%
  mutate(across(everything(), ~ ifelse(is.na(.), median(., na.rm = TRUE), .)))

p_vi <- ggplot(stat_df, aes(x = violent_intent)) +
  geom_histogram(bins = 12, fill = "#2c7fb8", colour = "white") +
  geom_density(aes(y = after_stat(count)), colour = "#08306b", linewidth = 0.9, alpha = 0.4) +
  labs(x = "Violent intent", y = "Count", title = "Distribution of violent intent") +
  theme_minimal()

p_vi

Violent intent is markedly skewed to the floor with a long tail at the high end, underscoring why regularization is critical: only a handful of detainees report elevated willingness to use violence.

Code

basic_stats <- stat_df %>%
  summarise(across(everything(), list(
    mean = ~ mean(.x),
    sd = ~ sd(.x),
    min = ~ min(.x),
    max = ~ max(.x)
  ), .names = "{.col}__{.fn}")) %>%
  pivot_longer(everything(), names_to = c("variable", "moment"), values_to = "value", names_sep = "__") %>%
  pivot_wider(names_from = moment, values_from = value) %>%
  arrange(desc(mean))

basic_stats %>%
  mutate(across(mean:max, ~ round(.x, 2))) %>%
  gt() %>%
  tab_header(title = "Moment statistics (median-imputed)")

Moment statistics (median-imputed)
variable	mean	sd	min	max
age	36.36	10.57	20	63.00
commitment_passion	4.41	2.16	1	7.00
positive_affect	3.54	0.91	1	5.00
anger	3.52	1.43	1	6.00
harmonious_passion	3.48	2.02	1	7.00
identity_fusion	3.25	2.92	1	7.00
perceived_discrimination	3.18	2.13	1	7.00
collective_relative_deprivation	2.76	2.14	1	7.00
negative_affect	2.46	1.01	1	5.00
ingroup_superiority	2.46	2.09	1	7.00
obsessive_passion	1.84	1.47	1	7.00
education_num	1.83	1.03	1	5.00
moral_neutralization	1.59	0.97	1	4.88
violent_intent	1.48	1.60	1	8.00
past_activism	0.09	0.24	0	1.00

Zero-order signal with violent intent

Code

cor_summary <- stat_df %>%
  relocate(violent_intent) %>%
  summarise(across(-violent_intent, ~ cor(.x, violent_intent))) %>%
  pivot_longer(everything(), names_to = "variable", values_to = "correlation") %>%
  mutate(variable = fct_reorder(variable, correlation))

cor_summary %>%
  ggplot(aes(x = correlation, y = variable, fill = correlation > 0)) +
  geom_col(width = 0.6) +
  scale_fill_manual(values = c("TRUE" = "#ef6c00", "FALSE" = "#1565c0"), guide = "none") +
  labs(x = "Pearson correlation", y = NULL, title = "Bivariate association with violent intent") +
  theme_minimal()

Activist passion measures, perceived discrimination, and ingroup superiority already show positive simple correlations with violent intent, but multicollinearity and noise require a penalised model to adjudicate which survive joint estimation.

Modeling strategy

We adopt a tidy modeling workflow with five-fold cross-validation repeated 20 times (100 resamples total) to stabilise the penalty search. Predictors are median-imputed and standardized within each resample so that the LASSO penalty treats them on a comparable scale.

Code

set.seed(20241105)

lasso_recipe <- recipe(violent_intent ~ ., data = criminal_df) %>%
  step_impute_median(all_predictors()) %>%
  step_normalize(all_predictors())

lasso_spec <- linear_reg(penalty = tune(), mixture = 1) %>%
  set_engine("glmnet")

lasso_workflow <- workflow() %>%
  add_model(lasso_spec) %>%
  add_recipe(lasso_recipe)

folds <- vfold_cv(criminal_df, v = 5, repeats = 20)
penalty_grid <- grid_regular(penalty(range = c(-6, 1)), levels = 50)

lasso_res <- tune_grid(
  lasso_workflow,
  resamples = folds,
  grid = penalty_grid,
  metrics = metric_set(rmse, rsq)
)

cv_metrics <- collect_metrics(lasso_res)
cv_metrics %>%
  filter(.metric == "rmse") %>%
  arrange(mean) %>%
  slice_head(n = 5)

# A tibble: 5 × 7
  penalty .metric .estimator  mean     n std_err .config         
    <dbl> <chr>   <chr>      <dbl> <int>   <dbl> <chr>           
1  0.0518 rmse    standard    1.18   100  0.0353 pre0_mod34_post0
2  0.0373 rmse    standard    1.18   100  0.0340 pre0_mod33_post0
3  0.0720 rmse    standard    1.18   100  0.0369 pre0_mod35_post0
4  0.0268 rmse    standard    1.19   100  0.0330 pre0_mod32_post0
5  0.1    rmse    standard    1.19   100  0.0386 pre0_mod36_post0

The table displays the five penalties with the lowest cross-validated root mean square error (RMSE).

Code

best_rmse <- select_best(lasso_res, metric = "rmse")
rmse_metrics <- cv_metrics %>% filter(.metric == "rmse")

rmse_metrics %>%
  ggplot(aes(x = penalty, y = mean)) +
  geom_line(colour = "#37474f") +
  geom_point(colour = "#455a64") +
  geom_point(
    data = rmse_metrics %>% filter(penalty == best_rmse$penalty),
    inherit.aes = FALSE,
    aes(x = penalty, y = mean),
    colour = "#d32f2f",
    size = 3
  ) +
  geom_ribbon(aes(ymin = mean - std_err, ymax = mean + std_err), alpha = 0.15, fill = "#607d8b") +
  scale_x_log10(labels = label_number(accuracy = 0.001)) +
  labs(
    x = "LASSO penalty (log scale)",
    y = "Cross-validated RMSE",
    title = "Penalty selection profile"
  ) +
  theme_minimal()

The optimal penalty is modest, indicating a model that retains a handful of predictors while still shrinking minor contributors toward zero.

Final model and coefficient profile

Code

final_workflow <- finalize_workflow(lasso_workflow, best_rmse)
final_fit <- final_workflow %>% fit(data = criminal_df)

nonzero_coefs <- final_fit %>%
  extract_fit_parsnip() %>%
  tidy(return_zeros = TRUE) %>%
  filter(term != "(Intercept)", abs(estimate) > 0) %>%
  mutate(
    term_raw = term,
    term_label = str_replace_all(term_raw, "_", " "),
    abs_estimate = abs(estimate),
    direction = if_else(estimate >= 0, "Risk factor", "Protective factor")
  ) %>%
  arrange(desc(abs_estimate))

nonzero_coefs %>%
  mutate(rank = row_number()) %>%
  select(rank, term = term_label, estimate, direction) %>%
  mutate(estimate = round(estimate, 3)) %>%
  gt() %>%
  tab_header(title = "Active coefficients in the tuned LASSO model")

Active coefficients in the tuned LASSO model
rank	term	estimate	direction
1	obsessive passion	0.590	Risk factor
2	moral neutralization	0.580	Risk factor
3	past activism	-0.270	Protective factor
4	ingroup superiority	0.269	Risk factor
5	identity fusion	0.255	Risk factor
6	education num	-0.220	Protective factor
7	commitment passion	-0.103	Protective factor
8	anger	-0.086	Protective factor
9	negative affect	-0.068	Protective factor
10	age	0.063	Risk factor
11	collective relative deprivation	-0.029	Protective factor
12	perceived discrimination	0.017	Risk factor

Code

nonzero_coefs %>%
  mutate(term_label = fct_reorder(term_label, estimate)) %>%
  ggplot(aes(x = estimate, y = term_label, fill = direction)) +
  geom_col(width = 0.65) +
  scale_fill_manual(values = c("Risk factor" = "#c62828", "Protective factor" = "#2e7d32")) +
  geom_vline(xintercept = 0, linetype = "dashed", colour = "#424242") +
  labs(
    x = "Standardised coefficient",
    y = NULL,
    title = "Relative weight of retained predictors"
  ) +
  theme_minimal() +
  theme(legend.position = "bottom")

The coefficients represent the effect of a one standard deviation change in each predictor after median imputation. Positive estimates correspond to elevated violent intent, whereas negative values indicate attenuation.

Selection stability

Code

set.seed(20241105)
stability_boot_n <- getOption("lasso_stability_boot", 500)

stability_samples <- bootstraps(criminal_df, times = stability_boot_n)

stability_results <- stability_samples %>%
  mutate(
    estimates = map(
      splits,
      ~ final_workflow %>%
        fit(data = analysis(.x)) %>%
        extract_fit_parsnip() %>%
        tidy(return_zeros = TRUE) %>%
        filter(term != "(Intercept)") %>%
        transmute(
          term_raw = term,
          estimate = estimate,
          selected = abs(estimate) > 1e-6
        )
    )
  ) %>%
  select(estimates) %>%
  unnest(estimates) %>%
  group_by(term_raw) %>%
  summarise(
    inclusion_prob = mean(selected),
    avg_coef = mean(estimate),
    sd_coef = sd(estimate),
    .groups = "drop"
  ) %>%
  mutate(
    term_label = str_replace_all(term_raw, "_", " ")
  ) %>%
  arrange(desc(inclusion_prob))

stability_cutoff <- 0.6

selected_terms <- stability_results %>%
  filter(inclusion_prob >= stability_cutoff) %>%
  pull(term_raw)

if (length(selected_terms) == 0) {
  selected_terms <- nonzero_coefs$term_raw
}

selection_sentence <- stability_results %>%
  filter(term_raw %in% selected_terms) %>%
  mutate(label = str_c(term_label, " (", percent(inclusion_prob, accuracy = 1), ")")) %>%
  arrange(desc(inclusion_prob)) %>%
  pull(label) %>%
  str_c(collapse = ", ")

if (length(selection_sentence) == 0) {
  selection_sentence <- ""
}

selection_summary <- stability_results

Code

stability_results %>%
  mutate(term_label = fct_reorder(term_label, inclusion_prob)) %>%
  ggplot(aes(x = inclusion_prob, y = term_label)) +
  geom_col(fill = "#4a90e2") +
  geom_vline(xintercept = stability_cutoff, linetype = "dashed", colour = "#b71c1c") +
  scale_x_continuous(labels = percent_format(accuracy = 1)) +
  labs(
    x = "Bootstrap inclusion probability",
    y = NULL,
    title = "Selection stability across 500 bootstrap replicates"
  ) +
  theme_minimal()

Variables with inclusion probability above the dashed line remain active in most bootstrap fits. Robust signals include moral neutralization (99%), obsessive passion (97%), past activism (96%), education num (96%), identity fusion (95%), ingroup superiority (87%), anger (73%), age (69%), collective relative deprivation (69%), commitment passion (67%), perceived discrimination (66%).

Effect sizes from the final logistic refit

To translate the sparse LASSO solution into interpretable quantities, violent intent scores above the floor category are treated as an “elevated” outcome. The selected predictors are median-imputed, standardised to one standard deviation, and refit with a plain logistic regression so that each coefficient reflects a 1 SD shift in its source variable.

Code

if (length(selected_terms) == 0) {
  stop('No predictors met the stability threshold; rerun the model-building stage.')
}

effect_df <- criminal_df %>%
  mutate(
    violent_intent_event = factor(
      if_else(violent_intent > 1, 'Elevated intent', 'Baseline intent'),
      levels = c('Baseline intent', 'Elevated intent')
    )
  ) %>%
  select(violent_intent_event, all_of(selected_terms))

effect_recipe <- recipe(violent_intent_event ~ ., data = effect_df) %>%
  step_impute_median(all_predictors()) %>%
  step_normalize(all_predictors())

effect_prep <- prep(effect_recipe)
effect_data <- bake(effect_prep, new_data = NULL)

glm_fit <- glm(
  violent_intent_event ~ .,
  data = effect_data,
  family = binomial(),
  control = list(maxit = 50)
)

tidy_or <- tidy(glm_fit, exponentiate = TRUE, conf.int = TRUE) %>%
  filter(term != '(Intercept)') %>%
  transmute(
    term_raw = term,
    term_label = str_replace_all(term, '_', ' '),
    odds_ratio = estimate,
    or_low = conf.low,
    or_high = conf.high,
    p_value = p.value
  )

ame_tbl <- marginaleffects::avg_slopes(
  glm_fit,
  variables = selected_terms,
  type = 'response'
) %>%
  mutate(
    term_raw = term,
    ame_pp = estimate * 100,
    ame_low = conf.low * 100,
    ame_high = conf.high * 100
  ) %>%
  select(term_raw, ame_pp, ame_low, ame_high)

effect_summary <- tidy_or %>%
  left_join(ame_tbl, by = 'term_raw') %>%
  mutate(term_label = fct_reorder(term_label, odds_ratio))

prediction_df <- effect_data %>%
  mutate(
    .pred = predict(glm_fit, type = 'response'),
    truth = violent_intent_event
  )

roc_tbl <- yardstick::roc_curve(prediction_df, truth, .pred, event_level = 'second')
roc_auc_value <- yardstick::roc_auc(prediction_df, truth, .pred, event_level = 'second')$.estimate
brier_value <- prediction_df %>%
  mutate(truth_num = as.numeric(truth == 'Elevated intent')) %>%
  summarise(brier = mean((truth_num - .pred)^2)) %>%
  pull(brier)

calibration_df <- prediction_df %>%
  mutate(bin = ntile(.pred, 6)) %>%
  group_by(bin) %>%
  summarise(
    mean_pred = mean(.pred),
    emp_rate = mean(truth == 'Elevated intent'),
    .groups = 'drop'
  )

n_total <- nrow(effect_data)
n_positive <- sum(effect_data$violent_intent_event == 'Elevated intent')
event_rate <- n_positive / n_total

top_risk <- effect_summary %>% arrange(desc(odds_ratio)) %>% slice_head(n = 1)
top_protective <- effect_summary %>% arrange(odds_ratio) %>% slice_head(n = 1)
top_ame <- effect_summary %>% arrange(desc(ame_pp)) %>% slice_head(n = 1)
strongest_drop <- effect_summary %>% arrange(ame_pp) %>% slice_head(n = 1)

top_risk_label <- as.character(top_risk$term_label)
top_risk_or <- top_risk$odds_ratio

top_protective_label <- as.character(top_protective$term_label)
top_protective_or <- top_protective$odds_ratio

top_ame_label <- as.character(top_ame$term_label)
top_ame_value <- top_ame$ame_pp

strongest_drop_label <- as.character(strongest_drop$term_label)
strongest_drop_value <- strongest_drop$ame_pp

8 of 80 detainees (10.0%) exhibit elevated violent intent, so the odds ratios still carry considerable uncertainty despite the simplified refit.

ingroup superiority remains the sharpest risk signal with an odds ratio of 27.71, whereas past activism leans protective (OR = 0.06).

Average marginal effects translate these shifts into roughly 9.6 percentage points of additional risk for a 1 SD increase in ingroup superiority, contrasted with a -8.3 pp dip for past activism.

Code

effect_summary %>%
  arrange(desc(odds_ratio)) %>%
  mutate(
    `Odds ratio (95% CI)` = str_c(
      number(odds_ratio, accuracy = 0.01),
      ' (',
      number(or_low, accuracy = 0.01),
      ' – ',
      number(or_high, accuracy = 0.01),
      ')'
    ),
    `AME (pp, 95% CI)` = str_c(
      number(ame_pp, accuracy = 0.1),
      ' (',
      number(ame_low, accuracy = 0.1),
      ' – ',
      number(ame_high, accuracy = 0.1),
      ')'
    ),
    `p-value` = pvalue(p_value, accuracy = 0.001)
  ) %>%
  select(
    Predictor = term_label,
    `Odds ratio (95% CI)`,
    `AME (pp, 95% CI)`,
    `p-value`
  ) %>%
  gt() %>%
  fmt_missing(columns = everything(), missing_text = '—') %>%
  tab_source_note(source_note = gt::md('Predictors are scaled to one standard deviation after median imputation. AME expresses the average percentage-point change in the probability of elevated intent.'))

Predictor	Odds ratio (95% CI)	AME (pp, 95% CI)	p-value
ingroup superiority	27.71 (1.77 – 5 892.84)	9.6 (-1.6 – 20.8)	0.073
obsessive passion	8.34 (0.88 – 355.11)	6.1 (-1.9 – 14.1)	0.129
age	2.48 (0.35 – 22.97)	2.6 (-2.7 – 7.9)	0.333
anger	2.37 (0.27 – 77.96)	2.5 (-4.6 – 9.6)	0.479
moral neutralization	1.26 (0.29 – 4.96)	0.7 (-3.1 – 4.5)	0.736
education num	0.59 (0.03 – 2.55)	-1.5 (-6.7 – 3.6)	0.556
perceived discrimination	0.58 (0.04 – 3.16)	-1.6 (-7.2 – 4.0)	0.579
commitment passion	0.51 (0.01 – 9.55)	-1.9 (-10.8 – 6.9)	0.666
identity fusion	0.49 (0.01 – 4.99)	-2.0 (-9.9 – 5.8)	0.604
collective relative deprivation	0.33 (0.02 – 3.01)	-3.2 (-10.3 – 3.8)	0.358
past activism	0.06 (0.00 – 0.75)	-8.3 (-18.2 – 1.6)	0.088
Predictors are scaled to one standard deviation after median imputation. AME expresses the average percentage-point change in the probability of elevated intent.

Code

effect_summary %>%
  ggplot(aes(x = odds_ratio, y = term_label)) +
  geom_point(colour = '#0d47a1', size = 3) +
  geom_errorbarh(aes(xmin = or_low, xmax = or_high), height = 0.2, colour = '#1565c0') +
  geom_vline(xintercept = 1, linetype = 'dashed', colour = '#757575') +
  scale_x_log10(labels = number_format(accuracy = 0.1)) +
  labs(
    x = 'Odds ratio (log scale)',
    y = NULL,
    title = 'Refitted logistic odds ratios',
    subtitle = 'Per 1 SD increase in each predictor'
  ) +
  theme_minimal()

Code

effect_summary %>%
  ggplot(aes(x = ame_pp, y = term_label)) +
  geom_point(colour = '#00796b', size = 3) +
  geom_errorbarh(aes(xmin = ame_low, xmax = ame_high), height = 0.2, colour = '#004d40') +
  geom_vline(xintercept = 0, linetype = 'dashed', colour = '#757575') +
  labs(
    x = 'Average marginal effect (percentage points)',
    y = NULL,
    title = 'Change in predicted probability of elevated intent'
  ) +
  theme_minimal()

Model performance & calibration

The refitted logistic attains an apparent AUC of 0.99 with a Brier score of 0.033. These values reflect in-sample performance; cross-validated optimism is already controlled upstream by the LASSO selection.

Code

prediction_df %>%
  mutate(truth = fct_relevel(truth, 'Elevated intent')) %>%
  ggplot(aes(x = .pred, y = truth, fill = truth)) +
  geom_density_ridges(alpha = 0.75, colour = NA, scale = 1.05) +
  scale_x_continuous(labels = percent_format(accuracy = 1), limits = c(0, 1)) +
  scale_fill_manual(values = c('Baseline intent' = '#90caf9', 'Elevated intent' = '#1e88e5')) +
  guides(fill = 'none') +
  labs(
    x = 'Predicted probability of elevated intent',
    y = NULL,
    title = 'Distribution of predicted probabilities by observed outcome'
  ) +
  theme_minimal()

Code

roc_tbl %>%
  mutate(false_positive_rate = 1 - specificity) %>%
  ggplot(aes(x = false_positive_rate, y = sensitivity)) +
  geom_line(colour = '#1e88e5', linewidth = 1) +
  geom_abline(slope = 1, intercept = 0, colour = '#9e9e9e', linetype = 'dashed') +
  coord_equal() +
  labs(
    x = 'False positive rate',
    y = 'True positive rate',
    title = 'ROC curve for the refitted model'
  ) +
  theme_minimal()

Code

calibration_df %>%
  ggplot(aes(x = mean_pred, y = emp_rate)) +
  geom_point(size = 2.5, colour = '#00796b') +
  geom_line(colour = '#004d40') +
  geom_abline(slope = 1, intercept = 0, linetype = 'dashed', colour = '#9e9e9e') +
  scale_x_continuous(labels = percent_format(accuracy = 1), limits = c(0, 1)) +
  scale_y_continuous(labels = percent_format(accuracy = 1), limits = c(0, 1)) +
  labs(
    x = 'Average predicted probability (bin mean)',
    y = 'Observed share of elevated intent',
    title = 'Calibration across probability sextiles'
  ) +
  theme_minimal()

The probability densities show clear separation between detainees with elevated intent and those at baseline, though the positive class remains extremely sparse. Calibration stays close to the diagonal for low-risk bins and becomes volatile once predictions exceed 20%, mirroring the limited number of elevated cases.

Interpretation and implications

Dominant risk marker: ingroup superiority increases the odds of elevated violent intent roughly 27.71-fold, corresponding to an average lift of 9.6 percentage points for a one SD increase.
Protective tendency: past activism delivers the largest downward shift (-8.3 pp) and an odds ratio below one, signaling a potential buffer against violent intent.
Practical takeaway: Even with shrinkage and a pared-down refit, only 8 detainees display elevated intent; any intervention should focus on consolidating the high-risk attitudinal profile while acknowledging the wide bands around point estimates.

Limitations

Small event count: Logistic coefficients are estimated from 8 elevated cases, so confidence intervals remain broad and sensitive to single observations.
Standardised interpretation: Odds ratios describe a one SD increase after median imputation; translating back to raw scale requires domain-specific knowledge of each measure.
In-sample evaluation: Performance diagnostics are apparent measures; external validation or temporal splits would be required to confirm transportability beyond this cohort.