Criminal Sample Predictors via LASSO

Identifying key drivers of violent intent in the jihadi detainee data

Why a LASSO analysis?

The original random forest analysis supplied broad signals about violent intent across the wider sample. However, the criminal extremist subset is small (≈ 80 cases) and potentially idiosyncratic. LASSO regression is well-suited here because it performs feature selection and shrinkage simultaneously, guarding against overfitting in high-dimensional, low-(N) settings. The goal is to surface a compact, interpretable set of predictors that remain active when the model is tuned with cross-validation.

How does LASSO regression work?

LASSO regression (Least Absolute Shrinkage and Selection Operator) is a variant of linear regression that combines prediction with variable selection. Instead of only fitting the best line to the data, it adds a penalty that discourages including too many predictors. This penalty is controlled by a parameter, usually called λ (lambda).

  • Ordinary regression: estimates coefficients by minimizing the difference between predicted and observed values.
  • LASSO regression: does the same, but adds a penalty proportional to the absolute size of the coefficients. The optimization problem looks like this: \[\text{Loss} = \text{Error} + \lambda \sum |\beta_j|\] where \(\beta_j\) are the regression coefficients.

The effect of this penalty is that:

  • For small λ, the penalty is weak: most predictors are kept, similar to standard regression.
  • For large λ, the penalty is strong: many coefficients shrink exactly to zero, effectively removing those predictors from the model.

This means that lasso not only estimates relationships, but also performs automatic variable selection. Predictors with no meaningful contribution are excluded, leaving a simpler and more interpretable model.

How λ is chosen

Because the number of variables in the final model depends on λ, we cannot set it arbitrarily. Instead, we use cross-validation:

  1. Split the dataset into folds.
  2. Fit the model across a grid of possible λ values.
  3. Evaluate prediction accuracy (e.g., using root mean square error).
  4. Select the λ that gives the best performance on unseen folds.

The final model is then re-fit using this optimal λ. At that penalty level, only the predictors with non-zero coefficients remain active.

Why this matters for our analysis

  • Our dataset is small but has many potential predictors. Using all of them risks overfitting.
  • Lasso automatically identifies which variables contribute most reliably to predicting violent intent.
  • The result is a compact set of predictors that balances parsimony (few variables) with accuracy (good predictions).

Data preparation

Code
load("../data/wrangled_data/dt_ana_base_jih.RData")

criminal_raw <- dt_ana_base_jih %>%
  clean_names() %>%
  mutate(
    past_activism = ifelse(is.nan(past_activism), NA_real_, past_activism),
    across(where(is.factor), as.character)
  )

criminal_df <- criminal_raw %>%
  select(
    -response_id,
    -gender, # all men
    -education,
    -religion, # all muslim
    -country,
    -peaceful_intent, # alternative DV
    -radical_attitudes, # too close to main DV
    -activist_intent,  # fully missing in this sample
    -intuition,  # fully missing in this sample
    -logic  # fully missing in this sample
  ) %>%
  mutate(across(everything(), as.numeric)) %>%
  select(where(~ !all(is.na(.)))) %>%
  drop_na(violent_intent)

n_cases <- nrow(criminal_df)
n_predictors <- ncol(criminal_df) - 1
list(
  observations = n_cases,
  predictors = n_predictors
)
$observations
[1] 80

$predictors
[1] 14

The cleaned data contain 80 observations with 14 candidate predictors for violent intent. Despite the small sample, there is still a rich set of attitudinal and emotional measures to explore.

Missingness overview

Code
missing_summary <- criminal_df %>%
  summarise(across(everything(), ~ sum(is.na(.)))) %>%
  pivot_longer(cols = everything(), names_to = "variable", values_to = "missing_n") %>%
  mutate(
    missing_pct = missing_n / n_cases
  ) %>%
  arrange(desc(missing_pct))

missing_summary %>%
  mutate(missing_pct = percent(missing_pct, accuracy = 0.1)) %>%
  gt() %>%
  fmt_number(columns = missing_n, decimals = 0) %>%
  tab_header(title = "Missing data pattern in the criminal sample")
Missing data pattern in the criminal sample
variable missing_n missing_pct
past_activism 11 13.8%
collective_relative_deprivation 10 12.5%
education_num 6 7.5%
identity_fusion 5 6.2%
age 2 2.5%
moral_neutralization 0 0.0%
anger 0 0.0%
positive_affect 0 0.0%
negative_affect 0 0.0%
perceived_discrimination 0 0.0%
ingroup_superiority 0 0.0%
harmonious_passion 0 0.0%
obsessive_passion 0 0.0%
commitment_passion 0 0.0%
violent_intent 0 0.0%

Variables such as collective relative deprivation and identity fusion still exhibit gaps, but median imputation (embedded in the modeling recipe below) keeps them in play without discarding cases.

Distributions and descriptive statistics

Code
stat_df <- criminal_df %>%
  mutate(across(everything(), ~ ifelse(is.na(.), median(., na.rm = TRUE), .)))

p_vi <- ggplot(stat_df, aes(x = violent_intent)) +
  geom_histogram(bins = 12, fill = "#2c7fb8", colour = "white") +
  geom_density(aes(y = after_stat(count)), colour = "#08306b", linewidth = 0.9, alpha = 0.4) +
  labs(x = "Violent intent", y = "Count", title = "Distribution of violent intent") +
  theme_minimal()

p_vi

Violent intent is markedly skewed to the floor with a long tail at the high end, underscoring why regularization is critical: only a handful of detainees report elevated willingness to use violence.

Code
basic_stats <- stat_df %>%
  summarise(across(everything(), list(
    mean = ~ mean(.x),
    sd = ~ sd(.x),
    min = ~ min(.x),
    max = ~ max(.x)
  ), .names = "{.col}__{.fn}")) %>%
  pivot_longer(everything(), names_to = c("variable", "moment"), values_to = "value", names_sep = "__") %>%
  pivot_wider(names_from = moment, values_from = value) %>%
  arrange(desc(mean))

basic_stats %>%
  mutate(across(mean:max, ~ round(.x, 2))) %>%
  gt() %>%
  tab_header(title = "Moment statistics (median-imputed)")
Moment statistics (median-imputed)
variable mean sd min max
age 36.36 10.57 20 63.00
commitment_passion 4.41 2.16 1 7.00
positive_affect 3.54 0.91 1 5.00
anger 3.52 1.43 1 6.00
harmonious_passion 3.48 2.02 1 7.00
identity_fusion 3.25 2.92 1 7.00
perceived_discrimination 3.18 2.13 1 7.00
collective_relative_deprivation 2.76 2.14 1 7.00
negative_affect 2.46 1.01 1 5.00
ingroup_superiority 2.46 2.09 1 7.00
obsessive_passion 1.84 1.47 1 7.00
education_num 1.83 1.03 1 5.00
moral_neutralization 1.59 0.97 1 4.88
violent_intent 1.48 1.60 1 8.00
past_activism 0.09 0.24 0 1.00

Zero-order signal with violent intent

Code
cor_summary <- stat_df %>%
  relocate(violent_intent) %>%
  summarise(across(-violent_intent, ~ cor(.x, violent_intent))) %>%
  pivot_longer(everything(), names_to = "variable", values_to = "correlation") %>%
  mutate(variable = fct_reorder(variable, correlation))

cor_summary %>%
  ggplot(aes(x = correlation, y = variable, fill = correlation > 0)) +
  geom_col(width = 0.6) +
  scale_fill_manual(values = c("TRUE" = "#ef6c00", "FALSE" = "#1565c0"), guide = "none") +
  labs(x = "Pearson correlation", y = NULL, title = "Bivariate association with violent intent") +
  theme_minimal()

Activist passion measures, perceived discrimination, and ingroup superiority already show positive simple correlations with violent intent, but multicollinearity and noise require a penalised model to adjudicate which survive joint estimation.

Modeling strategy

We adopt a tidy modeling workflow with five-fold cross-validation repeated 20 times (100 resamples total) to stabilise the penalty search. Predictors are median-imputed and standardized within each resample so that the LASSO penalty treats them on a comparable scale.

Code
set.seed(20241105)

lasso_recipe <- recipe(violent_intent ~ ., data = criminal_df) %>%
  step_impute_median(all_predictors()) %>%
  step_normalize(all_predictors())

lasso_spec <- linear_reg(penalty = tune(), mixture = 1) %>%
  set_engine("glmnet")

lasso_workflow <- workflow() %>%
  add_model(lasso_spec) %>%
  add_recipe(lasso_recipe)

folds <- vfold_cv(criminal_df, v = 5, repeats = 20)
penalty_grid <- grid_regular(penalty(range = c(-6, 1)), levels = 50)

lasso_res <- tune_grid(
  lasso_workflow,
  resamples = folds,
  grid = penalty_grid,
  metrics = metric_set(rmse, rsq)
)

cv_metrics <- collect_metrics(lasso_res)
cv_metrics %>%
  filter(.metric == "rmse") %>%
  arrange(mean) %>%
  slice_head(n = 5)
# A tibble: 5 × 7
  penalty .metric .estimator  mean     n std_err .config         
    <dbl> <chr>   <chr>      <dbl> <int>   <dbl> <chr>           
1  0.0518 rmse    standard    1.18   100  0.0353 pre0_mod34_post0
2  0.0373 rmse    standard    1.18   100  0.0340 pre0_mod33_post0
3  0.0720 rmse    standard    1.18   100  0.0369 pre0_mod35_post0
4  0.0268 rmse    standard    1.19   100  0.0330 pre0_mod32_post0
5  0.1    rmse    standard    1.19   100  0.0386 pre0_mod36_post0

The table displays the five penalties with the lowest cross-validated root mean square error (RMSE).

Code
best_rmse <- select_best(lasso_res, metric = "rmse")
rmse_metrics <- cv_metrics %>% filter(.metric == "rmse")

rmse_metrics %>%
  ggplot(aes(x = penalty, y = mean)) +
  geom_line(colour = "#37474f") +
  geom_point(colour = "#455a64") +
  geom_point(
    data = rmse_metrics %>% filter(penalty == best_rmse$penalty),
    inherit.aes = FALSE,
    aes(x = penalty, y = mean),
    colour = "#d32f2f",
    size = 3
  ) +
  geom_ribbon(aes(ymin = mean - std_err, ymax = mean + std_err), alpha = 0.15, fill = "#607d8b") +
  scale_x_log10(labels = label_number(accuracy = 0.001)) +
  labs(
    x = "LASSO penalty (log scale)",
    y = "Cross-validated RMSE",
    title = "Penalty selection profile"
  ) +
  theme_minimal()

The optimal penalty is modest, indicating a model that retains a handful of predictors while still shrinking minor contributors toward zero.

Final model and coefficient profile

Code
final_workflow <- finalize_workflow(lasso_workflow, best_rmse)
final_fit <- final_workflow %>% fit(data = criminal_df)

nonzero_coefs <- final_fit %>%
  extract_fit_parsnip() %>%
  tidy(return_zeros = TRUE) %>%
  filter(term != "(Intercept)", abs(estimate) > 0) %>%
  mutate(
    term_raw = term,
    term_label = str_replace_all(term_raw, "_", " "),
    abs_estimate = abs(estimate),
    direction = if_else(estimate >= 0, "Risk factor", "Protective factor")
  ) %>%
  arrange(desc(abs_estimate))

nonzero_coefs %>%
  mutate(rank = row_number()) %>%
  select(rank, term = term_label, estimate, direction) %>%
  mutate(estimate = round(estimate, 3)) %>%
  gt() %>%
  tab_header(title = "Active coefficients in the tuned LASSO model")
Active coefficients in the tuned LASSO model
rank term estimate direction
1 obsessive passion 0.590 Risk factor
2 moral neutralization 0.580 Risk factor
3 past activism -0.270 Protective factor
4 ingroup superiority 0.269 Risk factor
5 identity fusion 0.255 Risk factor
6 education num -0.220 Protective factor
7 commitment passion -0.103 Protective factor
8 anger -0.086 Protective factor
9 negative affect -0.068 Protective factor
10 age 0.063 Risk factor
11 collective relative deprivation -0.029 Protective factor
12 perceived discrimination 0.017 Risk factor
Code
nonzero_coefs %>%
  mutate(term_label = fct_reorder(term_label, estimate)) %>%
  ggplot(aes(x = estimate, y = term_label, fill = direction)) +
  geom_col(width = 0.65) +
  scale_fill_manual(values = c("Risk factor" = "#c62828", "Protective factor" = "#2e7d32")) +
  geom_vline(xintercept = 0, linetype = "dashed", colour = "#424242") +
  labs(
    x = "Standardised coefficient",
    y = NULL,
    title = "Relative weight of retained predictors"
  ) +
  theme_minimal() +
  theme(legend.position = "bottom")

The coefficients represent the effect of a one standard deviation change in each predictor after median imputation. Positive estimates correspond to elevated violent intent, whereas negative values indicate attenuation.

Selection stability

Code
set.seed(20241105)
stability_boot_n <- getOption("lasso_stability_boot", 500)

stability_samples <- bootstraps(criminal_df, times = stability_boot_n)

stability_results <- stability_samples %>%
  mutate(
    estimates = map(
      splits,
      ~ final_workflow %>%
        fit(data = analysis(.x)) %>%
        extract_fit_parsnip() %>%
        tidy(return_zeros = TRUE) %>%
        filter(term != "(Intercept)") %>%
        transmute(
          term_raw = term,
          estimate = estimate,
          selected = abs(estimate) > 1e-6
        )
    )
  ) %>%
  select(estimates) %>%
  unnest(estimates) %>%
  group_by(term_raw) %>%
  summarise(
    inclusion_prob = mean(selected),
    avg_coef = mean(estimate),
    sd_coef = sd(estimate),
    .groups = "drop"
  ) %>%
  mutate(
    term_label = str_replace_all(term_raw, "_", " ")
  ) %>%
  arrange(desc(inclusion_prob))

stability_cutoff <- 0.6

selected_terms <- stability_results %>%
  filter(inclusion_prob >= stability_cutoff) %>%
  pull(term_raw)

if (length(selected_terms) == 0) {
  selected_terms <- nonzero_coefs$term_raw
}

selection_sentence <- stability_results %>%
  filter(term_raw %in% selected_terms) %>%
  mutate(label = str_c(term_label, " (", percent(inclusion_prob, accuracy = 1), ")")) %>%
  arrange(desc(inclusion_prob)) %>%
  pull(label) %>%
  str_c(collapse = ", ")

if (length(selection_sentence) == 0) {
  selection_sentence <- ""
}

selection_summary <- stability_results
Code
stability_results %>%
  mutate(term_label = fct_reorder(term_label, inclusion_prob)) %>%
  ggplot(aes(x = inclusion_prob, y = term_label)) +
  geom_col(fill = "#4a90e2") +
  geom_vline(xintercept = stability_cutoff, linetype = "dashed", colour = "#b71c1c") +
  scale_x_continuous(labels = percent_format(accuracy = 1)) +
  labs(
    x = "Bootstrap inclusion probability",
    y = NULL,
    title = "Selection stability across 500 bootstrap replicates"
  ) +
  theme_minimal()

Variables with inclusion probability above the dashed line remain active in most bootstrap fits. Robust signals include moral neutralization (99%), obsessive passion (97%), past activism (96%), education num (96%), identity fusion (95%), ingroup superiority (87%), anger (73%), age (69%), collective relative deprivation (69%), commitment passion (67%), perceived discrimination (66%).

Effect sizes from the final logistic refit

To translate the sparse LASSO solution into interpretable quantities, violent intent scores above the floor category are treated as an “elevated” outcome. The selected predictors are median-imputed, standardised to one standard deviation, and refit with a plain logistic regression so that each coefficient reflects a 1 SD shift in its source variable.

Code
if (length(selected_terms) == 0) {
  stop('No predictors met the stability threshold; rerun the model-building stage.')
}

effect_df <- criminal_df %>%
  mutate(
    violent_intent_event = factor(
      if_else(violent_intent > 1, 'Elevated intent', 'Baseline intent'),
      levels = c('Baseline intent', 'Elevated intent')
    )
  ) %>%
  select(violent_intent_event, all_of(selected_terms))

effect_recipe <- recipe(violent_intent_event ~ ., data = effect_df) %>%
  step_impute_median(all_predictors()) %>%
  step_normalize(all_predictors())

effect_prep <- prep(effect_recipe)
effect_data <- bake(effect_prep, new_data = NULL)

glm_fit <- glm(
  violent_intent_event ~ .,
  data = effect_data,
  family = binomial(),
  control = list(maxit = 50)
)

tidy_or <- tidy(glm_fit, exponentiate = TRUE, conf.int = TRUE) %>%
  filter(term != '(Intercept)') %>%
  transmute(
    term_raw = term,
    term_label = str_replace_all(term, '_', ' '),
    odds_ratio = estimate,
    or_low = conf.low,
    or_high = conf.high,
    p_value = p.value
  )

ame_tbl <- marginaleffects::avg_slopes(
  glm_fit,
  variables = selected_terms,
  type = 'response'
) %>%
  mutate(
    term_raw = term,
    ame_pp = estimate * 100,
    ame_low = conf.low * 100,
    ame_high = conf.high * 100
  ) %>%
  select(term_raw, ame_pp, ame_low, ame_high)

effect_summary <- tidy_or %>%
  left_join(ame_tbl, by = 'term_raw') %>%
  mutate(term_label = fct_reorder(term_label, odds_ratio))

prediction_df <- effect_data %>%
  mutate(
    .pred = predict(glm_fit, type = 'response'),
    truth = violent_intent_event
  )

roc_tbl <- yardstick::roc_curve(prediction_df, truth, .pred, event_level = 'second')
roc_auc_value <- yardstick::roc_auc(prediction_df, truth, .pred, event_level = 'second')$.estimate
brier_value <- prediction_df %>%
  mutate(truth_num = as.numeric(truth == 'Elevated intent')) %>%
  summarise(brier = mean((truth_num - .pred)^2)) %>%
  pull(brier)

calibration_df <- prediction_df %>%
  mutate(bin = ntile(.pred, 6)) %>%
  group_by(bin) %>%
  summarise(
    mean_pred = mean(.pred),
    emp_rate = mean(truth == 'Elevated intent'),
    .groups = 'drop'
  )

n_total <- nrow(effect_data)
n_positive <- sum(effect_data$violent_intent_event == 'Elevated intent')
event_rate <- n_positive / n_total

top_risk <- effect_summary %>% arrange(desc(odds_ratio)) %>% slice_head(n = 1)
top_protective <- effect_summary %>% arrange(odds_ratio) %>% slice_head(n = 1)
top_ame <- effect_summary %>% arrange(desc(ame_pp)) %>% slice_head(n = 1)
strongest_drop <- effect_summary %>% arrange(ame_pp) %>% slice_head(n = 1)

top_risk_label <- as.character(top_risk$term_label)
top_risk_or <- top_risk$odds_ratio

top_protective_label <- as.character(top_protective$term_label)
top_protective_or <- top_protective$odds_ratio

top_ame_label <- as.character(top_ame$term_label)
top_ame_value <- top_ame$ame_pp

strongest_drop_label <- as.character(strongest_drop$term_label)
strongest_drop_value <- strongest_drop$ame_pp

8 of 80 detainees (10.0%) exhibit elevated violent intent, so the odds ratios still carry considerable uncertainty despite the simplified refit.

ingroup superiority remains the sharpest risk signal with an odds ratio of 27.71, whereas past activism leans protective (OR = 0.06).

Average marginal effects translate these shifts into roughly 9.6 percentage points of additional risk for a 1 SD increase in ingroup superiority, contrasted with a -8.3 pp dip for past activism.

Code
effect_summary %>%
  arrange(desc(odds_ratio)) %>%
  mutate(
    `Odds ratio (95% CI)` = str_c(
      number(odds_ratio, accuracy = 0.01),
      ' (',
      number(or_low, accuracy = 0.01),
      ' – ',
      number(or_high, accuracy = 0.01),
      ')'
    ),
    `AME (pp, 95% CI)` = str_c(
      number(ame_pp, accuracy = 0.1),
      ' (',
      number(ame_low, accuracy = 0.1),
      ' – ',
      number(ame_high, accuracy = 0.1),
      ')'
    ),
    `p-value` = pvalue(p_value, accuracy = 0.001)
  ) %>%
  select(
    Predictor = term_label,
    `Odds ratio (95% CI)`,
    `AME (pp, 95% CI)`,
    `p-value`
  ) %>%
  gt() %>%
  fmt_missing(columns = everything(), missing_text = '—') %>%
  tab_source_note(source_note = gt::md('Predictors are scaled to one standard deviation after median imputation. AME expresses the average percentage-point change in the probability of elevated intent.'))
Predictor Odds ratio (95% CI) AME (pp, 95% CI) p-value
ingroup superiority 27.71 (1.77 – 5 892.84) 9.6 (-1.6 – 20.8) 0.073
obsessive passion 8.34 (0.88 – 355.11) 6.1 (-1.9 – 14.1) 0.129
age 2.48 (0.35 – 22.97) 2.6 (-2.7 – 7.9) 0.333
anger 2.37 (0.27 – 77.96) 2.5 (-4.6 – 9.6) 0.479
moral neutralization 1.26 (0.29 – 4.96) 0.7 (-3.1 – 4.5) 0.736
education num 0.59 (0.03 – 2.55) -1.5 (-6.7 – 3.6) 0.556
perceived discrimination 0.58 (0.04 – 3.16) -1.6 (-7.2 – 4.0) 0.579
commitment passion 0.51 (0.01 – 9.55) -1.9 (-10.8 – 6.9) 0.666
identity fusion 0.49 (0.01 – 4.99) -2.0 (-9.9 – 5.8) 0.604
collective relative deprivation 0.33 (0.02 – 3.01) -3.2 (-10.3 – 3.8) 0.358
past activism 0.06 (0.00 – 0.75) -8.3 (-18.2 – 1.6) 0.088
Predictors are scaled to one standard deviation after median imputation. AME expresses the average percentage-point change in the probability of elevated intent.
Code
effect_summary %>%
  ggplot(aes(x = odds_ratio, y = term_label)) +
  geom_point(colour = '#0d47a1', size = 3) +
  geom_errorbarh(aes(xmin = or_low, xmax = or_high), height = 0.2, colour = '#1565c0') +
  geom_vline(xintercept = 1, linetype = 'dashed', colour = '#757575') +
  scale_x_log10(labels = number_format(accuracy = 0.1)) +
  labs(
    x = 'Odds ratio (log scale)',
    y = NULL,
    title = 'Refitted logistic odds ratios',
    subtitle = 'Per 1 SD increase in each predictor'
  ) +
  theme_minimal()

Code
effect_summary %>%
  ggplot(aes(x = ame_pp, y = term_label)) +
  geom_point(colour = '#00796b', size = 3) +
  geom_errorbarh(aes(xmin = ame_low, xmax = ame_high), height = 0.2, colour = '#004d40') +
  geom_vline(xintercept = 0, linetype = 'dashed', colour = '#757575') +
  labs(
    x = 'Average marginal effect (percentage points)',
    y = NULL,
    title = 'Change in predicted probability of elevated intent'
  ) +
  theme_minimal()

Model performance & calibration

The refitted logistic attains an apparent AUC of 0.99 with a Brier score of 0.033. These values reflect in-sample performance; cross-validated optimism is already controlled upstream by the LASSO selection.

Code
prediction_df %>%
  mutate(truth = fct_relevel(truth, 'Elevated intent')) %>%
  ggplot(aes(x = .pred, y = truth, fill = truth)) +
  geom_density_ridges(alpha = 0.75, colour = NA, scale = 1.05) +
  scale_x_continuous(labels = percent_format(accuracy = 1), limits = c(0, 1)) +
  scale_fill_manual(values = c('Baseline intent' = '#90caf9', 'Elevated intent' = '#1e88e5')) +
  guides(fill = 'none') +
  labs(
    x = 'Predicted probability of elevated intent',
    y = NULL,
    title = 'Distribution of predicted probabilities by observed outcome'
  ) +
  theme_minimal()

Code
roc_tbl %>%
  mutate(false_positive_rate = 1 - specificity) %>%
  ggplot(aes(x = false_positive_rate, y = sensitivity)) +
  geom_line(colour = '#1e88e5', linewidth = 1) +
  geom_abline(slope = 1, intercept = 0, colour = '#9e9e9e', linetype = 'dashed') +
  coord_equal() +
  labs(
    x = 'False positive rate',
    y = 'True positive rate',
    title = 'ROC curve for the refitted model'
  ) +
  theme_minimal()

Code
calibration_df %>%
  ggplot(aes(x = mean_pred, y = emp_rate)) +
  geom_point(size = 2.5, colour = '#00796b') +
  geom_line(colour = '#004d40') +
  geom_abline(slope = 1, intercept = 0, linetype = 'dashed', colour = '#9e9e9e') +
  scale_x_continuous(labels = percent_format(accuracy = 1), limits = c(0, 1)) +
  scale_y_continuous(labels = percent_format(accuracy = 1), limits = c(0, 1)) +
  labs(
    x = 'Average predicted probability (bin mean)',
    y = 'Observed share of elevated intent',
    title = 'Calibration across probability sextiles'
  ) +
  theme_minimal()

The probability densities show clear separation between detainees with elevated intent and those at baseline, though the positive class remains extremely sparse. Calibration stays close to the diagonal for low-risk bins and becomes volatile once predictions exceed 20%, mirroring the limited number of elevated cases.

Interpretation and implications

  • Dominant risk marker: ingroup superiority increases the odds of elevated violent intent roughly 27.71-fold, corresponding to an average lift of 9.6 percentage points for a one SD increase.
  • Protective tendency: past activism delivers the largest downward shift (-8.3 pp) and an odds ratio below one, signaling a potential buffer against violent intent.
  • Practical takeaway: Even with shrinkage and a pared-down refit, only 8 detainees display elevated intent; any intervention should focus on consolidating the high-risk attitudinal profile while acknowledging the wide bands around point estimates.

Limitations

  • Small event count: Logistic coefficients are estimated from 8 elevated cases, so confidence intervals remain broad and sensitive to single observations.
  • Standardised interpretation: Odds ratios describe a one SD increase after median imputation; translating back to raw scale requires domain-specific knowledge of each measure.
  • In-sample evaluation: Performance diagnostics are apparent measures; external validation or temporal splits would be required to confirm transportability beyond this cohort.