Validate a textscale model on held-out comparisons

Evaluates how well a fitted textscale model predicts the outcome of held-out pairwise comparisons. For each pair, the document with the higher latent score is predicted to win, and accuracy is the proportion of pairs for which this prediction matches the observed winner label.
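The accuracy rule described above can be sketched in a few lines of base R. This is an illustration, not the package's implementation: the `scores` vector, the `winner` column and its "a"/"b" coding are hypothetical stand-ins for whatever textscale stores internally.

```r
# Hypothetical inputs: one latent score per document, and comparison rows
# holding the two document indices plus the observed winner ("a" or "b").
scores <- c(0.8, -0.2, 1.5, 0.1)
comps <- data.frame(
  doc_id_a = c(1, 2, 3, 4),
  doc_id_b = c(2, 3, 4, 1),
  winner   = c("a", "b", "a", "a")
)

# Predict that the document with the higher latent score wins
# (ties fall to "b" here; the real tie rule is not documented).
predicted <- ifelse(scores[comps$doc_id_a] > scores[comps$doc_id_b], "a", "b")

# Accuracy: share of pairs whose predicted winner matches the label.
accuracy <- mean(predicted == comps$winner)
accuracy
```

Here the first three predictions match and the last does not, so accuracy is 0.75.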

Usage

validate_model(model, comparisons, embeddings, force = FALSE)

# S3 method for class 'textscale_validation'
plot(x, bins = 10, ...)

Arguments

model: A textscale_model object produced by fit_model().
comparisons: An annotated comparisons tibble produced by annotate_comparisons(). If a split column is present, only test-set rows are used.
embeddings: A numeric matrix of document embeddings. Row i must correspond to document i in the original documents vector passed to generate_comparisons().
force: Logical. If FALSE (the default), the function stops with an error when accuracy is below 0.55 or ICI exceeds 0.20, and warns when accuracy is below 0.65 or ICI exceeds 0.10. Set force = TRUE to turn those errors into warnings so that the function returns a result anyway. When validation runs through textscale(), set force = TRUE in the textscale() call instead.
x: A textscale_validation object.
bins: Number of equal-width probability bins used for the points on the reliability (calibration) diagram. Default is 10.
...: Ignored.
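The stop/warn rules documented for force can be expressed as a small helper. The function name check_validation() is hypothetical and this is only a sketch of the stated thresholds (all comparisons strict, as in "below" and "exceeds"), not the package's own code.

```r
# Hypothetical helper mirroring the thresholds documented for `force`.
check_validation <- function(accuracy, ici, force = FALSE) {
  hard_fail <- accuracy < 0.55 || ici > 0.20  # error unless force = TRUE
  soft_fail <- accuracy < 0.65 || ici > 0.10  # always a warning
  msg <- sprintf("accuracy = %.3f, ICI = %.3f", accuracy, ici)
  if (hard_fail && !force) {
    stop("Validation failed: ", msg, call. = FALSE)
  }
  if (hard_fail || soft_fail) {
    # With force = TRUE a hard failure is reported as a warning instead.
    warning("Validation is weak: ", msg, call. = FALSE)
  }
  invisible(TRUE)
}
```

For example, check_validation(0.72, 0.04) passes silently, check_validation(0.60, 0.05) warns, and check_validation(0.50, 0.05) errors unless force = TRUE.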

Value

A textscale_validation object. Call print() to display the accuracy and ICI metrics, and plot() to draw the calibration plot.

plot() returns the ggplot calibration plot invisibly.
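The reliability points controlled by bins can be sketched with cut() and tapply(). The simulated predicted probabilities p_hat and outcomes y are hypothetical; this page does not say how textscale derives predicted probabilities, so the sketch only shows equal-width binning on [0, 1].

```r
# Hypothetical inputs: predicted probability that document a wins,
# and the observed outcome (1 if a won, 0 otherwise).
set.seed(1)
p_hat <- runif(500)
y <- rbinom(500, size = 1, prob = p_hat)

bins <- 10
# Equal-width bins on [0, 1]; include.lowest keeps p_hat == 0 in bin 1.
bin <- cut(p_hat, breaks = seq(0, 1, length.out = bins + 1),
           include.lowest = TRUE)

# One reliability point per bin: mean prediction vs observed win rate.
reliability <- data.frame(
  bin            = levels(bin),
  mean_predicted = as.vector(tapply(p_hat, bin, mean)),
  observed_rate  = as.vector(tapply(y, bin, mean)),
  n              = as.vector(table(bin))
)
# Empty bins have no point to plot.
reliability <- reliability[reliability$n > 0, ]
reliability
```

For a well-calibrated model the mean_predicted and observed_rate columns should lie close to the diagonal.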

Details

The Integrated Calibration Index (ICI) is the mean absolute deviation of the smoothed calibration curve (observed win rate as a function of predicted win probability) from the diagonal, evaluated over a fine grid of predicted probabilities. It expresses the average miscalibration in probability units: for example, an ICI of 0.05 means predicted probabilities are off by about 5 percentage points on average. Values below about 0.03 are typical of well-calibrated models; values above about 0.07 suggest the model's predicted probabilities should be interpreted cautiously, although the rank ordering of document scores may still be valid. These guideline values are stricter than the ICI thresholds that trigger warnings and errors (see force).
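A minimal ICI computation along these lines is sketched below. The choice of loess() as the smoother, its span, and the 200-point grid are assumptions; this page does not state which smoother textscale uses, and p_hat and y are simulated.

```r
# Hypothetical, well-calibrated simulated data.
set.seed(42)
p_hat <- runif(1000)
y <- rbinom(1000, size = 1, prob = p_hat)

# Smooth the observed outcome against the predicted probability
# (loess is an assumed stand-in for the package's smoother).
fit <- loess(y ~ p_hat, span = 0.75, degree = 2)

# Evaluate on a fine grid inside the observed range (loess does not extrapolate).
grid <- seq(min(p_hat), max(p_hat), length.out = 200)
smooth <- predict(fit, newdata = data.frame(p_hat = grid))

# Keep the smoothed win rate inside [0, 1] before comparing.
smooth <- pmin(pmax(smooth, 0), 1)

# ICI: mean absolute gap between the calibration curve and the diagonal.
ici <- mean(abs(smooth - grid))
ici
```

Averaging over a uniform grid weights every region of [0, 1] equally; averaging over the observed predictions instead would weight each region by how often the model predicts there.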

When comparisons contains a split column (produced by generate_comparisons() with prop supplied), only the rows where split == "test" are evaluated. In this case, embeddings should still be the full document embedding matrix, not a subset containing only the test documents, because doc_id_a and doc_id_b index its rows directly.
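The sketch below shows why the full matrix is needed: filtering keeps only the test rows of comparisons, but their doc_id values still point at rows of the original matrix. The linear weight vector w is a hypothetical stand-in for the fitted model; this page does not say how textscale maps embeddings to scores.

```r
# Hypothetical full embedding matrix: 5 documents, 3 dimensions.
embeddings <- matrix(c(
  1, 0, 0,
  0, 1, 0,
  0, 0, 1,
  1, 1, 0,
  0, 1, 1
), nrow = 5, byrow = TRUE)

comparisons <- data.frame(
  doc_id_a = c(1, 2, 4, 5),
  doc_id_b = c(3, 5, 1, 2),
  split    = c("train", "test", "test", "train")
)

# Keep only held-out rows; the embedding matrix itself is NOT subset.
test <- comparisons[comparisons$split == "test", ]

# doc_id_a / doc_id_b are row indices into the full matrix.
emb_a <- embeddings[test$doc_id_a, , drop = FALSE]
emb_b <- embeddings[test$doc_id_b, , drop = FALSE]

# Hypothetical linear scoring rule standing in for the fitted model.
w <- c(0.5, -1, 2)
score_diff <- as.vector(emb_a %*% w - emb_b %*% w)
score_diff
```

Subsetting embeddings to the four test documents would shift row positions, so document 5 would no longer sit in row 5 and the indices would pick the wrong rows.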