
Validate a textscale model on held-out comparisons
Evaluates how well a fitted textscale model predicts the outcome of
held-out pairwise comparisons. For each pair, the document with the
higher latent score is predicted to win; this is compared against the
observed winner label.
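The prediction rule above can be illustrated with a minimal base R sketch. This is not the package's internal code: the score values, the `winner` column name, and its "a"/"b" coding are assumptions made for the example, and ties are arbitrarily assigned to "b".

```r
# Hypothetical latent scores for five documents (illustrative values only).
scores <- c(0.8, -0.3, 1.2, 0.1, -1.0)

# Hypothetical held-out comparisons: document indices plus the observed winner.
# The `winner` column and its "a"/"b" coding are assumptions for this sketch.
held_out <- data.frame(
  doc_id_a = c(1, 2, 3, 4),
  doc_id_b = c(2, 3, 5, 1),
  winner   = c("a", "b", "a", "a")
)

# Predict "a" when document a has the higher latent score, otherwise "b".
predicted <- ifelse(
  scores[held_out$doc_id_a] > scores[held_out$doc_id_b],
  "a",
  "b"
)

# Accuracy is the share of pairs where the prediction matches the label.
accuracy <- mean(predicted == held_out$winner)
```

Here three of the four predictions match the observed labels, so `accuracy` is 0.75.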
Usage
validate_model(model, comparisons, embeddings, force = FALSE)
# S3 method for class 'textscale_validation'
plot(x, bins = 10, ...)
Arguments
- model
A textscale_model object produced by fit_model().
- comparisons
An annotated comparisons tibble produced by annotate_comparisons(). If a split column is present, only test-set rows are used.
- embeddings
A numeric matrix of document embeddings. Row i must correspond to document i in the original documents vector passed to generate_comparisons().
- force
Logical. If FALSE (the default), the function stops when accuracy is below 0.55 or ICI exceeds 0.20, and warns when accuracy is below 0.65 or ICI exceeds 0.10. Set force = TRUE to downgrade stops to warnings and continue anyway. When calling via textscale(), pass force = TRUE there instead.
- x
A textscale_validation object.
- bins
Number of equal-width bins for the reliability diagram points. Default is 10.
- ...
Ignored.
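The thresholds described for force can be read as the following decision logic. This is a sketch under stated assumptions: `check_validation()` is a hypothetical helper written for illustration, not a function exported by the package, and the message wording is invented.

```r
# Hypothetical helper mirroring the documented thresholds; not package code.
check_validation <- function(accuracy, ici, force = FALSE) {
  if (accuracy < 0.55 || ici > 0.20) {
    msg <- "Validation failed: accuracy below 0.55 or ICI above 0.20."
    # With force = TRUE the stop is downgraded to a warning.
    if (force) warning(msg, call. = FALSE) else stop(msg, call. = FALSE)
  } else if (accuracy < 0.65 || ici > 0.10) {
    warning("Validation marginal: accuracy below 0.65 or ICI above 0.10.",
            call. = FALSE)
  }
  invisible(TRUE)
}
```

Under this reading, an accuracy of 0.60 with an ICI of 0.02 produces a warning, while an accuracy of 0.50 stops unless force = TRUE.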
Value
A textscale_validation object. Call print() to display
the accuracy and ICI metrics; call plot() to display a calibration
plot. plot() returns the underlying ggplot object invisibly.
Details
The Integrated Calibration Index (ICI) is the mean absolute deviation of the calibration smooth from the diagonal, evaluated over a fine grid. It expresses the average miscalibration in probability units (e.g. 0.05 = predicted probabilities are off by ~5 percentage points on average). Values below ~0.03 are typical of well-calibrated models; values above ~0.07 suggest the model's predicted probabilities should be interpreted cautiously, though the rank ordering of document scores may still be valid.
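As a rough illustration of the definition above, the ICI can be approximated in base R. The choice of `loess()` as the calibration smoother, the grid size of 200, and the simulated data are all assumptions for this sketch; the package may use a different smoother or grid.

```r
set.seed(1)

# Simulated predicted win probabilities and binary outcomes. Outcomes are
# drawn from the predicted probabilities, so this toy model is well calibrated.
n <- 500
d <- data.frame(p_hat = runif(n, 0.05, 0.95))
d$y <- rbinom(n, 1, d$p_hat)

# Calibration smooth: observed outcome as a smooth function of the prediction.
# loess() is an assumed choice of smoother for this sketch.
smooth <- loess(y ~ p_hat, data = d)

# Evaluate the smooth on a fine grid spanning the observed predictions.
grid <- seq(min(d$p_hat), max(d$p_hat), length.out = 200)
calibrated <- predict(smooth, newdata = data.frame(p_hat = grid))

# ICI: mean absolute deviation of the smooth from the diagonal.
ici <- mean(abs(calibrated - grid))
```

Because the simulated outcomes follow the predicted probabilities, `ici` should come out small; a systematically over- or under-confident model would push it upward.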
When comparisons contains a split column (produced by
generate_comparisons() with prop supplied), only the
rows where split == "test" are evaluated. embeddings should be
the full document embedding matrix in this case; doc_id_a and
doc_id_b index into it directly.
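The row selection and indexing described above might look like the following in base R. The toy data are invented for illustration, and the sketch only mirrors the documented behaviour rather than reproducing the package's implementation.

```r
# Toy comparisons table with a split column (values are illustrative).
comparisons <- data.frame(
  doc_id_a = c(1, 2, 3, 4),
  doc_id_b = c(2, 3, 4, 1),
  split    = c("train", "test", "train", "test")
)

# Full embedding matrix: one row per document (4 documents, 2 dimensions).
embeddings <- matrix(seq_len(8), nrow = 4, ncol = 2)

# Keep only the test-set rows.
test_rows <- comparisons[comparisons$split == "test", ]

# doc_id_a and doc_id_b index rows of the full matrix directly, so the
# matrix must not be subset to the test documents beforehand.
emb_a <- embeddings[test_rows$doc_id_a, , drop = FALSE]
emb_b <- embeddings[test_rows$doc_id_b, , drop = FALSE]
```

Passing a matrix that had already been reduced to test documents would shift the row positions and silently pair the wrong embeddings with each comparison.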