
Annotate pairwise comparisons using an LLM
Submits each comparison pair to an LLM and records which document "wins" on the latent dimension of interest.
Usage
annotate_comparisons(
  comparisons,
  instructions = NULL,
  prompt = "A: {{text_a}}\nB: {{text_b}}",
  model = "gpt-5.4-mini",
  system_prompt = NULL,
  allow_ties = TRUE,
  path = "textscale_annotations.json",
  parallel = FALSE,
  cache = NULL,
  ...
)
Arguments
- comparisons
A tibble produced by generate_comparisons().
- instructions
A plain-language description of the comparison task (e.g., "Which text is more conservative?"). Prepended to system_prompt so the LLM knows what to judge. When using annotate_comparisons() on its own, this is typically the only argument you need beyond comparisons.
- prompt
An ellmer::interpolate() template string for formatting each document pair. Defaults to "A: {{text_a}}\nB: {{text_b}}". Override this only if you need a non-standard document layout.
- model
Character string naming the OpenAI model to use. Defaults to "gpt-5.4-mini".
- system_prompt
System prompt sent to the LLM. When NULL (the default), an appropriate prompt is generated based on the allow_ties argument. Supply a custom value to override this behaviour entirely. When instructions is supplied, it is prepended to system_prompt.
- allow_ties
Logical. If TRUE (the default), the LLM may respond with "tie" when the two texts are indistinguishable on the dimension of interest. Ties are recorded in the winner column and automatically dropped by downstream functions (fit_model(), validate_model()). If FALSE, the system prompt instructs the LLM to choose A or B with no ties allowed. Ignored when a custom system_prompt is supplied.
- path
File path for checkpointing batch API calls (passed to ellmer::batch_chat_text()). Defaults to "textscale_annotations.json" in the current working directory. Deleted automatically once the batch completes and results are written to cache. Set to NULL to disable checkpointing. Ignored when parallel = TRUE.
- parallel
Logical. If FALSE (the default), annotations are submitted via the OpenAI Batch API at 50% of standard prices. Set to TRUE to use ellmer::parallel_chat_text() for immediate results at standard prices.
- cache
Optional path to an .rds file. If the file exists, cached annotations are matched to the incoming comparisons by text_a and text_b. Rows whose text pair is found in the cache reuse the stored winner; only rows without a cache hit are sent to the LLM. The updated result (cached + new) is written back to cache after annotation. If cache is NULL, all rows are annotated and nothing is written to disk. Caches written by textscale carry a prompt hash and will be ignored if the prompt changes; externally supplied caches (no hash attribute) bypass this check.
- ...
Additional arguments passed to ellmer::batch_chat_text() or ellmer::parallel_chat_text().
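The cache lookup described under cache can be pictured as a match on the (text_a, text_b) pair. The following is a minimal base-R sketch of that idea, not the package's actual implementation; the helper split_by_cache() and the toy data are hypothetical, and the prompt-hash check is left out.

```r
# Hypothetical helper: split incoming comparisons into cache hits and misses,
# matching on the (text_a, text_b) pair as the cache argument describes.
split_by_cache <- function(comparisons, cached) {
  # Composite key; the unit-separator character is assumed not to occur
  # in the texts themselves.
  key_new <- paste(comparisons$text_a, comparisons$text_b, sep = "\x1f")
  key_old <- paste(cached$text_a, cached$text_b, sep = "\x1f")
  idx <- match(key_new, key_old)            # NA where no cached pair exists
  comparisons$winner <- cached$winner[idx]  # NA for rows without a cache hit
  list(
    hits   = comparisons[!is.na(idx), , drop = FALSE],  # reuse stored winner
    misses = comparisons[is.na(idx), , drop = FALSE]    # would go to the LLM
  )
}

# Made-up data for illustration only.
comparisons <- data.frame(
  text_a = c("Cut taxes now.", "Expand public transit."),
  text_b = c("Raise the minimum wage.", "Lower tariffs."),
  stringsAsFactors = FALSE
)
cached <- data.frame(
  text_a = "Cut taxes now.",
  text_b = "Raise the minimum wage.",
  winner = "A",
  stringsAsFactors = FALSE
)

parts <- split_by_cache(comparisons, cached)
nrow(parts$hits)    # rows answered from the cache
nrow(parts$misses)  # rows that would still need annotation
```

Because the match is on the exact text pair, swapping text_a and text_b in this sketch counts as a miss.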
Value
The input comparisons tibble with an additional winner
column containing "A", "B", or (when allow_ties = TRUE)
"tie" for each pair.
Details
The simplest way to use this function is to supply instructions
describing the comparison task in plain language (e.g.,
"Which text is more conservative?"). The instructions are
automatically prepended to the system prompt, and each LLM turn
contains the two documents formatted as
"A: <text_a>\nB: <text_b>".
For advanced use, the prompt argument is an
ellmer::interpolate() template string that may reference any
column in comparisons with {{column_name}} syntax (most
commonly {{text_a}} and {{text_b}}).
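To make the {{column_name}} syntax concrete, the sketch below fills a template row by row using base R only. It is a stand-in for illustration and does not call ellmer::interpolate(); the helper fill_template() and the topic column are hypothetical.

```r
# Hypothetical stand-in for template filling: replace each {{name}} with the
# value of the matching column, one prompt per row.
fill_template <- function(template, data) {
  out <- rep(template, nrow(data))
  for (col in names(data)) {
    placeholder <- paste0("{{", col, "}}")
    # fixed = TRUE matches the placeholder literally, not as a regex.
    out <- mapply(
      function(s, value) gsub(placeholder, value, s, fixed = TRUE),
      out, as.character(data[[col]]),
      USE.NAMES = FALSE
    )
  }
  out
}

# Made-up comparisons with one extra column referenced by the template.
comparisons <- data.frame(
  text_a = c("Cut taxes now.", "Expand public transit."),
  text_b = c("Raise the minimum wage.", "Lower tariffs."),
  topic  = c("economy", "transport"),
  stringsAsFactors = FALSE
)

template <- "Topic: {{topic}}\nA: {{text_a}}\nB: {{text_b}}"
prompts <- fill_template(template, comparisons)
cat(prompts[1], "\n")
```

A template of this shape would be passed as the prompt argument; whether extra columns such as topic are present depends on how comparisons was built.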
By default, annotations are submitted via the OpenAI Batch API
(ellmer::batch_chat_text()), which is 50% cheaper than standard
pricing but may take up to 24 hours to complete. The path argument
checkpoints batch progress to a .json file so interrupted jobs can
be resumed without re-calling the API. Set parallel = TRUE to use
ellmer::parallel_chat_text() instead, which returns results
immediately at standard API prices.
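A call that supplies only instructions and opts into immediate results might look like the sketch below. Several things here are assumptions: that the package is installed under the name textscale, that a hand-built data frame with text_a and text_b columns is an acceptable stand-in for the output of generate_comparisons(), and that the API key is read from the OPENAI_API_KEY environment variable. The call is skipped when those conditions are not met.

```r
# Hand-built stand-in for the output of generate_comparisons(); the real
# tibble may carry additional columns (assumption).
comparisons <- data.frame(
  text_a = c("Cut taxes now.", "Expand public transit."),
  text_b = c("Raise the minimum wage.", "Lower tariffs."),
  stringsAsFactors = FALSE
)

# Only run when the package is installed and an API key is configured.
can_run <- requireNamespace("textscale", quietly = TRUE) &&
  nzchar(Sys.getenv("OPENAI_API_KEY"))

if (can_run) {
  annotated <- textscale::annotate_comparisons(
    comparisons,
    instructions = "Which text is more conservative?",
    parallel = TRUE  # immediate results at standard prices
  )
  # Distribution of "A", "B", and (with allow_ties = TRUE) "tie"
  print(table(annotated$winner))
}
```

Leaving parallel at its default of FALSE would instead route the same request through the Batch API, trading latency for the lower price.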
Before sending requests to the API, the function prints an estimated cost based on approximate token counts (1 token ≈ 4 characters) and published OpenAI prices. The estimate assumes one output token per comparison (single-letter response). Prices may be out of date; see https://openai.com/api/pricing/ for current rates.
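The arithmetic behind such an estimate can be sketched as follows. This is an illustration under stated assumptions, not the package's code: the per-million-token prices are placeholders rather than real OpenAI rates, the helper estimate_cost() is hypothetical, and counting the system prompt once per request is an assumption.

```r
# Hypothetical cost estimator using the 4-characters-per-token heuristic.
# price_in / price_out are PLACEHOLDER prices in USD per million tokens.
estimate_cost <- function(prompts, system_prompt, batch = TRUE,
                          price_in = 0.50, price_out = 2.00) {
  n <- length(prompts)
  # Input characters: every request carries its prompt plus the system prompt.
  chars_in   <- sum(nchar(prompts)) + n * nchar(system_prompt)
  tokens_in  <- chars_in / 4   # 1 token is roughly 4 characters
  tokens_out <- n              # one output token per comparison
  cost <- tokens_in / 1e6 * price_in + tokens_out / 1e6 * price_out
  if (batch) cost <- cost * 0.5  # Batch API: 50% of standard prices
  cost
}

# Two made-up prompts of 400 characters each and a 200-character system prompt.
prompts <- c(strrep("x", 400), strrep("y", 400))
system_prompt <- strrep("s", 200)

standard_cost <- estimate_cost(prompts, system_prompt, batch = FALSE)
batch_cost    <- estimate_cost(prompts, system_prompt, batch = TRUE)
format(c(standard = standard_cost, batch = batch_cost), scientific = TRUE)
```

With input text dominating the token count, the estimate scales roughly linearly in the total number of characters across all comparison pairs.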