textscale measures latent quantities from text — things like ideological tone, argument persuasiveness, or ad negativity — by combining pairwise comparisons with text embeddings. You provide a collection of documents and a question (e.g., “Which ad is more negative?”). An LLM annotates a sample of document pairs. textscale then fits a ridge logistic regression on embedding differences to identify the latent dimension, and uses it to score any document — including ones never directly compared.
Installation
# install.packages("pak")
pak::pak("joeornstein/textscale")

textscale uses ellmer for LLM calls and fuzzylink for embeddings. Both require an OpenAI API key.
API Key Setup
To install your key in .Renviron for use across sessions, run:
fuzzylink::openai_api_key("your-key-here", install = TRUE)
readRenviron("~/.Renviron") # reload so the key is available immediately

You can get a key at platform.openai.com. To verify it’s set: Sys.getenv("OPENAI_API_KEY").
Usage
library(textscale)
result <- textscale(
  documents = docs,
  prompt = "Which political ad is more negative toward its opponent?",
  seed = 42
)
#> textscale result
#> Documents scored: 500
#> Validation: 85.0% accuracy on 5,000 test pairs (ICI = 0.042)
# Document scores with 95% confidence intervals
result$scores
# Calibration plot
plot(result)

See the Measuring Political Ad Tone vignette for a worked example using the Carlson & Montgomery (2017) Wisconsin ads dataset.
textscale() handles the full pipeline: generating pairwise comparisons, retrieving embeddings, annotating pairs via the OpenAI Batch API, fitting and validating a model on a held-out test split, refitting on all comparisons, and returning scores for every document.
The individual pipeline steps are also exported if you need finer control:
| Function | Purpose |
|---|---|
| `generate_comparisons()` | Create train/test comparison pairs |
| `get_embeddings()` | Retrieve text embeddings |
| `annotate_comparisons()` | Annotate pairs with an LLM |
| `fit_model()` | Fit or refit the model |
| `validate_model()` | Evaluate accuracy on held-out test pairs |
| `score_documents()` | Score documents on the latent dimension |
How it works
For each annotated pair, textscale computes the difference between the two document embeddings and labels it 1 if document A won and 0 if B won. A ridge logistic regression fit on these differences identifies a direction in embedding space that best separates winners from losers — that direction is the latent dimension. Projecting any document’s embedding onto this direction gives its score, making it straightforward to scale new documents without any additional LLM calls.
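The core idea can be illustrated with simulated data. This is a sketch of the technique, not the package's internal code; it assumes glmnet for the ridge (alpha = 0) logistic regression:

```r
library(glmnet) # assumed here only for illustration

set.seed(1)
d   <- 50                                  # embedding dimension
w   <- rnorm(d)                            # "true" latent direction
emb <- matrix(rnorm(200 * d), 200, d)      # 200 simulated document embeddings

# Simulate annotated pairs: label 1 if document A "won" the comparison
pairs <- t(replicate(1000, sample(200, 2)))
X <- emb[pairs[, 1], ] - emb[pairs[, 2], ]     # embedding differences
y <- as.integer(X %*% w + rnorm(1000) > 0)     # noisy pairwise labels

# Ridge logistic regression on the differences recovers a direction
fit <- cv.glmnet(X, y, family = "binomial", alpha = 0)
direction <- as.vector(coef(fit, s = "lambda.min"))[-1]  # drop intercept

# Score any document, compared or not, by projecting onto the direction
scores <- emb %*% direction
cor(scores, emb %*% w)  # scores should correlate strongly with the truth
```

Because scoring is just a projection, documents added later only need an embedding lookup, never a new round of LLM comparisons.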
