A course on analyzing political texts using the R programming language
This site is intended to serve as a companion to Grimmer, Stewart, and Roberts (2021), an excellent book on how to think about text as data, which deliberately omits code when describing its examples.[1] Hence this R code supplement, which was developed during my Summer 2022 graduate-level Text As Data course at the University of Georgia. All the code and data necessary to replicate the results on this site are available at the GitHub link on the upper right.
The site is divided into three sections, corresponding to the three stages of any text-as-data workflow:
For each stage in the workflow, there are a number of useful R packages for the tasks involved, including web scraping (rvest), optical character recognition (tesseract), tidying (tidytext), topic modeling (topicmodels), sentiment analysis (sentimentr), and many others. On this site, we will walk through several tutorials of these packages, motivated by political science applications, with links to more detailed documentation for those interested in exploring further.
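To give a flavor of what these packages look like in use, here is a minimal sketch of the tidying step with tidytext. The two sentences and the object names (speeches, tokens, word_counts) are made up for illustration and are not drawn from the course data; the sketch assumes the dplyr, tibble, and tidytext packages are installed.

```r
library(dplyr)
library(tibble)
library(tidytext)

# Hypothetical toy corpus: one row per document (illustrative text only)
speeches <- tibble(
  speaker = c("A", "B"),
  text = c("We will cut taxes.", "We will protect voting rights.")
)

# Split each document into one lowercase word per row;
# unnest_tokens() strips punctuation by default
tokens <- speeches %>%
  unnest_tokens(output = word, input = text)

# Count how often each word appears across the toy corpus
word_counts <- tokens %>%
  count(word, sort = TRUE)

print(word_counts)
```

The one-token-per-row layout is what makes the later steps (counting, joining against sentiment lexicons, casting to a document-term matrix) ordinary data frame operations.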
[1] Wisely, in my view, as books with code can quickly become dated.