Due: Over Blackboard, March 20 at 5:00pm.

Note: starting point RMarkdown files for this assignment are located at You should use these as a starting point: feel free to copy any code here without attribution, altering it as needed to suit your assignment.

In this class, we only had time to read a few chapters each from the biographies of Henry Adams and Malcolm X; we learned a bit about these two important historical figures, but not nearly as much as is printed in just those two books. We also read English and Underwood on scale and reading, and discussed how to read at scale. How could we start to look at a larger set of biographies? Even if we only wanted to study biographies with some portion set in Boston, we would need to navigate a rather large corpus!

For this assignment, we will build on that in-class work and “not read” the rest of these biographies. The point of this assignment will be to play, to experiment. The goal of this assignment is not, ultimately, to reconstruct a missing story, but to “read” books in a fundamentally new way, and to think through the intellectual implications of doing so. Here are the steps you should follow:

  1. Make Some Predictions. Choose either Adams or X for sustained analysis. Based on the chapters of these books we read in class, what do you think the rest of the books will be about? Who will be the major characters? What will be the important settings? What central themes or ideas will the book explore? Are there any stylistic characteristics you expect to continue? Be sure to write your ideas down in a document you can refer back to later.

  2. Read Bios Into R. Read the .txt files for The Education of Henry Adams or The Autobiography of Malcolm X into R. (You can download Malcolm X’s autobiography from here.)

  3. Generate word clouds. How might you “read” them? Come up with a few different observations. What kinds of words are there? Are there patterns or inconsistencies in the words, or in what is relatively more or less frequent?

  4. Compare word frequencies. Does focused attention to word frequency change your opinions about your book? Which words are most common, and what might they say about the book’s substance? What about scarce or infrequent words? What keywords or patterns do you observe?

  5. Explore Ngrams. Move from word frequencies to ngrams of 2, 3, 4, or more words. What do these pairs and phrases help you understand about the major themes and ideas of the book? Compare these with the same pairs or phrases in Google’s Ngram Viewer, which displays the frequency of words over time by drawing on the massive Google Books corpus. Look at the frequency of your biographies’ prominent ngrams through time, paying particular attention to their frequency at the time each biography was published.
    • Do any of them stand out, either as particularly common words during their time or, perhaps as interestingly, as particularly uncommon words during their time?
    • Try a few more words from the frequency lists you generated earlier. Then, try comparing some of the keywords from your chosen work with some keywords from your key work—do any interesting comparisons emerge?
    • The big question here: can a tool like the Ngram Viewer, which analyzes so many texts, help you understand anything about the historical place of a book you have never (entirely) read?

  6. Generate topic models or sentiment plots. Using the stm and sentiment packages, generate a topic model for your chosen biography. (Example code for this is at the course web site.) What topics does it draw from the most heavily, and what words constitute those topics? Can you discern anything from these computationally-derived topics that helps you understand the book’s themes or ideas (more colloquial “topics,” if you will)? You might not get all of these to work perfectly; if you can’t, that’s OK.

  7. Break it down. Try breaking the book into chapters or sections. Think about what lower-level structures make the most sense. Try many of the above steps for individual sections to see what you can regenerate from a “distant” perspective. Particularly, compare the sections we read in class against the sections we did not.

  8. Compare and contrast. In the steps above, you tried “distant reading” using a book you were at least a little bit familiar with—having read a few chapters probably made the output of these computational techniques sensible. Now you should choose another Boston life story—which you haven’t read at all—from the list below. Go through all the steps above using this new text, working to glean what you can about its contents without glancing at a single line (at least a line as printed in the codex itself) from the actual book. How does your purely computational understanding of this new book compare and contrast with your understanding of a book you read partially as a narrative and partially through computational methods?

  9. Write a Reflection. Finally, you will write a report (in R Markdown) describing what you did and what you learned. Include five or six code blocks you generated in the above efforts in the flow of the text; these may generate graphs (using ggplot2) or tables that you can refer to. Please keep the emphasis on what you learned: a) about your chosen text, and b) about the strengths and weaknesses of this kind of “distant reading.” We’re interested in your speculations, your thoughtful reflections on text analysis. Grades will be based on how thoughtfully you engage with the assignment and how clearly those thoughts are expressed in prose. You do not need a central argument (although it is fine if you have one). The goal of this assignment is to ruminate on what kinds of knowledge a distant reading can or cannot produce. In other words, it encourages you to think about how textual analysis changes our attention to texts. A good paper can have lots of unanswered questions. Good questions are evidence of thoughtfulness. The text (non-code) portions of your paper should be in the range of 1000 to 2000 words.

  10. Compile a subset of code, your reflection, and submit. Compile your R Markdown reflection (with code), and export it as a .docx or .html file. Submit the file over Blackboard. It may take a little while to get the code to run, so make sure it begins with blocks that load in the appropriate libraries and data. Make sure all images are included.
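
The counting behind steps 4, 5, and 7 can be sketched in base R without any extra packages. This is a minimal illustration, not the course's example code: the toy text, the function names (tokenize, word_freq, ngrams, split_chapters), and the "CHAPTER" heading pattern are all assumptions made here, and a real book will need its own file path and its own heading pattern.

```r
# Toy text standing in for a real book (an assumption for illustration).
# With a real file you might instead use readLines() on your own .txt path.
book <- c(
  "CHAPTER I",
  "Boston was the city of his youth. The city of Boston shaped him.",
  "CHAPTER II",
  "He left Boston for Washington. The city of Washington was new."
)

# Lowercase the text and split it into word tokens.
# Anything other than a-z or a straight apostrophe counts as a separator,
# so accented letters and curly apostrophes will also split words.
tokenize <- function(lines) {
  text <- tolower(paste(lines, collapse = " "))
  words <- unlist(strsplit(text, "[^a-z']+"))
  words[words != ""]
}

# Count how often each distinct item occurs, most frequent first.
word_freq <- function(items) {
  sort(table(items), decreasing = TRUE)
}

# Build n-grams by joining each word with the n - 1 words that follow it.
ngrams <- function(words, n = 2) {
  if (length(words) < n) return(character(0))
  starts <- seq_len(length(words) - n + 1)
  vapply(starts,
         function(i) paste(words[i:(i + n - 1)], collapse = " "),
         character(1))
}

# Split the lines into sections at headings that match a pattern.
# The heading lines themselves are dropped from the sections.
split_chapters <- function(lines, pattern = "^CHAPTER ") {
  is_heading <- grepl(pattern, lines)
  chapter_id <- cumsum(is_heading)
  body <- !is_heading & chapter_id > 0
  split(lines[body], chapter_id[body])
}

words <- tokenize(book)
head(word_freq(words), 5)             # most frequent words
head(word_freq(ngrams(words, 2)), 5)  # most frequent two-word phrases

# Repeat the word count separately for each chapter.
chapters <- split_chapters(book)
lapply(chapters, function(ch) head(word_freq(tokenize(ch)), 3))
```

Notice that words like "the" and "of" rank as high as "boston" even in this tiny example; whether to remove such common words before counting is one of the interpretive choices worth discussing in your reflection.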

See the “Handouts” section of the website for some more guidance on possible texts.

The idea for this assignment was adapted from our colleague Paul Fyfe of North Carolina State University. Prof. Fyfe describes his version of the assignment in “How Not to Read a Victorian Novel,” Journal of Victorian Culture 16, no. 1 (April 2011).


See point 9, above. Late assignments will be penalized a third of a grade for each day that they are late.