Two Volumes: the lessons of Time on the Cross

(This is a talk from a January 2019 panel at the annual meeting of the American Historical Association. You probably need to know, to read it, that the MLA conference was simultaneously taking place about 20 blocks north.)

Disciplinary lessons

The panel that became this conversation started when John Theibault tweeted that digital history doesn't see Fogel and Engerman's Time on the Cross as part of its core history. I participated because I–and some of the grad students I've taught–remembered that I did make them reckon with Time on the Cross as part of the genealogy of digital history, early in each semester. I taught it because I had found myself navigating around it for years; generally students took it as a negative example to avoid, and someone would always show up stunned to learn that Fogel won a Nobel prize and a Bancroft prize. (Probably, to be honest, because many of them skipped the book to read Thomas Haskell's devastating 1975 New York Review of Books summation. It's worth your time, too, if you for some reason want to read this piece, but aren't quite clear on what Time on the Cross was.)

Time on the Cross was not just another history book. One of the more interesting phenomena was that students would order the book from Amazon–there are still a lot of cheap copies of Time on the Cross out there–and every once in a while they would accidentally end up with a copy not of the narrative but of the second volume that gives the methodological apparatus for the book. There are a lot of problems with Time on the Cross, and many are much more important than this. But for me, the most interesting has to do with the thinking that made this a reasonable arrangement–how did the argument and the evidence came to be so heavily separated from each other?

This was a problem high in everyone's mind in the 1970s as well: Thomas Haskell pointed out the division as one of the works methodological mortal sins: “Most readers of Time on the Cross see only the silk purse of apparent scientific exactitude; the authors spared them the sight of the sow's ear from which it all came.”

Time on the cross ratified a split between the “humanities” history and social science history–especially economics–that mirrors the split of the book itself into two volumes.

The reception of Time on the Cross has focused on its myopia. Jessica Marie Johnson recently published an essay in Social Text that gives a good account of this position–this is the critize that I find most important for graduate students in the humanities to completely internalize, because it articulates two of the most important ways that the humanities have developed a language for talking about statistics.

“Statistics on their own, enticing in their seeming neutrality, failed to address or unpack black life hidden behind the archetypes, caricatures, and nameless numbered registers of human property slave owners had left behind. And cliometricians failed to remove emotion from the discussion. Data without an accompanying humanistic analysis—an exploration of the world of the enslaved from their own perspective—served to further obscure the social and political realities of black diasporic life under slavery.”

Johnson, Jessica Marie. “Markup Bodies: Black [Life] Studies and Slavery [Death] Studies at the Digital Crossroads.” Social Text 36, no. 4.

The argument is not just that data fails to capture experience, but that the existence of data is itself part of the record of violence. To quote again: ‘Data is the evidence of terror, and the idea of data as fundamental and objective information, as Fogel and Engerman found, obscures rather than reveals the scene of the crime.’

This critique is a powerful one, and fits in with currents of humanistic scholarship going back to Thomas Haskell and Herbert Gutmann's initial critiques of Time on the Cross in the face of its initial publicity blitz. (A blitz, it's worth noting, that is entirely typical of the course of computationalist approaches to history and literature, from Busa to the cliometricians to the “culturomics” project where I did a fellowship at the start of the decade.

If I can just reflect myself on what I learned from this in graduate school before I'd ever heard of “digital humanities,” it was that an attention to statistical reductionism rather than human experience struck in the face of historical practice but also served to ratify past crimes. To act statistically on individual subjects violates a kind of taboo. To this day, I'm generally really reluctant to make data visualizations in which the points are individual people. In my own work–and in my advice to any other digital historians–I generally think that human beings are among the least promising topics of statistical analysis; I visualize ships, books, and land, but try to avoid visualizing the person whenever possible. After all, people are the things we need data to understand the least. In a graduate class with the political historian Sean Wilentz, I remember vividly being walked through a slate of regression analyses of ethnic voting patterns in pre-Civil War US counties, feeling that I was learning something, and then–in the dialectical style he loves to use–being informed that none of these methods–not one!–can tell us a single thing about why a single person voted the way they did.

I knew something about data analysis, but was warned off and kept quiet; for three years after generals, the only contribution I made with code to the Princeton History department was a monte-carlo simulation program to optimize batting orders for the intensely competitive summer softball league. (And even there, I used it more for the grad student team, The Great Bat Massacre; the faculty heavy Revolting Masses had firmer hierarchies in play.)

When I got into digital history in 2010, this was my sense of Fogel and Engerman; as a still-radioactive site in the discipline, smoking from the wars of the 1970s, where you tread at your peril.

Economists’ lessons

Something that I didn't know until much more recently was that at the same time, economists were learning an entirely different set of lessons. As the history of capitalism turns to slavery in the past few years, particularly with controversies around books by Sven Beckert and Ed Baptist, we've seen economists look with bemusement at historians as the species that failed to learn the lessons of Time on the Cross. This is a set of debates largely parallel to Digital History, proper; but one crucial to understanding the future of digital history, writ small.

The economist Eric Hilt points to the existence of an extensive literature about slavery, worrying that historians “do not seem to have taken seriously the debates among economic historians that followed the publication of that book.”

More importantly, the lack of engagement with economic historians limited the analytical perspectives of each of these books. Most of them seem aware of Fogel and Engerman’s Time on the Cross (1974), and some repeat its arguments about the profitability of slavery or the efficiency of slave plantations. But they do not seem to have taken seriously the debates among economic historians that followed the publication of that book. Some […] challenged Fogel and Engerman[; but] analyzed slavery in new ways.

(Hilt, Eric. “Economic History, Historical Analysis, and the ‘New History of Capitalism.’” The Journal of Economic History 77, no. 2 (June 2017).)

Hilt inclines towards–to me–a really surprising account: that historians are even recapitulating the core ideological mistake of Fogel and Engerman by cloaking the work of enslaved people in a bizzarely emancipatory rhetoric, as if developing new techniques for picking cotton quickly is an accomplishment anyone should be eager to claim.

I don't think this criticism is entirely fair, but it's at least within fair grounds. Alan Olmstead raises the disciplinary stakes. The problem is not just that historians haven't read it, but that the work was simply “beyond the comprehension of many historians.”

In the past, historians and economists (sometimes working as a team) collectively advanced the understanding of slavery, southern development, and capitalism. There was a stimulating dialog. That intellectual exchange deteriorated in part because some economists produced increasingly technical work that was sometimes beyond the comprehension of many historians. Some historians were offended by some economists who overly flaunted their findings and methodologies.

Olmstead, Alan L., and Paul W. Rhode. “Cotton, Slavery, and the New History of Capitalism.” Explorations in Economic History 67 (January 1, 2018).

I love this an apology–I feel like marriage counselors could use this as a prime example of how to exacerbate differences in denying them. It boils down to “I'm sorry for being so sophisticated, and for letting you know it.” But any good marriage counselor could also probably elicit out of this an acknowledgement. I miss your company; I want you to affirm my work; I want to be in your life. And where Olmstead doesn't quite acknowledge what he wants from historians, other economists do.

Should we just hope for consilience?

The obvious takeaway is that we simply need to get these fields talking to each other again and some proliferation of magic will ensue.

The economist Trevon Logan gives a good synopsis of this account in a series of Tweets I'll quote in excerpt describing how he teaches Time on the Cross and the economics of slavery today. This is a more expert account than I can give: but suffice it to say that for historians, the moment of great split is obviously Time on the Cross itself. Fogel's later, better work, including Without Consent or Contract, is outside the conversation. While historians were avoiding the radioactive spot, economists were loading into their hazmat suits and wandering in.

Some Tweets from Trevon Logan, 2018

  • I begin the second week by talking about where we are at. TOTC is bad. It’s being attacked and the integrity of the authors is being questioned. So F&E regroup, and they cut their losses, and they go back to the beginning. They go back to the productivity calculation.
  • F&E win on prices, output, production constraints, insurance, crop mix, etc. It’s a battle of the force and in the end their original calculation is established and likely accepted by the majority of the field. (Whether it’s the right calculation is another question.)
  • But this vindication of efficiency came with a great cost. The audience for economic history shrank dramatically. The knowledge we have is for 1860- far too static. The new history of capitalism ignores it. Economic historians agree on efficiency, but there’s more to know
  • There is little attempt among Economic historians now to make contributions that historians will pay attention to. We’re much more comfortable talking about data and methodology than history. This is bad for the field.
  • To read the efficiency debate as opposed to the TOTC debate is to see two parts of economic history. One is concerned with historiography and changing the methods and topics in history, and the other is about data and measurement. The latter is modern day economic history
  • There is also a racial dimension that continues to have an influence. F&E the racial aspects wrong. Very wrong. But they attempted to divorce the racial aspects from the economics, and we cannot.
  • Trevon Logan, Twitter

The other thing pointing towards the possibility of this turn is that a few blocks over is a discipline–English–that has managed a heavily quantitative turn in the last few years. Work in digital literary studies has taken on a much more aggressively turn towards measurement; flagship journals regularly publish work using topic modeling or word embeddings, and new journals like Cultural Analytics lean most heavily on an English-language base.

For the most part, these types of articles in literature continue to have a lot more throat clearing about the nature of reading and evidence than I think a historical journal should publish. (The nature of reading itself is not an object of study for us to the same degree). But we are also seeing long-term attempts to construct time series, bibliographies, and reprint lists that enable a large array of quantitative evidence to be brought to bear on questions like–in a recent paper by Underwood, Bamman, and Lee–what the gender breakdown of English-language fiction looks like.

Computational history is dead for good: a provocation.

A few years ago I was still quite hopeful about this possibility. For the historians of capitalism, it will surely happen in some form; but not, I hope, as part of a new science of history. I now think that the success in English and Literature is actually the exception that proves the rule. Even back in the 2000s period of humanities computing, it was already the case that English had a lot more number crunching and history a lot more public-facing web sites. Today, that's even more the case. And the reason for that, in part, is that there is considerably less low-hanging fruit in terms of data-driven argumentation than in literary study. In literary history and art history, there are huge arrays of digitized data and an array of fairly amateurish physicists and computer scientists who desparately need advice.

For historians, the situation is very different. In questions of historical interest, the post-time-on-the-cross split means that there have been generations of training making economists, sociologists, and political scientists both methodologically competent and substantively knowledgeable. And because of the closed-off nature of the way that these fields have tended to handle their data, it tends to be accessible but for use cases that map much more closely to social science experimentation than to humanistic narration.

I reached my most pessimistic state about this a few months ago when I saw a paper by economists tagged with the subfield I was trained in, intellectual history. It bears the slightly overwrought title “Ideas have consequences.” But what it does is present a a really powerful account of the transmission of ideas across social networks through textual analysis–something I've seen physicists, and genomicists, and literary historians, and intellectual historians take stabs at for years with underwhelming results.

It makes a powerful and discrete argument about the way that the privately-funded Manne seminars in law and economics seminars–which were attended by a substantial proportion of the federal judiciary–affected the language, decisions, and sentencing of federal justices who attended them robust to a wide variety of covariates. (Of course, I don't know what covariates it's not robust to.) And even more interestingly, they claim to have an effect where simply being randomly impaneled with a judge who attended one of these seminars will make that second judge harsher in her future sentencing decisions.

Reading this paper was exciting, but looking through the tools and tricks and sources also made me feel like someone in a science fiction movie encountering an artifact from the future.

Ash, Chen, and Naidu 2018\

The extraordinary quality of data is something that is hard for us to get–they have not just a million or so circuit court votes and 300,000 opinions, but also the institutional capacity to file FOIA requests to get the exact years of attendance for every judge who went to the program.

We supplemented this list with exact years of attendance from Annual Reports obtained by filing FOIA requests and correspondence from the Law and Economics Center at George Mason University. Figure 1 plots the share of Circuit Court cases with a Manne Judge on the panel over time. As can be seen, by the late nineties, about half of cases were directly impacted by a Manne panelist.

And they have the disciplinary capacity to do things like casually use relatively new methods like word embeddings without spending pages slowly their audience through just what they are or what it might mean to use them, and gently analogizing them. (Imagine a field where you can describe a computational method without having to first identify which Borges short story–the map of the empire, the analytical language of John Wilkins, Pierre Menard and the Quijote–it most closely resembles. The words we could save!)

This paper utilizes a dataset on all 380,000 cases (over a million judge votes) in Circuit Courts for 1891-2013, and a data set on one million criminal sentencing decisions in U.S. District Courts linked to judge identity (via FOIA request) for 1992-2011. We have detailed information on the judges and the metadata associated with the cases. In addition, we process the text of the written opinions to represent judge writing as a vector of phrase frequencies.

In the United States, computational history is dead because this happens at a much higher level among the social scientists than any historian can possibly pick up in a graduate program. (Things are different in Europe, I should note: even back in graduate school doing a field about Germany with Harold James, I saw election statistics being deployed far more impressively in the German-language historical/geography tradition than in what comes out the Anglosphere. This continues in work by people like Melvin Webers on the continent.)

The best work in the cliometric tradition followed in the path of Social History. For a while I assigned, alongside Time on the Cross, Steven Ruggles’ work on the changing American family structure. (Some students called it out for too obviously balancing the “good” cliometrics against the bad.) But while Ruggles remains a tremendously important historian, it's not hard to see how the work at the Minnesota Population Center that he leads has hewed closer and closer the sociological mainstream than the historical one. If Computational History really existed, it would need a conference; and the first challenge it would face would be justifying its existence in the face of the enormous work coming out of the Social Science History Association. Before we try to reboot computational history, we should look and see how our discipline has played as part of that larger structure.

Digital history is reproducibility

So–should we despair because the job is gone? No–but I think that it reflects some argument about what path we should stay on. We've seen a firmer set of turns lately towards an insistence on argumentation as the centerpiece of digital history, from Cameron Blevins in Debates in the Digital Humanities, and a whole slew of initiatives out of George Mason, which was one of the leaders of the old form of digital humanities, including their new journal devoted to argumentation in history. It might seem reasonable to position this new manifestation of digital history as the place where cliometric flaunting can be tempered. If there's anyone left to be offended in my claim that computational history is dead, it would be the brilliant Lincoln Mullen at GMU, who has done a better job than anyone (as in his AHR article with Kellen Funk on legal text reuse) to bring cutting-edge computational methods to mainstream historians.

And there is some ground to occupy there. I'll be assigning the Ash, Chen & Naidu article, and not Time on the Cross, in the digital history seminar I'm teaching next semester, and I can tell you what the student reactions are going to be; that it's an impressive argument, but that the bulk of the 50 pages are incomprehensible econometrical posturing that don't allow for any real conversation. And if I have some Americanists, they'll point that the thing which is interesting here–the purchase of the federal judiciary by right-wing funding networks–is something we already know anyway from a variety of works by real historians of the period, like Nancy MacLean's Democracy in Chains. Moreover, they will find it entirely bloodless. To talk about Florida big-money seminars in law and economics dangles the spectacle of a young Stephen Breyer and Ruth Bader Ginsburg enjoying Mai Tais on a Florida beach with Milton Friedman, and then hits you with pages on pages of difference-in-difference analysis. There aren't two volumes here, but there's an argumentative punch to the abstract that isn't continuously rewarded while reading the paper; the argument and the analysis are distinct. [Spoiler alert from December 2019: I underestimated just how hard it would be to even pull the essence out and not become overwhelmed by the graphs.]

The path that digital historians built in the decades of the 1990s and 2000s while avoiding the shadow of cliometrics was a far more interesting one; unlike English-department digital humanities of the period, it had no motive to make history more ‘scientific,’ and instead found ways to make historical practice live on computers and–increasingly–online.

911digitalarchive.org

The most successful of these were efforts around digital public history, where historians found ways to bring materials into an online setting that were not amenable to books. Omeka, the public history CMS, has many flaws, but is still for my money the most important and irreplaceable project out of the profession in the last twenty years. And–we can fight about this up on the Magnificent Mile later this week–it certainly is more important than anything the literary studies folks have produced.

So what I really think is that the division of books problem–the division of methodology and narrative–is one that is precisely adressable through these same kinds of challenges, because historians as much as anyone outside of digital journalists have been thinking about audience, narrative, and publics.

Yesterday, the New York Times put up an article about the Municipal Archive's set of photographs of every building in New York City from the 1940s.

I cannot really recommend this article–they seem to have chosen a few buildings largely at random, they don't give a strong sense of the city, and their narrative of change is largely that things are taller. But it points to a proliferation of sources, especially those not controlled by Proquest, Gale, Cengage, and so on. in the last decade that hasn't been reckoned in the profession.

link to data

Humanistic reproducibility

So let me finish by framing this problem as reproducibility. The sciences have been plagued with crises of reproducibility and deluged with schemes for solving it. The literary and library scholars in DH (overmuch, to my mind) see one of their challenges as fully fixing those problems in digital humanities before they emerge through a tangle of IPython notebooks, online linked open data, and Docker configuration files.

There are of course, crises of historical reproducibility as well–Time on the Cross may have seen controversy, but it was allowed to keep its Bancroft prize. Say what you will about it, but there's another book to win the Bancroft prize that was so much worse it had to be revoked. It's interesting that we don't think of Arming America as having the same deep legacy to reckon with–perhaps we could.

Arming American

But one reason that archival historical narratives work is that the historical narrative is itself an artifact of reproducible research; you read some claims at the front, but it's only in the process of reading a book that you become conscious of its individual narrative flow.

The thing that we need to think more more about to shape narratives around historical data that admit of reproduction by actual historians, not econometricians; how do we make that persuasive flow work for everyone?

These are tools that we start to see emerging in digital journalism, often around the explication of social scientific research–for instance, in this New York Times piece addressing social mobility.

They're also becoming increasingly common in the areas of computer science, like Google's Distill

And this is a capability that digital humanities projects can continue to make; where reproduction is about array and exposure to multivalent primary sources, allowing readers to engage and change the assumptions of models. This is harder than just distributing models; it's about working with sources in the indefinitely reconfigurable ways that are now possible.

We see this happening, already, even in the still-vibrant digital historiography of slavery itself. Ed Baptist himself is working on a digital project, Slavery on the Move. There are multiple works on lynching documentation. Caleb McDaniel (on this panel) has a bot called “every five minutes” that tweets (as it says) every five some snippet of a record of a slave sale. I've actually blocked that one on Twitter myself, because the juxtaposition of reminders it produces are sometimes more than I can take in the middle of the day.

Moreover, we continue to see a wide variety of work that isn't about data, per se, expanding the notion of what digitally oriented scholarship can be. If I had to single out a single institution today, I'd point to the University of Richmond, with work like the masterful American Panorama atlas edited by Robert Nelson or Lauren Tilton and Taylor Arnold's work on projects like http://photogrammar.yale.edu about already-digitized photos at the Library of Congress.

This kind of attention to audience, to re-ordering, and to narrative engagement evolved in part because of the weight of Time on the Cross. This–not warmed over introductory econometrics–is the real contribution to intellectual life that digital humanities stands to make. And its one that historians, with their multiple sources and strong subfield of public history, are better positioned to execute than any other field in Digital Humanities. It often doesn't handle data tables by the narrow definition of Time on the Cross; but by the very virtue of its difference, it breaks out of the mold of separating argument and evidence that does work well for interdisciplinary work or for engaging a larger public, and could provide some of the answers we need for moving forward.

Avatar
Ben Schmidt
Director of Digital Humanities and Clinical Associate Professor History

I am a digital historian and Director of Digital Humanities at NYU.